Computer Science
- [1] arXiv:2405.10346 [pdf, ps, html, other]
-
Title: AMCEN: An Attention Masking-based Contrastive Event Network for Two-stage Temporal Knowledge Graph ReasoningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Temporal knowledge graphs (TKGs) can effectively model the ever-evolving nature of real-world knowledge, and their completeness and enhancement can be achieved by reasoning new events from existing ones. However, reasoning accuracy is adversely impacted due to an imbalance between new and recurring events in the datasets. To achieve more accurate TKG reasoning, we propose an attention masking-based contrastive event network (AMCEN) with local-global temporal patterns for the two-stage prediction of future events. In the network, historical and non-historical attention mask vectors are designed to control the attention bias towards historical and non-historical entities, acting as the key to alleviating the imbalance. A local-global message-passing module is proposed to comprehensively consider and capture multi-hop structural dependencies and local-global temporal evolution for the in-depth exploration of latent impact factors of different event types. A contrastive event classifier is used to classify events more accurately by incorporating local-global temporal patterns into contrastive learning. Therefore, AMCEN refines the prediction scope with the results of the contrastive event classification, followed by utilizing attention masking-based decoders to finalize the specific outcomes. The results of our experiments on four benchmark datasets highlight the superiority of AMCEN. Especially, the considerable improvements in Hits@1 prove that AMCEN can make more precise predictions about future occurrences.
- [2] arXiv:2405.10347 [pdf, ps, html, other]
-
Title: Networking Systems for Video Anomaly Detection: A Tutorial and SurveyJing Liu, Yang Liu, Jieyu Lin, Jielin Li, Peng Sun, Bo Hu, Liang Song, Azzedine Boukerche, Victor C.M. LeungComments: Submitted to ACM Computing Surveys, under review,for more information and supplementary material, please see this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
The increasing prevalence of surveillance cameras in smart cities, coupled with the surge of online video applications, has heightened concerns regarding public security and privacy protection, which propelled automated Video Anomaly Detection (VAD) into a fundamental research task within the Artificial Intelligence (AI) community. With the advancements in deep learning and edge computing, VAD has made significant progress and advances synergized with emerging applications in smart cities and video internet, which has moved beyond the conventional research scope of algorithm engineering to deployable Networking Systems for VAD (NSVAD), a practical hotspot for intersection exploration in the AI, IoVT, and computing fields. In this article, we delineate the foundational assumptions, learning frameworks, and applicable scenarios of various deep learning-driven VAD routes, offering an exhaustive tutorial for novices in NSVAD. This article elucidates core concepts by reviewing recent advances and typical solutions, and aggregating available research resources (e.g., literatures, code, tools, and workshops) accessible at this https URL. Additionally, we showcase our latest NSVAD research in industrial IoT and smart cities, along with an end-cloud collaborative architecture for deployable NSVAD to further elucidate its potential scope of research and application. Lastly, this article projects future development trends and discusses how the integration of AI and computing technologies can address existing research challenges and promote open opportunities, serving as an insightful guide for prospective researchers and engineers.
- [3] arXiv:2405.10350 [pdf, ps, other]
-
Title: Monitizer: Automating Design and Evaluation of Neural Network MonitorsComments: accepted at CAV 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
The behavior of neural networks (NNs) on previously unseen types of data (out-of-distribution or OOD) is typically unpredictable. This can be dangerous if the network's output is used for decision-making in a safety-critical system. Hence, detecting that an input is OOD is crucial for the safe application of the NN. Verification approaches do not scale to practical NNs, making runtime monitoring more appealing for practical use. While various monitors have been suggested recently, their optimization for a given problem, as well as comparison with each other and reproduction of results, remain challenging. We present a tool for users and developers of NN monitors. It allows for (i) application of various types of monitors from the literature to a given input NN, (ii) optimization of the monitor's hyperparameters, and (iii) experimental evaluation and comparison to other approaches. Besides, it facilitates the development of new monitoring approaches. We demonstrate the tool's usability on several use cases of different types of users as well as on a case study comparing different approaches from recent literature.
- [4] arXiv:2405.10357 [pdf, ps, html, other]
-
Title: RGB Guided ToF Imaging System: A Survey of Deep Learning-based MethodsComments: To appear on International Journal of Computer Vision (IJCV)Subjects: Computer Vision and Pattern Recognition (cs.CV)
Integrating an RGB camera into a ToF imaging system has become a significant technique for perceiving the real world. The RGB guided ToF imaging system is crucial to several applications, including face anti-spoofing, saliency detection, and trajectory prediction. Depending on the distance of the working range, the implementation schemes of the RGB guided ToF imaging systems are different. Specifically, ToF sensors with a uniform field of illumination, which can output dense depth but have low resolution, are typically used for close-range measurements. In contrast, LiDARs, which emit laser pulses and can only capture sparse depth, are usually employed for long-range detection. In the two cases, depth quality improvement for RGB guided ToF imaging corresponds to two sub-tasks: guided depth super-resolution and guided depth completion. In light of the recent significant boost to the field provided by deep learning, this paper comprehensively reviews the works related to RGB guided ToF imaging, including network structures, learning strategies, evaluation metrics, benchmark datasets, and objective functions. Besides, we present quantitative comparisons of state-of-the-art methods on widely used benchmark datasets. Finally, we discuss future trends and the challenges in real applications for further research.
- [5] arXiv:2405.10370 [pdf, ps, html, other]
-
Title: Grounded 3D-LLM with Referent TokensComments: PreprintSubjects: Computer Vision and Pattern Recognition (cs.CV)
Prior studies on 3D scene understanding have primarily developed specialized models for specific tasks or required task-specific fine-tuning. In this study, we propose Grounded 3D-LLM, which explores the potential of 3D large multi-modal models (3D LMMs) to consolidate various 3D vision tasks within a unified generative framework. The model uses scene referent tokens as special noun phrases to reference 3D scenes, enabling the handling of sequences that interleave 3D and textual data. It offers a natural approach for translating 3D vision tasks into language formats using task-specific instruction templates. To facilitate the use of referent tokens in subsequent language modeling, we have curated large-scale grounded language datasets that offer finer scene-text correspondence at the phrase level by bootstrapping existing object labels. Subsequently, we introduced Contrastive LAnguage-Scene Pre-training (CLASP) to effectively leverage this data, thereby integrating 3D vision with language models. Our comprehensive evaluation covers open-ended tasks like dense captioning and 3D QA, alongside close-ended tasks such as object detection and language grounding. Experiments across multiple 3D benchmarks reveal the leading performance and the broad applicability of Grounded 3D-LLM. Code and datasets will be released on the project page: this https URL.
- [6] arXiv:2405.10372 [pdf, ps, html, other]
-
Title: Efficient model predictive control for nonlinear systems modelled by deep neural networksComments: 8 pages, 5 figuresSubjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
This paper presents a model predictive control (MPC) for dynamic systems whose nonlinearity and uncertainty are modelled by deep neural networks (NNs), under input and state constraints. Since the NN output contains a high-order complex nonlinearity of the system state and control input, the MPC problem is nonlinear and challenging to solve for real-time control. This paper proposes two types of methods for solving the MPC problem: the mixed integer programming (MIP) method which produces an exact solution to the nonlinear MPC, and linear relaxation (LR) methods which generally give suboptimal solutions but are much computationally cheaper. Extensive numerical simulation for an inverted pendulum system modelled by ReLU NNs of various sizes is used to demonstrate and compare performance of the MIP and LR methods.
- [7] arXiv:2405.10373 [pdf, ps, html, other]
-
Title: A Transdisciplinary Approach to Cybersecurity: A Framework for Encouraging Transdisciplinary ThinkingSubjects: Cryptography and Security (cs.CR)
Classical cybersecurity is often perceived as a rigid science discipline filled with computer scientists and mathematicians. However, due to the rapid pace of technology development and integration, new criminal enterprises, new defense tactics, and the understanding of the human element, cybersecurity is quickly beginning to encompass more than just computers. Cybersecurity experts must broaden their perspectives beyond traditional disciplinary boundaries to provide the best protection possible. They must start to practice transdisciplinary cybersecurity. Taking influence from the Stakeholder Theory in business ethics, this paper presents a framework to encourage transdisciplinary thinking and assist experts in tackling the new challenges of the modern day. The framework uses the simple Think, Plan, Do approach to enable experts to develop their transdisciplinary thinking. The framework is intended to be used as an evaluation tool for existing cybersecurity practices or postures, as a development tool to engage with other disciplines to foster learning and create new methods, and as a guidance tool to encourage new ways of thinking about, perceiving, and executing cybersecurity practices. For each of those intended uses, a use case is presented as an example to showcase how the framework might be used. The ultimate goal of this paper is not the framework but transdisciplinary thinking. By using the tool presented here and developing their own transdisciplinary thinking, cybersecurity experts can be better prepared to face cybersecurity's unique and complex challenges.
- [8] arXiv:2405.10375 [pdf, ps, other]
-
Title: Implementing a GRU Neural Network for Flood Prediction in Ashland City, TennesseeComments: 13 pages, 3 figures, 3 tablesSubjects: Machine Learning (cs.LG)
Ashland City, Tennessee, located within the Lower Cumberland Sycamore watershed, is highly susceptible to flooding due to increased upstream water levels. This study aimed to develop a robust flood prediction model for the city, utilizing water level data at 30-minute intervals from ten USGS gauge stations within the watershed. A Gated Recurrent Unit (GRU) network, known for its ability to effectively process sequential time-series data, was used. The model was trained, validated, and tested using a year-long dataset (January 2021-January 2022), and its performance was evaluated using statistical metrics including Nash-Sutcliffe Efficiency (NSE), Root Mean Squared Error (RMSE), Percent Bias (PBIAS), Mean Absolute Error (MAE), and Coefficient of Determination (R^2). The results demonstrated a high level of accuracy, with the model explaining 98.2% of the variance in the data. Despite minor discrepancies between predicted and observed values, the GRU model proved to be an effective tool for flood prediction in Ashland City, with potential applications for enhancing disaster preparedness and response efforts in Ashland City.
- [9] arXiv:2405.10376 [pdf, ps, html, other]
-
Title: Dealing Doubt: Unveiling Threat Models in Gradient Inversion Attacks under Federated Learning, A Survey and TaxonomySubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Federated Learning (FL) has emerged as a leading paradigm for decentralized, privacy preserving machine learning training. However, recent research on gradient inversion attacks (GIAs) have shown that gradient updates in FL can leak information on private training samples. While existing surveys on GIAs have focused on the honest-but-curious server threat model, there is a dearth of research categorizing attacks under the realistic and far more privacy-infringing cases of malicious servers and clients. In this paper, we present a survey and novel taxonomy of GIAs that emphasize FL threat models, particularly that of malicious servers and clients. We first formally define GIAs and contrast conventional attacks with the malicious attacker. We then summarize existing honest-but-curious attack strategies, corresponding defenses, and evaluation metrics. Critically, we dive into attacks with malicious servers and clients to highlight how they break existing FL defenses, focusing specifically on reconstruction methods, target model architectures, target data, and evaluation metrics. Lastly, we discuss open problems and future research directions.
- [10] arXiv:2405.10377 [pdf, ps, html, other]
-
Title: Smart Routing with Precise Link Estimation: DSEE-Based Anypath Routing for Reliable Wireless NetworkingComments: ICMLCN 2024Subjects: Networking and Internet Architecture (cs.NI); Machine Learning (cs.LG)
In dynamic and resource-constrained environments, such as multi-hop wireless mesh networks, traditional routing protocols often falter by relying on predetermined paths that prove ineffective in unpredictable link conditions. Shortest Anypath routing offers a solution by adapting routing decisions based on real-time link conditions. However, the effectiveness of such routing is fundamentally dependent on the quality and reliability of the available links, and predicting these variables with certainty is challenging. This paper introduces a novel approach that leverages the Deterministic Sequencing of Exploration and Exploitation (DSEE), a multi-armed bandit algorithm, to address the need for accurate and real-time estimation of link delivery probabilities. This approach augments the reliability and resilience of the Shortest Anypath routing in the face of fluctuating link conditions. By coupling DSEE with Anypath routing, this algorithm continuously learns and ensures accurate delivery probability estimation and selects the most suitable way to efficiently route packets while maintaining a provable near-logarithmic regret bound. We also theoretically prove that our proposed scheme offers better regret scaling with respect to the network size than the previously proposed Thompson Sampling-based Opportunistic Routing (TSOR).
- [11] arXiv:2405.10378 [pdf, ps, html, other]
-
Title: A Polynomial-Time Approximation for Pairwise Fair $k$-Median ClusteringSubjects: Data Structures and Algorithms (cs.DS); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
In this work, we study pairwise fair clustering with $\ell \ge 2$ groups, where for every cluster $C$ and every group $i \in [\ell]$, the number of points in $C$ from group $i$ must be at most $t$ times the number of points in $C$ from any other group $j \in [\ell]$, for a given integer $t$. To the best of our knowledge, only bi-criteria approximation and exponential-time algorithms follow for this problem from the prior work on fair clustering problems when $\ell > 2$. In our work, focusing on the $\ell > 2$ case, we design the first polynomial-time $(t^{\ell}\cdot \ell\cdot k)^{O(\ell)}$-approximation for this problem with $k$-median cost that does not violate the fairness constraints. We complement our algorithmic result by providing hardness of approximation results, which show that our problem even when $\ell=2$ is almost as hard as the popular uniform capacitated $k$-median, for which no polynomial-time algorithm with an approximation factor of $o(\log k)$ is known.
- [12] arXiv:2405.10385 [pdf, ps, html, other]
-
Title: AmazUtah_NLP at SemEval-2024 Task 9: A MultiChoice Question Answering System for Commonsense Defying ReasoningComments: Accepted at SemEval 2024 (Colocated with NAACL 2024)Journal-ref: Proceedings of the 18th International Workshop on Semantic Evaluation (SemEval-2024)Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
The SemEval 2024 BRAINTEASER task represents a pioneering venture in Natural Language Processing (NLP) by focusing on lateral thinking, a dimension of cognitive reasoning that is often overlooked in traditional linguistic analyses. This challenge comprises of Sentence Puzzle and Word Puzzle subtasks and aims to test language models' capacity for divergent thinking.
In this paper, we present our approach to the BRAINTEASER task. We employ a holistic strategy by leveraging cutting-edge pre-trained models in multiple choice architecture, and diversify the training data with Sentence and Word Puzzle datasets. To gain further improvement, we fine-tuned the model with synthetic humor/jokes dataset and the RiddleSense dataset which helped augmenting the model's lateral thinking abilities. Empirical results show that our approach achieve 92.5\% accuracy in Sentence Puzzle subtask and 80.2\% accuracy in Word Puzzle subtask. - [13] arXiv:2405.10389 [pdf, ps, html, other]
-
Title: Physics-Informed Heterogeneous Graph Neural Networks for DC Blocker PlacementHongwei Jin, Prasanna Balaprakash, Allen Zou, Pieter Ghysels, Aditi S. Krishnapriyan, Adam Mate, Arthur Barnes, Russell BentComments: Paper is accepted by PSCC 2024Subjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
The threat of geomagnetic disturbances (GMDs) to the reliable operation of the bulk energy system has spurred the development of effective strategies for mitigating their impacts. One such approach involves placing transformer neutral blocking devices, which interrupt the path of geomagnetically induced currents (GICs) to limit their impact. The high cost of these devices and the sparsity of transformers that experience high GICs during GMD events, however, calls for a sparse placement strategy that involves high computational cost. To address this challenge, we developed a physics-informed heterogeneous graph neural network (PIHGNN) for solving the graph-based dc-blocker placement problem. Our approach combines a heterogeneous graph neural network (HGNN) with a physics-informed neural network (PINN) to capture the diverse types of nodes and edges in ac/dc networks and incorporates the physical laws of the power grid. We train the PIHGNN model using a surrogate power flow model and validate it using case studies. Results demonstrate that PIHGNN can effectively and efficiently support the deployment of GIC dc-current blockers, ensuring the continued supply of electricity to meet societal demands. Our approach has the potential to contribute to the development of more reliable and resilient power grids capable of withstanding the growing threat that GMDs pose.
- [14] arXiv:2405.10390 [pdf, ps, other]
-
Title: Two-point stress approximation: A simple and robust finite volume method for linearized (poro-)mechanics and Stokes flowSubjects: Numerical Analysis (math.NA)
We construct a simple and robust finite volume discretization for linearized mechanics, Stokes and poromechanics, based only on co-located, cell-centered variables. The discretization has a minimal stencil, using only the two neighboring cells to a face to calculate numerical stresses and fluxes.
We fully justify the method theoretically in terms of stability and convergence, both of which are robust in terms of the material parameters. Numerical experiments support the theoretical results, and shed light on grid families not explicitly treated by the theoretical results. - [15] arXiv:2405.10391 [pdf, ps, html, other]
-
Title: Vision Transformers for End-to-End Vision-Based Quadrotor Obstacle AvoidanceComments: 8 pages, 10 figures, 3 tablesSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
We demonstrate the capabilities of an attention-based end-to-end approach for high-speed quadrotor obstacle avoidance in dense, cluttered environments, with comparison to various state-of-the-art architectures. Quadrotor unmanned aerial vehicles (UAVs) have tremendous maneuverability when flown fast; however, as flight speed increases, traditional vision-based navigation via independent mapping, planning, and control modules breaks down due to increased sensor noise, compounding errors, and increased processing latency. Thus, learning-based, end-to-end planning and control networks have shown to be effective for online control of these fast robots through cluttered environments. We train and compare convolutional, U-Net, and recurrent architectures against vision transformer models for depth-based end-to-end control, in a photorealistic, high-physics-fidelity simulator as well as in hardware, and observe that the attention-based models are more effective as quadrotor speeds increase, while recurrent models with many layers provide smoother commands at lower speeds. To the best of our knowledge, this is the first work to utilize vision transformers for end-to-end vision-based quadrotor control.
- [16] arXiv:2405.10392 [pdf, ps, html, other]
-
Title: Transport based particle methods for the Fokker-Planck-Landau equationComments: 26 pages, 6 figures, code this https URLSubjects: Numerical Analysis (math.NA); Machine Learning (cs.LG); Analysis of PDEs (math.AP)
We propose a particle method for numerically solving the Landau equation, inspired by the score-based transport modeling (SBTM) method for the Fokker-Planck equation. This method can preserve some important physical properties of the Landau equation, such as the conservation of mass, momentum, and energy, and decay of estimated entropy. We prove that matching the gradient of the logarithm of the approximate solution is enough to recover the true solution to the Landau equation with Maxwellian molecules. Several numerical experiments in low and moderately high dimensions are performed, with particular emphasis on comparing the proposed method with the traditional particle or blob method.
- [17] arXiv:2405.10398 [pdf, ps, html, other]
-
Title: Drone-type-Set: Drone types detection benchmark for drone detection and trackingSubjects: Computer Vision and Pattern Recognition (cs.CV)
The Unmanned Aerial Vehicles (UAVs) market has been significantly growing and Considering the availability of drones at low-cost prices the possibility of misusing them, for illegal purposes such as drug trafficking, spying, and terrorist attacks posing high risks to national security, is rising. Therefore, detecting and tracking unauthorized drones to prevent future attacks that threaten lives, facilities, and security, become a necessity. Drone detection can be performed using different sensors, while image-based detection is one of them due to the development of artificial intelligence techniques. However, knowing unauthorized drone types is one of the challenges due to the lack of drone types datasets. For that, in this paper, we provide a dataset of various drones as well as a comparison of recognized object detection models on the proposed dataset including YOLO algorithms with their different versions, like, v3, v4, and v5 along with the Detectronv2. The experimental results of different models are provided along with a description of each method. The collected dataset can be found in this https URL
- [18] arXiv:2405.10402 [pdf, ps, other]
-
Title: Formulae and transformations for simplicial tensorial finite elements via polytopal templatesSubjects: Numerical Analysis (math.NA)
We introduce a unified method for constructing the basis functions of a wide variety of partially continuous tensor-valued finite elements on simplices using polytopal templates. These finite element spaces are essential for achieving well-posed discretisations of mixed formulations of partial differential equations that involve tensor-valued functions, such as the Hellinger-Reissner formulation of linear elasticity. In our proposed polytopal template method, the basis functions are constructed from template tensors associated with the geometric polytopes (vertices, edges, faces etc.) of the reference simplex and any scalar-valued $H^1$-conforming finite element space. From this starting point we can construct the Regge, Hellan-Herrmann-Johnson, Pechstein-Schöberl, Hu-Zhang, Hu-Ma-Sun and Gopalakrishnan-Lederer-Schöberl elements. Because the Hu-Zhang element and the Hu-Ma-Sun element cannot be mapped from the reference simplex to a physical simplex via standard double Piola mappings, we also demonstrate that the polytopal template tensors can be used to define a consistent mapping from a reference simplex even to a non-affine simplex in the physical mesh. Finally, we discuss the implications of element regularity with two numerical examples for the Reissner-Mindlin plate problem.
- [19] arXiv:2405.10410 [pdf, ps, html, other]
-
Title: The fast committor machine: Interpretable prediction with kernelsComments: 10 pages, 7 figuresSubjects: Numerical Analysis (math.NA); Machine Learning (stat.ML)
In the study of stochastic dynamics, the committor function describes the probability that a process starting from an initial configuration $x$ will reach set $A$ before set $B$. This paper introduces a fast and interpretable method for approximating the committor, called the "fast committor machine" (FCM). The FCM is based on simulated trajectory data, and it uses this data to train a kernel model. The FCM identifies low-dimensional subspaces that optimally describe the $A$ to $B$ transitions, and the subspaces are emphasized in the kernel model. The FCM uses randomized numerical linear algebra to train the model with runtime that scales linearly in the number of data points. This paper applies the FCM to example systems including the alanine dipeptide miniprotein: in these experiments, the FCM is generally more accurate and trains more quickly than a neural network with a similar number of parameters.
- [20] arXiv:2405.10421 [pdf, ps, html, other]
-
Title: Pointwise Metrics for Clustering EvaluationSubjects: Information Retrieval (cs.IR)
This paper defines pointwise clustering metrics, a collection of metrics for characterizing the similarity of two clusterings. These metrics have several interesting properties which make them attractive for practical applications. They can take into account the relative importance of the various items that are clustered. The metric definitions are based on standard set-theoretic notions and are simple to understand. They characterize aspects that are important for typical applications, such as cluster homogeneity and completeness. It is possible to assign metrics to individual items, clusters, arbitrary slices of items, and the overall clustering. The metrics can provide deep insights, for example they can facilitate drilling deeper into clustering mistakes to understand where they happened, or help to explore slices of items to understand how they were affected. Since the pointwise metrics are mathematically well-behaved, they can provide a strong foundation for a variety of clustering evaluation techniques. In this paper we discuss in depth how the pointwise metrics can be used to evaluate an actual clustering with respect to a ground truth clustering.
- [21] arXiv:2405.10422 [pdf, ps, html, other]
-
Title: A First Look at Immersive Telepresence on Apple Vision ProSubjects: Networking and Internet Architecture (cs.NI)
Due to the widespread adoption of "work-from-home" policies, videoconferencing applications (e.g., Zoom) have become indispensable for remote communication. However, these systems lack immersiveness, leading to the so-called "Zoom fatigue" and degrading communication efficiency. The recent debut of Apple Vision Pro, a mixed reality headset that supports "spatial persona", aims to offer an immersive telepresence experience with these applications. In this paper, we conduct a first-of-its-kind in-depth and empirical study to analyze the performance of immersive telepresence with four applications, Apple FaceTime, Cisco Webex, Microsoft Teams, and Zoom, on Vision Pro. We find that only FaceTime provides a truly immersive experience with spatial personas, whereas other applications still operate 2D personas. Our measurement results reveal that (1) FaceTime delivers semantic information to optimize bandwidth consumption, which is even lower than that of 2D persona for other applications, and (2) it employs visibility-aware optimizations to reduce rendering overhead. However, the scalability of FaceTime remains limited, with a simple server allocation strategy that potentially leads to high network delay among users.
- [22] arXiv:2405.10423 [pdf, ps, html, other]
-
Title: Diversity-Aware Sign Language Production through a Pose Encoding Variational AutoencoderSubjects: Computer Vision and Pattern Recognition (cs.CV)
This paper addresses the problem of diversity-aware sign language production, where we want to give an image (or sequence) of a signer and produce another image with the same pose but different attributes (\textit{e.g.} gender, skin color). To this end, we extend the variational inference paradigm to include information about the pose and the conditioning of the attributes. This formulation improves the quality of the synthesised images. The generator framework is presented as a UNet architecture to ensure spatial preservation of the input pose, and we include the visual features from the variational inference to maintain control over appearance and style. We generate each body part with a separate decoder. This architecture allows the generator to deliver better overall results. Experiments on the SMILE II dataset show that the proposed model performs quantitatively better than state-of-the-art baselines regarding diversity, per-pixel image quality, and pose estimation. Quantitatively, it faithfully reproduces non-manual features for signers.
- [23] arXiv:2405.10425 [pdf, ps, html, other]
-
Title: Data Selection for Transfer UnlearningSubjects: Machine Learning (cs.LG)
As deep learning models are becoming larger and data-hungrier, there are growing ethical, legal and technical concerns over use of data: in practice, agreements on data use may change over time, rendering previously-used training data impermissible for training purposes. These issues have driven increased attention to machine unlearning: removing "the influence of" a subset of training data from a trained model. In this work, we advocate for a relaxed definition of unlearning that does not address privacy applications but targets a scenario where a data owner withdraws permission of use of their data for training purposes. In this context, we consider the important problem of \emph{transfer unlearning} where a pretrained model is transferred to a target dataset that contains some "non-static" data that may need to be unlearned in the future. We propose a new method that uses a mechanism for selecting relevant examples from an auxiliary "static" dataset, and finetunes on the selected data instead of "non-static" target data; addressing all unlearning requests ahead of time. We also adapt a recent relaxed definition of unlearning to our problem setting and demonstrate that our approach is an exact transfer unlearner according to it, while being highly efficient (amortized). We find that our method outperforms the gold standard "exact unlearning" (finetuning on only the "static" portion of the target dataset) on several datasets, especially for small "static" sets, sometimes approaching an upper bound for test accuracy. We also analyze factors influencing the accuracy boost obtained by data selection.
- [24] arXiv:2405.10426 [pdf, ps, html, other]
-
Title: Memory-efficient Energy-adaptive Inference of Pre-Trained Models on Batteryless Embedded SystemsPietro Farina, Subrata Biswas, Eren Yıldız, Khakim Akhunov, Saad Ahmed, Bashima Islam, Kasım Sinan YıldırımComments: This paper has been selected for publication at the 21st International Conference on Embedded Wireless Systems and Networks (EWSN'24)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Batteryless systems frequently face power failures, requiring extra runtime buffers to maintain inference progress and leaving only a memory space for storing ultra-tiny deep neural networks (DNNs). Besides, making these models responsive to stochastic energy harvesting dynamics during inference requires a balance between inference accuracy, latency, and energy overhead. Recent works on compression mostly focus on time and memory, but often ignore energy dynamics or significantly reduce the accuracy of pre-trained DNNs. Existing energy-adaptive inference works modify the architecture of pre-trained models and have significant memory overhead. Thus, energy-adaptive and accurate inference of pre-trained DNNs on batteryless devices with extreme memory constraints is more challenging than traditional microcontrollers. We combat these issues by proposing FreeML, a framework to optimize pre-trained DNN models for memory-efficient and energy-adaptive inference on batteryless systems. FreeML comprises (1) a novel compression technique to reduce the model footprint and runtime memory requirements simultaneously, making them executable on extremely memory-constrained batteryless platforms; and (2) the first early exit mechanism that uses a single exit branch for all exit points to terminate inference at any time, making models energy-adaptive with minimal memory overhead. Our experiments showed that FreeML reduces the model sizes by up to $95 \times$, supports adaptive inference with a $2.03-19.65 \times$ less memory overhead, and provides significant time and energy benefits with only a negligible accuracy drop compared to the state-of-the-art.
- [25] arXiv:2405.10429 [pdf, ps, html, other]
-
Title: Physics-Guided State-Space Model Augmentation Using Weighted Regularized Neural NetworksSubjects: Systems and Control (eess.SY)
Physics-guided neural networks (PGNN) is an effective tool that combines the benefits of data-driven modeling with the interpretability and generalization of underlying physical information. However, for a classical PGNN, the penalization of the physics-guided part is at the output level, which leads to a conservative result as systems with highly similar state-transition functions, i.e. only slight differences in parameters, can have significantly different time-series outputs. Furthermore, the classical PGNN cost function regularizes the model estimate over the entire state space with a constant trade-off hyperparameter. In this paper, we introduce a novel model augmentation strategy for nonlinear state-space model identification based on PGNN, using a weighted function regularization (W-PGNN). The proposed approach can efficiently augment the prior physics-based state-space models based on measurement data. A new weighted regularization term is added to the cost function to penalize the difference between the state and output function of the baseline physics-based and final identified model. This ensures the estimated model follows the baseline physics model functions in regions where the data has low information content, while placing greater trust in the data when a high informativity is present. The effectiveness of the proposed strategy over the current PGNN method is demonstrated on a benchmark example.
- [26] arXiv:2405.10431 [pdf, ps, html, other]
-
Title: Thinking Fair and Slow: On the Efficacy of Structured Prompts for Debiasing Language ModelsShaz Furniturewala, Surgan Jandial, Abhinav Java, Pragyan Banerjee, Simra Shahid, Sumit Bhatia, Kokil JaidkaComments: The first two authors have equal contributionSubjects: Computation and Language (cs.CL)
Existing debiasing techniques are typically training-based or require access to the model's internals and output distributions, so they are inaccessible to end-users looking to adapt LLM outputs for their particular needs. In this study, we examine whether structured prompting techniques can offer opportunities for fair text generation. We evaluate a comprehensive end-user-focused iterative framework of debiasing that applies System 2 thinking processes for prompts to induce logical, reflective, and critical text generation, with single, multi-step, instruction, and role-based variants. By systematically evaluating many LLMs across many datasets and different prompting strategies, we show that the more complex System 2-based Implicative Prompts significantly improve over other techniques demonstrating lower mean bias in the outputs with competitive performance on the downstream tasks. Our work offers research directions for the design and the potential of end-user-focused evaluative frameworks for LLM use.
- [27] arXiv:2405.10435 [pdf, ps, html, other]
-
Title: Two-Stage Stochastic Optimal Power Flow for Microgrids With Uncertain Wildfire EffectsSubjects: Systems and Control (eess.SY)
Large-scale power outages caused by extreme weather events are one of the major factors weakening grid resilience. In order to prevent the critical infrastructure from cascading failure, power lines are often proactively de-energized under the threat of a progressing wildfire. In this context, the potential of microgrid (MG) functioning in islanded mode can be exploited to enhance the resiliency of the power grid. However, there are numerous uncertainties originating from these types of events and an accurate modeling of the MG is required to harness its full potential. In this paper, we consider the uncertainty in line outages depending on fire propagation and reduced solar power generation due to the particulate matter in wildfire smoke. We formulate a two-stage stochastic MG optimal power flow problem by utilizing a second-order cone relaxation of the DistFlow model. Leveraging an effective approximation of the resistive heat gain, we separate the complicating constraints of dynamic line rating from the resulting optimization problem. Extensive simulation results corroborate the merits of our proposed framework, which is tested on a modified IEEE 22-bus system.
- [28] arXiv:2405.10436 [pdf, ps, html, other]
-
Title: Positional encoding is not the same as context: A study on positional encoding for Sequential recommendationComments: 19 pages, 3 figures, 12 tablesSubjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI)
The expansion of streaming media and e-commerce has led to a boom in recommendation systems, including Sequential recommendation systems, which consider the user's previous interactions with items. In recent years, research has focused on architectural improvements such as transformer blocks and feature extraction that can augment model information. Among these features are context and attributes. Of particular importance is the temporal footprint, which is often considered part of the context and seen in previous publications as interchangeable with positional information. Other publications use positional encodings with little attention to them. In this paper, we analyse positional encodings, showing that they provide relative information between items that are not inferable from the temporal footprint. Furthermore, we evaluate different encodings and how they affect metrics and stability using Amazon datasets. We added some new encodings to help with these problems along the way. We found that we can reach new state-of-the-art results by finding the correct positional encoding, but more importantly, certain encodings stabilise the training.
- [29] arXiv:2405.10439 [pdf, ps, html, other]
-
Title: Beyond Traditional Single Object Tracking: A SurveySubjects: Computer Vision and Pattern Recognition (cs.CV)
Single object tracking is a vital task of many applications in critical fields. However, it is still considered one of the most challenging vision tasks. In recent years, computer vision, especially object tracking, witnessed the introduction or adoption of many novel techniques, setting new fronts for performance. In this survey, we visit some of the cutting-edge techniques in vision, such as Sequence Models, Generative Models, Self-supervised Learning, Unsupervised Learning, Reinforcement Learning, Meta-Learning, Continual Learning, and Domain Adaptation, focusing on their application in single object tracking. We propose a novel categorization of single object tracking methods based on novel techniques and trends. Also, we conduct a comparative analysis of the performance reported by the methods presented on popular tracking benchmarks. Moreover, we analyze the pros and cons of the presented approaches and present a guide for non-traditional techniques in single object tracking. Finally, we suggest potential avenues for future research in single-object tracking.
- [30] arXiv:2405.10440 [pdf, ps, html, other]
-
Title: Retrieving and Refining: A Hybrid Framework with Large Language Models for Rare Disease IdentificationSubjects: Computation and Language (cs.CL)
The infrequency and heterogeneity of clinical presentations in rare diseases often lead to underdiagnosis and their exclusion from structured datasets. This necessitates the utilization of unstructured text data for comprehensive analysis. However, the manual identification from clinical reports is an arduous and intrinsically subjective task. This study proposes a novel hybrid approach that synergistically combines a traditional dictionary-based natural language processing (NLP) tool with the powerful capabilities of large language models (LLMs) to enhance the identification of rare diseases from unstructured clinical notes. We comprehensively evaluate various prompting strategies on six large language models (LLMs) of varying sizes and domains (general and medical). This evaluation encompasses zero-shot, few-shot, and retrieval-augmented generation (RAG) techniques to enhance the LLMs' ability to reason about and understand contextual information in patient reports. The results demonstrate effectiveness in rare disease identification, highlighting the potential for identifying underdiagnosed patients from clinical notes.
- [31] arXiv:2405.10441 [pdf, ps, other]
-
Title: Trajectory tracking control of a Remotely Operated Underwater Vehicle based on Fuzzy Disturbance Adaptation and Controller Parameter OptimizationSubjects: Robotics (cs.RO); Systems and Control (eess.SY)
The exploration of under-ice environments presents unique challenges due to limited access for scientific research. This report investigates the potential of deploying a fully actuated Remotely Operated Vehicle (ROV) for shallow area exploration beneath ice sheets. Leveraging advancements in marine robotics technology, ROVs offer a promising solution for extending human presence into remote underwater locations. To enable successful under-ice exploration, the ROV must follow precise trajectories for effective localization signal reception. This study develops a multi-input-multi-output (MIMO) nonlinear system controller, incorporating a Lyapunov-based stability guarantee and an adaptation law to mitigate unknown environmental disturbances. Fuzzy logic is employed to dynamically adjust adaptation rates, enhancing performance in highly nonlinear ROV dynamic systems. Additionally, a Particle Swarm Optimization (PSO) algorithm automates the tuning of controller parameters for optimal trajectory tracking. The report details the ROV dynamic model, the proposed control framework, and the PSO-based tuning process. Simulation-based experiments validate the efficacy of the methodology, with experimental results demonstrating superior trajectory tracking performance compared to baseline controllers. This work contributes to the advancement of under-ice exploration capabilities and sets the stage for future research in marine robotics and autonomous underwater systems.
- [32] arXiv:2405.10443 [pdf, ps, html, other]
-
Title: Simultaneous Masking, Not Prompting Optimization: A Paradigm Shift in Fine-tuning LLMs for Simultaneous TranslationSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Large language models (LLMs) have achieved state-of-the-art performance in various language processing tasks, motivating their adoption in simultaneous translation. Current fine-tuning methods to adapt LLMs for simultaneous translation focus on prompting optimization strategies using either data augmentation or prompt structure modifications. However, these methods suffer from several issues, such as an unnecessarily expanded training set, computational inefficiency from dumping the KV cache, increased prompt sizes, or restriction to a single decision policy. To eliminate these issues, we propose a new paradigm in fine-tuning LLMs for simultaneous translation, called SimulMask. It utilizes a novel attention mask technique that models simultaneous translation during fine-tuning by masking attention connections under a desired decision policy. Applying the proposed SimulMask on a Falcon LLM for the IWSLT 2017 dataset, we have observed a significant translation quality improvement compared to state-of-the-art prompting optimization strategies on three language pairs when averaged across four different latency regimes while reducing the computational cost.
- [33] arXiv:2405.10444 [pdf, ps, html, other]
-
Title: A Novel Bounding Box Regression Method for Single Object TrackingSubjects: Computer Vision and Pattern Recognition (cs.CV)
Locating an object in a sequence of frames, given its appearance in the first frame of the sequence, is a hard problem that involves many stages. Usually, state-of-the-art methods focus on bringing novel ideas in the visual encoding or relational modelling phases. However, in this work, we show that bounding box regression from learned joint search and template features is of high importance as well. While previous methods relied heavily on well-learned features representing interactions between search and template, we hypothesize that the receptive field of the input convolutional bounding box network plays an important role in accurately determining the object location. To this end, we introduce two novel bounding box regression networks: inception and deformable. Experiments and ablation studies show that our inception module installed on the recent ODTrack outperforms the latter on three benchmarks: the GOT-10k, the UAV123 and the OTB2015.
- [34] arXiv:2405.10446 [pdf, ps, html, other]
-
Title: Tell me more: Intent Fulfilment Framework for Enhancing User Experiences in Conversational XAISubjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
The evolution of Explainable Artificial Intelligence (XAI) has emphasised the significance of meeting diverse user needs. The approaches to identifying and addressing these needs must also advance, recognising that explanation experiences are subjective, user-centred processes that interact with users towards a better understanding of AI decision-making. This paper delves into the interrelations in multi-faceted XAI and examines how different types of explanations collaboratively meet users' XAI needs. We introduce the Intent Fulfilment Framework (IFF) for creating explanation experiences. The novelty of this paper lies in recognising the importance of "follow-up" on explanations for obtaining clarity, verification and/or substitution. Moreover, the Explanation Experience Dialogue Model integrates the IFF and "Explanation Followups" to provide users with a conversational interface for exploring their explanation needs, thereby creating explanation experiences. Quantitative and qualitative findings from our comparative user study demonstrate the impact of the IFF in improving user engagement, the utility of the AI system and the overall user experience. Overall, we reinforce the principle that "one explanation does not fit all" to create explanation experiences that guide the complex interaction through conversation.
- [35] arXiv:2405.10447 [pdf, ps, html, other]
-
Title: Codes for Limited-Magnitude Probability Error in DNA StorageComments: Part of work is published in ICC 2022-IEEE International Conference on CommunicationsSubjects: Information Theory (cs.IT)
DNA, with remarkable properties of high density, durability, and replicability, is one of the most appealing storage media. Emerging DNA storage technologies use composite DNA letters, where information is represented by probability vectors, leading to higher information density and lower synthesizing costs than regular DNA letters. However, it faces the problem of inevitable noise and information corruption. This paper explores the channel of composite DNA letters in DNA-based storage systems and introduces block codes for limited-magnitude probability errors on probability vectors. First, outer and inner bounds for limited-magnitude probability error correction codes are provided. Moreover, code constructions are proposed where the number of errors is bounded by t, the error magnitudes are bounded by l, and the probability resolution is fixed as k. These constructions focus on leveraging the properties of limited-magnitude probability errors in DNA-based storage systems, leading to improved performance in terms of complexity and redundancy. In addition, the asymptotic optimality for one of the proposed constructions is established. Finally, systematic codes based on one of the proposed constructions are presented, which enable efficient information extraction for practical implementation.
- [36] arXiv:2405.10452 [pdf, ps, html, other]
-
Title: Navigating Public Sentiment in the Circular Economy through Topic Modelling and Hyperparameter OptimisationSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
To advance the circular economy (CE), it is crucial to gain insights into the evolution of public sentiments, cognitive pathways of the masses concerning circular products and digital technology, and recognise the primary concerns. To achieve this, we collected data related to the CE from diverse platforms including Twitter, Reddit, and The Guardian. This comprehensive data collection spanned across three distinct strata of the public: the general public, professionals, and official sources. Subsequently, we utilised three topic models on the collected data. Topic modelling represents a type of data-driven and machine learning approach for text mining, capable of automatically categorising a large number of documents into distinct semantic groups. Simultaneously, these groups are described by topics, and these topics can aid in understanding the semantic content of documents at a high level. However, the performance of topic modelling may vary depending on different hyperparameter values. Therefore, in this study, we proposed a framework for topic modelling with hyperparameter optimisation for CE and conducted a series of systematic experiments to ensure that topic models are set with appropriate hyperparameters and to gain insights into the correlations between the CE and public opinion based on well-established models. The results of this study indicate that concerns about sustainability and economic impact persist across all three datasets. Official sources demonstrate a higher level of engagement with the application and regulation of CE. To the best of our knowledge, this study is pioneering in investigating various levels of public opinions concerning CE through topic modelling with the exploration of hyperparameter optimisation.
- [37] arXiv:2405.10456 [pdf, ps, html, other]
-
Title: Region-level labels in ice charts can produce pixel-level segmentation for Sea Ice typesComments: Published at ICLR 2024 Machine Learning for Remote Sensing (ML4RS) WorkshopSubjects: Computer Vision and Pattern Recognition (cs.CV)
Fully supervised deep learning approaches have demonstrated impressive accuracy in sea ice classification, but their dependence on high-resolution labels presents a significant challenge due to the difficulty of obtaining such data. In response, our weakly supervised learning method provides a compelling alternative by utilizing lower-resolution regional labels from expert-annotated ice charts. This approach achieves exceptional pixel-level classification performance by introducing regional loss representations during training to measure the disparity between predicted and ice chart-derived sea ice type distributions. Leveraging the AI4Arctic Sea Ice Challenge Dataset, our method outperforms the fully supervised U-Net benchmark, the top solution of the AutoIce challenge, in both mapping resolution and class-wise accuracy, marking a significant advancement in automated operational sea ice mapping.
- [38] arXiv:2405.10457 [pdf, ps, html, other]
-
Title: Participle-Prepended Nominals Have Lower Entropy Than Nominals Appended After the ParticipleComments: Accepted to CogSci 2024, 6 pages, 2 figuresSubjects: Computation and Language (cs.CL)
English allows for both compounds (e.g., London-made) and phrasal paraphrases (e.g., made in London). While these constructions have roughly the same truth-conditional meaning, we hypothesize that the compound allows less freedom to express the nature of the semantic relationship between the participle and the pre-participle nominal. We thus predict that the pre-participle slot is more constrained than the equivalent position in the phrasal construction. We test this prediction in a large corpus by measuring the entropy of corresponding nominal slots, conditional on the participle used. That is, we compare the entropy of $\alpha$ in compound construction slots like $\alpha$-[V]ed to the entropy of $\alpha$ in phrasal constructions like [V]ed by $\alpha$ for a given verb V. As predicted, there is significantly lower entropy in the compound construction than in the phrasal construction. We consider how these predictions follow from more general grammatical properties and processing factors.
- [39] arXiv:2405.10460 [pdf, ps, html, other]
-
Title: The AI Collaborator: Bridging Human-AI Interaction in Educational and Professional SettingsSubjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
AI Collaborator, powered by OpenAI's GPT-4, is a groundbreaking tool designed for human-AI collaboration research. Its standout feature is the ability for researchers to create customized AI personas for diverse experimental setups using a user-friendly interface. This functionality is essential for simulating various interpersonal dynamics in team settings. AI Collaborator excels in mimicking different team behaviors, enabled by its advanced memory system and a sophisticated personality framework. Researchers can tailor AI personas along a spectrum from dominant to cooperative, enhancing the study of their impact on team processes. The tool's modular design facilitates integration with digital platforms like Slack, making it versatile for various research scenarios. AI Collaborator is thus a crucial resource for exploring human-AI team dynamics more profoundly.
- [40] arXiv:2405.10465 [pdf, ps, other]
-
Title: Error Analysis of Randomized Symplectic Model Order Reduction for Hamiltonian systemsComments: 27 pages, 4 figuresSubjects: Numerical Analysis (math.NA)
Solving high-dimensional dynamical systems in multi-query or real-time applications requires efficient surrogate modelling techniques, as e.g., achieved via model order reduction (MOR). If these systems are Hamiltonian systems their physical structure should be preserved during the reduction, which can be ensured by applying symplectic basis generation techniques such as the complex SVD (cSVD). Recently, randomized symplectic methods such as the randomized complex singular value decomposition (rcSVD) have been developed for a more efficient computation of symplectic bases that preserve the Hamiltonian structure during MOR. In the current paper, we present two error bounds for the rcSVD basis depending on the choice of hyperparameters and show that with a proper choice of hyperparameters, the projection error of rcSVD is at most a constant factor worse than the projection error of cSVD. We provide numerical experiments that demonstrate the efficiency of randomized symplectic basis generation and compare the bounds numerically.
- [41] arXiv:2405.10467 [pdf, ps, html, other]
-
Title: Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based AgentsSubjects: Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
Foundation model-enabled generative artificial intelligence facilitates the development and implementation of agents, which can leverage distinguished reasoning and language processing capabilities to takes a proactive, autonomous role to pursue users' goals. Nevertheless, there is a lack of systematic knowledge to guide practitioners in designing the agents considering challenges of goal-seeking (including generating instrumental goals and plans), such as hallucinations inherent in foundation models, explainability of reasoning process, complex accountability, etc. To address this issue, we have performed a systematic literature review to understand the state-of-the-art foundation model-based agents and the broader ecosystem. In this paper, we present a pattern catalogue consisting of 16 architectural patterns with analyses of the context, forces, and trade-offs as the outcomes from the previous literature review. The proposed catalogue can provide holistic guidance for the effective use of patterns, and support the architecture design of foundation model-based agents by facilitating goal-seeking and plan generation.
- [42] arXiv:2405.10469 [pdf, ps, html, other]
-
Title: Simulation-Based Benchmarking of Reinforcement Learning Agents for Personalized Retail PromotionsSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Econometrics (econ.EM); Machine Learning (stat.ML)
The development of open benchmarking platforms could greatly accelerate the adoption of AI agents in retail. This paper presents comprehensive simulations of customer shopping behaviors for the purpose of benchmarking reinforcement learning (RL) agents that optimize coupon targeting. The difficulty of this learning problem is largely driven by the sparsity of customer purchase events. We trained agents using offline batch data comprising summarized customer purchase histories to help mitigate this effect. Our experiments revealed that contextual bandit and deep RL methods that are less prone to over-fitting the sparse reward distributions significantly outperform static policies. This study offers a practical framework for simulating AI agents that optimize the entire retail customer journey. It aims to inspire the further development of simulation tools for retail AI systems.
- [43] arXiv:2405.10473 [pdf, ps, html, other]
-
Title: Infrastructure Engineering: A Still Missing, Undervalued Role in the Research EcosystemComments: 13 pages, 1 figure, 1 tableSubjects: Software Engineering (cs.SE)
Research has become increasingly reliant on software, serving as the driving force behind bioinformatics, high performance computing, physics, machine learning and artificial intelligence, to name a few. While substantial progress has been made in advocating for the research software engineer, a kind of software engineer that typically works directly on software and associated assets that go into research, little attention has been placed on the workforce behind research infrastructure and innovation, namely compilers and compatibility tool development, orchestration and scheduling infrastructure, developer environments, container technologies, and workflow managers. As economic incentives are moving toward different models of cloud computing and innovating is required to develop new paradigms that represent the best of both worlds, an effort called "converged computing," the need for such a role is not just ideal, but essential for the continued success of science. While scattered staff in non-traditional roles have found time to work on some facets of this space, the lack of a larger workforce and incentive to support it has led to the scientific community falling behind. In this article we will highlight the importance of this missing layer, providing examples of how a missing role of infrastructure engineer has led to inefficiencies in the interoperability, portability, and reproducibility of science. We suggest that an inability to allocate, provide resources for, and sustain individuals to work explicitly on these technologies could lead to possible futures that are sub-optimal for the continued success of our scientific communities.
- [44] arXiv:2405.10474 [pdf, ps, html, other]
-
Title: Rethinking ChatGPT's Success: Usability and Cognitive Behaviors Enabled by Auto-regressive LLMs' PromptingSubjects: Computation and Language (cs.CL)
Over the last decade, a wide range of training and deployment strategies for Large Language Models (LLMs) have emerged. Among these, the prompting paradigms of Auto-regressive LLMs (AR-LLMs) have catalyzed a significant surge in Artificial Intelligence (AI). This paper aims to emphasize the significance of utilizing free-form modalities (forms of input and output) and verbal free-form contexts as user-directed channels (methods for transforming modalities) for downstream deployment. Specifically, we analyze the structure of modalities within both two types of LLMs and six task-specific channels during deployment. From the perspective of users, our analysis introduces and applies the analytical metrics of task customizability, transparency, and complexity to gauge their usability, highlighting the superior nature of AR-LLMs' prompting paradigms. Moreover, we examine the stimulation of diverse cognitive behaviors in LLMs through the adoption of free-form text and verbal contexts, mirroring human linguistic expressions of such behaviors. We then detail four common cognitive behaviors to underscore how AR-LLMs' prompting successfully imitate human-like behaviors using this free-form modality and channel. Lastly, the potential for improving LLM deployment, both as autonomous agents and within multi-agent systems, is identified via cognitive behavior concepts and principles.
- [45] arXiv:2405.10476 [pdf, ps, html, other]
-
Title: Analysis, Modeling and Design of Personalized Digital Learning EnvironmentComments: IEEE Trans on Education, 2024Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Software Engineering (cs.SE)
This research analyzes, models and develops a novel Digital Learning Environment (DLE) fortified by the innovative Private Learning Intelligence (PLI) framework. The proposed PLI framework leverages federated machine learning (FL) techniques to autonomously construct and continuously refine personalized learning models for individual learners, ensuring robust privacy protection. Our approach is pivotal in advancing DLE capabilities, empowering learners to actively participate in personalized real-time learning experiences. The integration of PLI within a DLE also streamlines instructional design and development demands for personalized teaching/learning. We seek ways to establish a foundation for the seamless integration of FL into learning systems, offering a transformative approach to personalized learning in digital environments. Our implementation details and code are made public.
- [46] arXiv:2405.10478 [pdf, ps, other]
-
Title: GridapTopOpt.jl: A scalable Julia toolbox for level set-based topology optimisationSubjects: Mathematical Software (cs.MS)
In this paper we present GridapTopOpt, an extendable framework for level set-based topology optimisation that can be readily distributed across a personal computer or high-performance computing cluster. The package is written in Julia and uses the Gridap package ecosystem for parallel finite element assembly from arbitrary weak formulations of partial differential equation (PDEs) along with the scalable solvers from the Portable and Extendable Toolkit for Scientific Computing (PETSc). The resulting user interface is intuitive and easy-to-use, allowing for the implementation of a wide range of topology optimisation problems with a syntax that is near one-to-one with the mathematical notation. Furthermore, we implement automatic differentiation to help mitigate the bottleneck associated with the analytic derivation of sensitivities for complex problems. GridapTopOpt is capable of solving a range of benchmark and research topology optimisation problems with large numbers of degrees of freedom. This educational article demonstrates the usability and versatility of the package by describing the formulation and step-by-step implementation of several distinct topology optimisation problems. The driver scripts for these problems are provided and the package source code is available at https://github$.$com/zjwegert/GridapTopOpt.jl.
- [47] arXiv:2405.10479 [pdf, ps, html, other]
-
Title: Convexification for a Coefficient Inverse Problem for a System of Two Coupled Nonlinear Parabolic EquationsComments: arXiv admin note: text overlap with arXiv:2310.08878Subjects: Numerical Analysis (math.NA)
A system of two coupled nonlinear parabolic partial differential equations with two opposite directions of time is considered. In fact, this is the so-called "Mean Field Games System" (MFGS), which is derived in the mean field games (MFG) theory. This theory has numerous applications in social sciences. The topic of Coefficient Inverse Problems (CIPs) in the MFG theory is in its infant age, both in theory and computations. A numerical method for this CIP is developed. Convergence analysis ensures the global convergence of this method. Numerical experiments are presented.
- [48] arXiv:2405.10480 [pdf, ps, html, other]
-
Title: Lean Attention: Hardware-Aware Scalable Attention Mechanism for the Decode-Phase of TransformersComments: 13 pages, 10 figuresSubjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)
Transformer-based models have emerged as one of the most widely used architectures for natural language processing, natural language generation, and image generation. The size of the state-of-the-art models has increased steadily reaching billions of parameters. These huge models are memory hungry and incur significant inference latency even on cutting edge AI-accelerators, such as GPUs. Specifically, the time and memory complexity of the attention operation is quadratic in terms of the total context length, i.e., prompt and output tokens. Thus, several optimizations such as key-value tensor caching and FlashAttention computation have been proposed to deliver the low latency demands of applications relying on such large models. However, these techniques do not cater to the computationally distinct nature of different phases during inference.
To that end, we propose LeanAttention, a scalable technique of computing self-attention for the token-generation phase (decode-phase) of decoder-only transformer models. LeanAttention enables scaling the attention mechanism implementation for the challenging case of long context lengths by re-designing the execution flow for the decode-phase. We identify that the associative property of online softmax can be treated as a reduction operation thus allowing us to parallelize the attention computation over these large context lengths. We extend the "stream-K" style reduction of tiled calculation to self-attention to enable parallel computation resulting in an average of 2.6x attention execution speedup over FlashAttention-2 and up to 8.33x speedup for 512k context lengths. - [49] arXiv:2405.10481 [pdf, ps, html, other]
-
Title: Multi-Evidence based Fact Verification via A Confidential Graph Neural NetworkComments: 12pagesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Fact verification tasks aim to identify the integrity of textual contents according to the truthful corpus. Existing fact verification models usually build a fully connected reasoning graph, which regards claim-evidence pairs as nodes and connects them with edges. They employ the graph to propagate the semantics of the nodes. Nevertheless, the noisy nodes usually propagate their semantics via the edges of the reasoning graph, which misleads the semantic representations of other nodes and amplifies the noise signals. To mitigate the propagation of noisy semantic information, we introduce a Confidential Graph Attention Network (CO-GAT), which proposes a node masking mechanism for modeling the nodes. Specifically, CO-GAT calculates the node confidence score by estimating the relevance between the claim and evidence pieces. Then, the node masking mechanism uses the node confidence scores to control the noise information flow from the vanilla node to the other graph nodes. CO-GAT achieves a 73.59% FEVER score on the FEVER dataset and shows the generalization ability by broadening the effectiveness to the science-specific domain.
- [50] arXiv:2405.10485 [pdf, ps, other]
-
Title: CNER: A tool Classifier of Named-Entity RelationshipsSubjects: Computation and Language (cs.CL); Human-Computer Interaction (cs.HC)
We introduce CNER, an ensemble of capable tools for extraction of semantic relationships between named entities in Spanish language. Built upon a container-based architecture, CNER integrates different Named entity recognition and relation extraction tools with a user-friendly interface that allows users to input free text or files effortlessly, facilitating streamlined analysis. Developed as a prototype version for the Natural Language Processing (NLP) Group at Universidad del Valle, CNER serves as a practical educational resource, illustrating how machine learning techniques can effectively tackle diverse NLP tasks in Spanish. Our preliminary results reveal the promising potential of CNER in advancing the understanding and development of NLP tools, particularly within Spanish-language contexts.
- [51] arXiv:2405.10489 [pdf, ps, html, other]
-
Title: MixCut:A Data Augmentation Method for Facial Expression RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV)
In the facial expression recognition task, researchers always get low accuracy of expression classification due to a small amount of training samples. In order to solve this kind of problem, we proposes a new data augmentation method named MixCut. In this method, we firstly interpolate the two original training samples at the pixel level in a random ratio to generate new samples. Then, pixel removal is performed in random square regions on the new samples to generate the final training samples. We evaluated the MixCut method on Fer2013Plus and RAF-DB. With MixCut, we achieved 85.63% accuracy in eight-label classification on Fer2013Plus and 87.88% accuracy in seven-label classification on RAF-DB, effectively improving the classification accuracy of facial expression image recognition. Meanwhile, on Fer2013Plus, MixCut achieved performance improvements of +0.59%, +0.36%, and +0.39% compared to the other three data augmentation methods: CutOut, Mixup, and CutMix, respectively. MixCut improves classification accuracy on RAF-DB by +0.22%, +0.65%, and +0.5% over these three data augmentation methods.
- [52] arXiv:2405.10492 [pdf, ps, other]
-
Title: Automatic News Generation and Fact-Checking System Based on Language ProcessingXirui Peng, Qiming Xu, Zheng Feng, Haopeng Zhao, Lianghao Tan, Yan Zhou, Zecheng Zhang, Chenwei Gong, Yingqiao ZhengSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
This paper explores an automatic news generation and fact-checking system based on language processing, aimed at enhancing the efficiency and quality of news production while ensuring the authenticity and reliability of the news content. With the rapid development of Natural Language Processing (NLP) and deep learning technologies, automatic news generation systems are capable of extracting key information from massive data and generating well-structured, fluent news articles. Meanwhile, by integrating fact-checking technology, the system can effectively prevent the spread of false news and improve the accuracy and credibility of news. This study details the key technologies involved in automatic news generation and factchecking, including text generation, information extraction, and the application of knowledge graphs, and validates the effectiveness of these technologies through experiments. Additionally, the paper discusses the future development directions of automatic news generation and fact-checking systems, emphasizing the importance of further integration and innovation of technologies. The results show that with continuous technological optimization and practical application, these systems will play an increasingly important role in the future news industry, providing more efficient and reliable news services.
- [53] arXiv:2405.10496 [pdf, ps, html, other]
-
Title: Electromagnetic Information Theory for Holographic MIMO CommunicationsLi Wei, Tierui Gong, Chongwen Huang, Zhaoyang Zhang, Wei E. I. Sha, Zhi Ning Chen, Linglong Dai, Merouane Debbah, Chau YuenSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Holographic multiple-input multiple-output (HMIMO) utilizes a compact antenna array to form a nearly continuous aperture, thereby enhancing higher capacity and more flexible configurations compared with conventional MIMO systems, making it attractive in current scientific research. Key questions naturally arise regarding the potential of HMIMO to surpass Shannon's theoretical limits and how far its capabilities can be extended. However, the traditional Shannon information theory falls short in addressing these inquiries because it only focuses on the information itself while neglecting the underlying carrier, electromagnetic (EM) waves, and environmental interactions. To fill up the gap between the theoretical analysis and the practical application for HMIMO systems, we introduce electromagnetic information theory (EIT) in this paper. This paper begins by laying the foundation for HMIMO-oriented EIT, encompassing EM wave equations and communication regions. In the context of HMIMO systems, the resultant physical limitations are presented, involving Chu's limit, Harrington's limit, Hannan's limit, and the evaluation of coupling effects. Field sampling and HMIMO-assisted oversampling are also discussed to guide the optimal HMIMO design within the EIT framework. To comprehensively depict the EM-compliant propagation process, we present the approximate and exact channel modeling approaches in near-/far-field zones. Furthermore, we discuss both traditional Shannon's information theory, employing the probabilistic method, and Kolmogorov information theory, utilizing the functional analysis, for HMIMO-oriented EIT systems.
- [54] arXiv:2405.10497 [pdf, ps, html, other]
-
Title: SMP Challenge: An Overview and Analysis of Social Media Prediction ChallengeComments: ACM Multimedia. arXiv admin note: text overlap with arXiv:1910.01795Subjects: Multimedia (cs.MM); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Social and Information Networks (cs.SI)
Social Media Popularity Prediction (SMPP) is a crucial task that involves automatically predicting future popularity values of online posts, leveraging vast amounts of multimodal data available on social media platforms. Studying and investigating social media popularity becomes central to various online applications and requires novel methods of comprehensive analysis, multimodal comprehension, and accurate prediction.
SMP Challenge is an annual research activity that has spurred academic exploration in this area. This paper summarizes the challenging task, data, and research progress. As a critical resource for evaluating and benchmarking predictive models, we have released a large-scale SMPD benchmark encompassing approximately half a million posts authored by around 70K users. The research progress analysis provides an overall analysis of the solutions and trends in recent years. The SMP Challenge website (this http URL) provides the latest information and news. - [55] arXiv:2405.10499 [pdf, ps, html, other]
-
Title: Predictive Monitoring with Strong Trace PrefixesSubjects: Programming Languages (cs.PL)
Runtime predictive analyses enhance coverage of traditional dynamic analyses based bug detection techniques by identifying a space of feasible reorderings of the observed execution and determining if any of these witnesses the violation of some desired safety property. The most popular approach for modelling the space of feasible reorderings is through Mazurkiewicz's trace equivalence. The simplicity of the framework also gives rise to efficient predictive analyses, and has been the de facto means for obtaining space and time efficient algorithms for monitoring concurrent programs. In this work, we investigate how to enhance the predictive power of trace-based reasoning, while still retaining the algorithmic benefits it offers. Towards this, we extend trace theory by naturally embedding a class of prefixes, which we call strong trace prefixes. We formally characterize strong trace prefixes using an enhanced dependence relation, study its predictive power and establish a tight connection to the previously proposed notion of synchronization preserving correct reorderings developed in the context of data race and deadlock prediction. We then show that despite the enhanced predictive power, strong trace prefixes continue to enjoy the algorithmic benefits of Mazurkiewicz traces in the context of prediction against co-safety properties, and derive new algorithms for synchronization preserving data races and deadlocks with better asymptotic space and time usage. We also show that strong trace prefixes can capture more violations of pattern languages. We implement our proposed algorithms and our evaluation confirms the practical utility of reasoning based on strong prefix traces.
- [56] arXiv:2405.10502 [pdf, ps, html, other]
-
Title: Enhancing DMI Interactions by Integrating Haptic Feedback for Intricate Vibrato TechniqueSubjects: Human-Computer Interaction (cs.HC); Sound (cs.SD)
This paper investigates the integration of force feedback in Digital Musical Instruments (DMI), specifically evaluating the reproduction of intricate vibrato techniques using haptic feedback controllers. We introduce our system for vibrato modulation using force feedback, composed of Bend-aid (a web-based sequencer platform using pre-designed haptic feedback models) and TorqueTuner (an open-source 1 Degree-of-Freedom (DoF) rotary haptic device for generating programmable haptic effects). We designed a formal user study to assess the impact of each haptic mode on user experience in a vibrato mimicry task. Twenty musically trained participants rated their user experience for the three haptic modes (Smooth, Detent, and Spring) using four Likert-scale scores: comfort, flexibility, ease of control, and helpfulness for the task. Finally, we asked participants to share their reflections. Our research indicates that while the Spring mode can help with light vibrato, preferences for haptic modes vary based on musical training background. This emphasizes the need for adaptable task interfaces and flexible haptic feedback in DMI design.
- [57] arXiv:2405.10504 [pdf, ps, other]
-
Title: Multi-scale Semantic Prior Features Guided Deep Neural Network for Urban Street-view ImageSubjects: Computer Vision and Pattern Recognition (cs.CV)
Street-view image has been widely applied as a crucial mobile mapping data source. The inpainting of street-view images is a critical step for street-view image processing, not only for the privacy protection, but also for the urban environment mapping applications. This paper presents a novel Deep Neural Network (DNN), multi-scale semantic prior Feature guided image inpainting Network (MFN) for inpainting street-view images, which generate static street-view images without moving objects (e.g., pedestrians, vehicles). To enhance global context understanding, a semantic prior prompter is introduced to learn rich semantic priors from large pre-trained model. We design the prompter by stacking multiple Semantic Pyramid Aggregation (SPA) modules, capturing a broad range of visual feature patterns. A semantic-enhanced image generator with a decoder is proposed that incorporates a novel cascaded Learnable Prior Transferring (LPT) module at each scale level. For each decoder block, an attention transfer mechanism is applied to capture long-term dependencies, and the semantic prior features are fused with the image features to restore plausible structure in an adaptive manner. Additionally, a background-aware data processing scheme is adopted to prevent the generation of hallucinated objects within holes. Experiments on Apolloscapes and Cityscapes datasets demonstrate better performance than state-of-the-art methods, with MAE, and LPIPS showing improvements of about 9.5% and 41.07% respectively. Visual comparison survey among multi-group person is also conducted to provide performance evaluation, and the results suggest that the proposed MFN offers a promising solution for privacy protection and generate more reliable scene for urban applications with street-view images.
- [58] arXiv:2405.10505 [pdf, ps, html, other]
-
Title: Local Time-Stepping for the Shallow Water Equations using CFL Optimized Forward-Backward Runge-Kutta SchemesSubjects: Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
The Courant-Friedrichs-Lewy (CFL) condition is a well known, necessary condition for the stability of explicit time-stepping schemes that effectively places a limit on the size of the largest admittable time-step for a given problem. We formulate and present a new local time-stepping (LTS) scheme optimized, in the CFL sense, for the shallow water equations (SWEs). This new scheme, called FB-LTS, is based on the CFL optimized forward-backward Runge-Kutta schemes from Lilly et al. (2023). We show that FB-LTS maintains exact conservation of mass and absolute vorticity when applied to the TRiSK spatial discretization (Ringler et al., 2010), and provide numerical experiments showing that it retains the temporal order of the scheme on which it is based (second order). In terms of computational performance, we show that when applied to a real-world test case on a highly-variable resolution mesh, the MPAS-Ocean implementation of FB-LTS is up to 10 times faster than the classical four-stage, fourth-order Runge-Kutta method (RK4), and 2.3 times faster than an existing strong stability preserving Runge-Kutta based LTS scheme (LTS3). Despite this significant increase in efficiency, the solutions produced by FB-LTS are qualitatively equivalent to those produced by both RK4 and LTS3.
- [59] arXiv:2405.10506 [pdf, ps, other]
-
Title: Lock-Free Augmented TreesComments: 35 pages, 12 figuresSubjects: Data Structures and Algorithms (cs.DS); Distributed, Parallel, and Cluster Computing (cs.DC)
Augmenting an existing sequential data structure with extra information to support greater functionality is a widely used technique. For example, search trees are augmented to build sequential data structures like order-statistic trees, interval trees, tango trees, link/cut trees and many others.
We study how to design concurrent augmented tree data structures. We present a new, general technique that can augment a lock-free tree to add any new fields to each tree node, provided the new fields' values can be computed from information in the node and its children. This enables the design of lock-free, linearizable analogues of a wide variety of classical augmented data structures. As a first example, we give a wait-free trie that stores a set $S$ of elements drawn from $\{1,\ldots,N\}$ and supports linearizable order-statistic queries such as finding the $k$th smallest element of $S$. Updates and queries take $O(\log N)$ steps. We also apply our technique to a lock-free binary search tree (BST), where changes to the structure of the tree make the linearization argument more challenging. Our augmented BST supports order statistic queries in $O(h)$ steps on a tree of height $h$. The augmentation does not affect the asymptotic running time of the updates.
For both our trie and BST, we give an alternative augmentation to improve searches and order-statistic queries to run in $O(\log |S|)$ steps (with a small increase in step complexity of updates). As an added bonus, our technique supports arbitrary multi-point queries (such as range queries) with the same time complexity as they would have in the corresponding sequential data structure. - [60] arXiv:2405.10508 [pdf, ps, html, other]
-
Title: ART3D: 3D Gaussian Splatting for Text-Guided Artistic Scenes GenerationComments: Accepted at CVPR 2024 Workshop on AI3DGSubjects: Computer Vision and Pattern Recognition (cs.CV)
In this paper, we explore the existing challenges in 3D artistic scene generation by introducing ART3D, a novel framework that combines diffusion models and 3D Gaussian splatting techniques. Our method effectively bridges the gap between artistic and realistic images through an innovative image semantic transfer algorithm. By leveraging depth information and an initial artistic image, we generate a point cloud map, addressing domain differences. Additionally, we propose a depth consistency module to enhance 3D scene consistency. Finally, the 3D scene serves as initial points for optimizing Gaussian splats. Experimental results demonstrate ART3D's superior performance in both content and structural consistency metrics when compared to existing methods. ART3D significantly advances the field of AI in art creation by providing an innovative solution for generating high-quality 3D artistic scenes.
- [61] arXiv:2405.10511 [pdf, ps, other]
-
Title: Defect Category Prediction Based on Multi-Source Domain AdaptationComments: 17 pages, in Chinese language, 8 figures (Due to length constraints of the abstract field, please refer to the original PDF file for the full content of abstract.)Journal-ref: Journal of Software [2024]Subjects: Software Engineering (cs.SE)
In recent years, defect prediction techniques based on deep learning have become a prominent research topic in the field of software engineering. These techniques can identify potential defects without executing the code. However, existing approaches mostly concentrate on determining the presence of defects at the method-level code, lacking the ability to precisely classify specific defect categories. Consequently, this undermines the efficiency of developers in locating and rectifying defects. Furthermore, in practical software development, new projects often lack sufficient defect data to train high-accuracy deep learning models. Models trained on historical data from existing projects frequently struggle to achieve satisfactory generalization performance on new projects. Hence, this paper initially reformulates the traditional binary defect prediction task into a multi-label classification problem, employing defect categories described in the Common Weakness Enumeration (CWE) as fine-grained predictive labels. To enhance the model performance in cross-project scenarios, this paper proposes a multi-source domain adaptation framework that integrates adversarial training and attention mechanisms. Specifically, the proposed framework employs adversarial training to mitigate domain (i.e., software projects) discrepancies, and further utilizes domain-invariant features to capture feature correlations between each source domain and the target domain. Simultaneously, the proposed framework employs a weighted maximum mean discrepancy as an attention mechanism to minimize the representation distance between source and target domain features, facilitating model in learning more domain-independent features. The experiments on 8 real-world open-source projects show that the proposed approach achieves significant performance improvements compared to state-of-the-art baselines.
- [62] arXiv:2405.10512 [pdf, ps, html, other]
-
Title: In-context Contrastive Learning for Event Causality IdentificationSubjects: Information Retrieval (cs.IR); Machine Learning (cs.LG)
Event Causality Identification (ECI) aims at determining the existence of a causal relation between two events. Although recent prompt learning-based approaches have shown promising improvements on the ECI task, their performance are often subject to the delicate design of multiple prompts and the positive correlations between the main task and derivate tasks. The in-context learning paradigm provides explicit guidance for label prediction in the prompt learning paradigm, alleviating its reliance on complex prompts and derivative tasks. However, it does not distinguish between positive and negative demonstrations for analogy learning. Motivated from such considerations, this paper proposes an In-Context Contrastive Learning (ICCL) model that utilizes contrastive learning to enhance the effectiveness of both positive and negative demonstrations. Additionally, we apply contrastive learning to event pairs to better facilitate event causality identification. Our ICCL is evaluated on the widely used corpora, including the EventStoryLine and Causal-TimeBank, and results show significant performance improvements over the state-of-the-art algorithms.
- [63] arXiv:2405.10513 [pdf, ps, html, other]
-
Title: Federated Learning With Energy Harvesting Devices: An MDP FrameworkSubjects: Machine Learning (cs.LG); Signal Processing (eess.SP)
Federated learning (FL) requires edge devices to perform local training and exchange information with a parameter server, leading to substantial energy consumption. A critical challenge in practical FL systems is the rapid energy depletion of battery-limited edge devices, which curtails their operational lifespan and affects the learning performance. To address this issue, we apply energy harvesting technique in FL systems to extract ambient energy for continuously powering edge devices. We first establish the convergence bound for the wireless FL system with energy harvesting devices, illustrating that the convergence is impacted by partial device participation and packet drops, both of which depend on the energy supply. To accelerate the convergence, we formulate a joint device scheduling and power control problem and model it as a Markov decision process (MDP). By solving this MDP, we derive the optimal transmission policy and demonstrate that it possesses a monotone structure with respect to the battery and channel states. To overcome the curse of dimensionality caused by the exponential complexity of computing the optimal policy, we propose a low-complexity algorithm, which is asymptotically optimal as the number of devices increases. Furthermore, for unknown channels and harvested energy statistics, we develop a structure-enhanced deep reinforcement learning algorithm that leverages the monotone structure of the optimal policy to improve the training performance. Finally, extensive numerical experiments on real-world datasets are presented to validate the theoretical results and corroborate the effectiveness of the proposed algorithms.
- [64] arXiv:2405.10514 [pdf, ps, html, other]
-
Title: Secrecy Performance Analysis of Multi-Functional RIS-Assisted NOMA NetworksComments: 14 pages, 9 figures, submitted to IEEE transactions on wireless communicationSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Although reconfigurable intelligent surface (RIS) can improve the secrecy communication performance of wireless users, it still faces challenges such as limited coverage and double-fading effect. To address these issues, in this paper, we utilize a novel multi-functional RIS (MF-RIS) to enhance the secrecy performance of wireless users, and investigate the physical layer secrecy problem in non-orthogonal multiple access (NOMA) networks. Specifically, we derive closed-form expressions for the secrecy outage probability (SOP) and secrecy throughput of users in the MF-RIS-assisted NOMA networks with external and internal eavesdroppers. The asymptotic expressions for SOP and secrecy diversity order are also analyzed under high signal-to-noise ratio (SNR) conditions. Additionally, we examine the impact of receiver hardware limitations and error transmission-induced imperfect successive interference cancellation (SIC) on the secrecy performance. Numerical results indicate that: i) under the same power budget, the secrecy performance achieved by MF-RIS significantly outperforms active RIS and simultaneously transmitting and reflecting RIS; ii) with increasing power budget, residual interference caused by imperfect SIC surpasses thermal noise as the primary factor affecting secrecy capacity; and iii) deploying additional elements at the MF-RIS brings significant secrecy enhancements for the external eavesdropping scenario, in contrast to the internal eavesdropping case.
- [65] arXiv:2405.10515 [pdf, ps, other]
-
Title: Improved AdaBoost for Virtual Reality Experience Prediction Based on Long Short-Term Memory NetworkSubjects: Machine Learning (cs.LG)
A classification prediction algorithm based on Long Short-Term Memory Network (LSTM) improved AdaBoost is used to predict virtual reality (VR) user experience. The dataset is randomly divided into training and test sets in the ratio of 7:3.During the training process, the model's loss value decreases from 0.65 to 0.31, which shows that the model gradually reduces the discrepancy between the prediction results and the actual labels, and improves the accuracy and generalisation ability.The final loss value of 0.31 indicates that the model fits the training data well, and is able to make predictions and classifications more accurately. The confusion matrix for the training set shows a total of 177 correct predictions and 52 incorrect predictions, with an accuracy of 77%, precision of 88%, recall of 77% and f1 score of 82%. The confusion matrix for the test set shows a total of 167 correct and 53 incorrect predictions with 75% accuracy, 87% precision, 57% recall and 69% f1 score. In summary, the classification prediction algorithm based on LSTM with improved AdaBoost shows good prediction ability for virtual reality user experience. This study is of great significance to enhance the application of virtual reality technology in user experience. By combining LSTM and AdaBoost algorithms, significant progress has been made in user experience prediction, which not only improves the accuracy and generalisation ability of the model, but also provides useful insights for related research in the field of virtual reality. This approach can help developers better understand user requirements, optimise virtual reality product design, and enhance user satisfaction, promoting the wide application of virtual reality technology in various fields.
- [66] arXiv:2405.10516 [pdf, ps, other]
-
Title: Language Models can Evaluate Themselves via Probability DiscrepancyComments: ACL 2024 FindingsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
In this paper, we initiate our discussion by demonstrating how Large Language Models (LLMs), when tasked with responding to queries, display a more even probability distribution in their answers if they are more adept, as opposed to their less skilled counterparts. Expanding on this foundational insight, we propose a new self-evaluation method ProbDiff for assessing the efficacy of various LLMs. This approach obviates the necessity for an additional evaluation model or the dependence on external, proprietary models like GPT-4 for judgment. It uniquely utilizes the LLMs being tested to compute the probability discrepancy between the initial response and its revised versions. A higher discrepancy for a given query between two LLMs indicates a relatively weaker capability. Our findings reveal that ProbDiff achieves results on par with those obtained from evaluations based on GPT-4, spanning a range of scenarios that include natural language generation (NLG) tasks such as translation, summarization, and our proposed Xiaohongshu blog writing task, and benchmarks for LLM evaluation like AlignBench, MT-Bench, and AlpacaEval, across LLMs of varying magnitudes.
- [67] arXiv:2405.10517 [pdf, ps, html, other]
-
Title: Towards Better Question Generation in QA-Based Event ExtractionComments: Accepted to ACL2024Subjects: Computation and Language (cs.CL)
Event Extraction (EE) is an essential information extraction task that aims to extract event-related information from unstructured texts. The paradigm of this task has shifted from conventional classification-based methods to more contemporary question-answering (QA)-based approaches. However, in QA-based EE, the questions' quality dramatically affects the extraction accuracy, and how to generate high-quality questions for QA-based EE still remains a challenge. In this work, to tackle this challenge, we suggest four criteria to evaluate the quality of a question and propose a reinforcement learning method for QA-Based EE that can generate fluent, generalizable, and context-dependent questions and provides clear guidance to QA models. The extensive experiments conducted on ACE and RAMS datasets have strongly validated our approach's effectiveness, which also demonstrates its robustness in scenarios with limited training data.
- [68] arXiv:2405.10518 [pdf, ps, html, other]
-
Title: Enhancing Perception Quality in Remote Sensing Image Compression via Invertible Neural NetworkSubjects: Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Decoding remote sensing images to achieve high perceptual quality, particularly at low bitrates, remains a significant challenge. To address this problem, we propose the invertible neural network-based remote sensing image compression (INN-RSIC) method. Specifically, we capture compression distortion from an existing image compression algorithm and encode it as a set of Gaussian-distributed latent variables via INN. This ensures that the compression distortion in the decoded image becomes independent of the ground truth. Therefore, by leveraging the inverse mapping of INN, we can input the decoded image along with a set of randomly resampled Gaussian distributed variables into the inverse network, effectively generating enhanced images with better perception quality. To effectively learn compression distortion, channel expansion, Haar transformation, and invertible blocks are employed to construct the INN. Additionally, we introduce a quantization module (QM) to mitigate the impact of format conversion, thus enhancing the framework's generalization and improving the perceptual quality of enhanced images. Extensive experiments demonstrate that our INN-RSIC significantly outperforms the existing state-of-the-art traditional and deep learning-based image compression methods in terms of perception quality.
- [69] arXiv:2405.10521 [pdf, ps, html, other]
-
Title: Generative AI for Secure and Privacy-Preserving Mobile CrowdsensingSubjects: Cryptography and Security (cs.CR)
Recently, generative AI has attracted much attention from both academic and industrial fields, which has shown its potential, especially in the data generation and synthesis aspects. Simultaneously, secure and privacy-preserving mobile crowdsensing (SPPMCS) has been widely applied in data collection/ acquirement due to an advantage on low deployment cost, flexible implementation, and high adaptability. Since generative AI can generate new synthetic data to replace the original data to be analyzed and processed, it can lower data attacks and privacy leakage risks for the original data. Therefore, integrating generative AI into SPPMCS is feasible and significant. Moreover, this paper investigates an integration of generative AI in SPPMCS, where we present potential research focuses, solutions, and case studies. Specifically, we firstly review the preliminaries for generative AI and SPPMCS, where their integration potential is presented. Then, we discuss research issues and solutions for generative AI-enabled SPPMCS, including security defense of malicious data injection, illegal authorization, malicious spectrum manipulation at the physical layer, and privacy protection on sensing data content, sensing terminals' identification and location. Next, we propose a framework for sensing data content protection with generative AI, and simulations results have clearly demonstrated the effectiveness of the proposed framework. Finally, we present major research directions for generative AI-enabled SPPMCS.
- [70] arXiv:2405.10523 [pdf, ps, html, other]
-
Title: Smart Expert System: Large Language Models as Text ClassifiersComments: 11 pages, 3 figures, and 8 tablesSubjects: Computation and Language (cs.CL)
Text classification is a fundamental task in Natural Language Processing (NLP), and the advent of Large Language Models (LLMs) has revolutionized the field. This paper introduces the Smart Expert System, a novel approach that leverages LLMs as text classifiers. The system simplifies the traditional text classification workflow, eliminating the need for extensive preprocessing and domain expertise. The performance of several LLMs, machine learning (ML) algorithms, and neural network (NN) based structures is evaluated on four datasets. Results demonstrate that certain LLMs surpass traditional methods in sentiment analysis, spam SMS detection and multi-label classification. Furthermore, it is shown that the system's performance can be further enhanced through few-shot or fine-tuning strategies, making the fine-tuned model the top performer across all datasets. Source code and datasets are available in this GitHub repository: this https URL.
- [71] arXiv:2405.10526 [pdf, ps, other]
-
Title: Guidelines for evaluation of complex multi agent test scenariosSubjects: Robotics (cs.RO)
To support the testing of AVs, CETRAN has created a guideline for the evaluation of complex multi agent test scenarios presented in this report. This allows for a clear structured manner in evaluating complexity elements based on the corresponding difficulties an AV might encounter in Singapore traffic. This study aims to understand the source of complexity for AVs from traffic hazard, by breaking down the difficulties on AV capabilities as perception, situation awareness and decision-making. Guidelines created through this study are composed by a list of elements to be considered in the future as selection criteria to evaluate complexity of scenarios to support AV behaviour assessment. This study is intended to be a guide to understand the sources of complexity for Avs and can be used to challenge the risk management ability of autonomous vehicles in a scenario-based test approach or traffic situations faced on road trials.
The report includes the usage of the guidelines created as application to evaluate the complexity of a set of 5 real events that occur on Singapore roads from Resembler webtool which is a database of real human accidents/incidents. Four scenarios were also designed for creation in simulation by the CETRAN team, applying the guidelines for complexity elements created in this work, to illustrate the difficulties an ADS could experience with such scenarios. - [72] arXiv:2405.10529 [pdf, ps, html, other]
-
Title: Safeguarding Vision-Language Models Against Patched Visual Prompt InjectorsComments: 15 pagesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Large language models have become increasingly prominent, also signaling a shift towards multimodality as the next frontier in artificial intelligence, where their embeddings are harnessed as prompts to generate textual content. Vision-language models (VLMs) stand at the forefront of this advancement, offering innovative ways to combine visual and textual data for enhanced understanding and interaction. However, this integration also enlarges the attack surface. Patch-based adversarial attack is considered the most realistic threat model in physical vision applications, as demonstrated in many existing literature. In this paper, we propose to address patched visual prompt injection, where adversaries exploit adversarial patches to generate target content in VLMs. Our investigation reveals that patched adversarial prompts exhibit sensitivity to pixel-wise randomization, a trait that remains robust even against adaptive attacks designed to counteract such defenses. Leveraging this insight, we introduce SmoothVLM, a defense mechanism rooted in smoothing techniques, specifically tailored to protect VLMs from the threat of patched visual prompt injectors. Our framework significantly lowers the attack success rate to a range between 0% and 5.0% on two leading VLMs, while achieving around 67.3% to 95.0% context recovery of the benign images, demonstrating a balance between security and usability.
- [73] arXiv:2405.10530 [pdf, ps, html, other]
-
Title: CM-UNet: Hybrid CNN-Mamba UNet for Remote Sensing Image Semantic SegmentationComments: 5 pages, 6 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
Due to the large-scale image size and object variations, current CNN-based and Transformer-based approaches for remote sensing image semantic segmentation are suboptimal for capturing the long-range dependency or limited to the complex computational complexity. In this paper, we propose CM-UNet, comprising a CNN-based encoder for extracting local image features and a Mamba-based decoder for aggregating and integrating global information, facilitating efficient semantic segmentation of remote sensing images. Specifically, a CSMamba block is introduced to build the core segmentation decoder, which employs channel and spatial attention as the gate activation condition of the vanilla Mamba to enhance the feature interaction and global-local information fusion. Moreover, to further refine the output features from the CNN encoder, a Multi-Scale Attention Aggregation (MSAA) module is employed to merge the different scale features. By integrating the CSMamba block and MSAA module, CM-UNet effectively captures the long-range dependencies and multi-scale global contextual information of large-scale remote-sensing images. Experimental results obtained on three benchmarks indicate that the proposed CM-UNet outperforms existing methods in various performance metrics. The codes are available at this https URL.
- [74] arXiv:2405.10531 [pdf, ps, html, other]
-
Title: Nonparametric Teaching of Implicit Neural RepresentationsComments: ICML 2024 (24 pages, 13 figures)Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
We investigate the learning of implicit neural representation (INR) using an overparameterized multilayer perceptron (MLP) via a novel nonparametric teaching perspective. The latter offers an efficient example selection framework for teaching nonparametrically defined (viz. non-closed-form) target functions, such as image functions defined by 2D grids of pixels. To address the costly training of INRs, we propose a paradigm called Implicit Neural Teaching (INT) that treats INR learning as a nonparametric teaching problem, where the given signal being fitted serves as the target function. The teacher then selects signal fragments for iterative training of the MLP to achieve fast convergence. By establishing a connection between MLP evolution through parameter-based gradient descent and that of function evolution through functional gradient descent in nonparametric teaching, we show for the first time that teaching an overparameterized MLP is consistent with teaching a nonparametric learner. This new discovery readily permits a convenient drop-in of nonparametric teaching algorithms to broadly enhance INR training efficiency, demonstrating 30%+ training time savings across various input modalities.
- [75] arXiv:2405.10534 [pdf, ps, html, other]
-
Title: CMA-ES for Safe OptimizationComments: This paper has been accepted as a full paper at GECCO2024Subjects: Neural and Evolutionary Computing (cs.NE)
In several real-world applications in medical and control engineering, there are unsafe solutions whose evaluations involve inherent risk. This optimization setting is known as safe optimization and formulated as a specialized type of constrained optimization problem with constraints for safety functions. Safe optimization requires performing efficient optimization without evaluating unsafe solutions. A few studies have proposed the optimization methods for safe optimization based on Bayesian optimization and the evolutionary algorithm. However, Bayesian optimization-based methods often struggle to achieve superior solutions, and the evolutionary algorithm-based method fails to effectively reduce unsafe evaluations. This study focuses on CMA-ES as an efficient evolutionary algorithm and proposes an optimization method termed safe CMA-ES. The safe CMA-ES is designed to achieve both safety and efficiency in safe optimization. The safe CMA-ES estimates the Lipschitz constants of safety functions transformed with the distribution parameters using the maximum norm of the gradient in Gaussian process regression. Subsequently, the safe CMA-ES projects the samples to the nearest point in the safe region constructed with the estimated Lipschitz constants. The numerical simulation using the benchmark functions shows that the safe CMA-ES successfully performs optimization, suppressing the unsafe evaluations, while the existing methods struggle to significantly reduce the unsafe evaluations.
- [76] arXiv:2405.10536 [pdf, ps, html, other]
-
Title: Time-Varying Constraint-Aware Reinforcement Learning for Energy Storage ControlComments: ICLR 2024 Workshop: Tackling Climate Change with Machine LearningSubjects: Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Energy storage devices, such as batteries, thermal energy storages, and hydrogen systems, can help mitigate climate change by ensuring a more stable and sustainable power supply. To maximize the effectiveness of such energy storage, determining the appropriate charging and discharging amounts for each time period is crucial. Reinforcement learning is preferred over traditional optimization for the control of energy storage due to its ability to adapt to dynamic and complex environments. However, the continuous nature of charging and discharging levels in energy storage poses limitations for discrete reinforcement learning, and time-varying feasible charge-discharge range based on state of charge (SoC) variability also limits the conventional continuous reinforcement learning. In this paper, we propose a continuous reinforcement learning approach that takes into account the time-varying feasible charge-discharge range. An additional objective function was introduced for learning the feasible action range for each time period, supplementing the objectives of training the actor for policy learning and the critic for value learning. This actively promotes the utilization of energy storage by preventing them from getting stuck in suboptimal states, such as continuous full charging or discharging. This is achieved through the enforcement of the charging and discharging levels into the feasible action range. The experimental results demonstrated that the proposed method further maximized the effectiveness of energy storage by actively enhancing its utilization.
- [77] arXiv:2405.10542 [pdf, ps, html, other]
-
Title: Benchmarking Large Language Models on CFLUE -- A Chinese Financial Language Understanding Evaluation DatasetComments: Accepted by ACL 2024Journal-ref: The 62nd Annual Meeting of the Association for Computational Linguistics(ACL),2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
In light of recent breakthroughs in large language models (LLMs) that have revolutionized natural language processing (NLP), there is an urgent need for new benchmarks to keep pace with the fast development of LLMs. In this paper, we propose CFLUE, the Chinese Financial Language Understanding Evaluation benchmark, designed to assess the capability of LLMs across various dimensions. Specifically, CFLUE provides datasets tailored for both knowledge assessment and application assessment. In knowledge assessment, it consists of 38K+ multiple-choice questions with associated solution explanations. These questions serve dual purposes: answer prediction and question reasoning. In application assessment, CFLUE features 16K+ test instances across distinct groups of NLP tasks such as text classification, machine translation, relation extraction, reading comprehension, and text generation. Upon CFLUE, we conduct a thorough evaluation of representative LLMs. The results reveal that only GPT-4 and GPT-4-turbo achieve an accuracy exceeding 60\% in answer prediction for knowledge assessment, suggesting that there is still substantial room for improvement in current LLMs. In application assessment, although GPT-4 and GPT-4-turbo are the top two performers, their considerable advantage over lightweight LLMs is noticeably diminished. The datasets and scripts associated with CFLUE are openly accessible at this https URL.
- [78] arXiv:2405.10543 [pdf, ps, html, other]
-
Title: SMARD: A Cost Effective Smart Agro Development Technology for Crops Disease ClassificationSubjects: Cryptography and Security (cs.CR)
Agriculture has a significant role in a country's economy. The "SMARD" project aims to strengthen the country's agricultural sector by giving farmers with the information and tools they need to solve common difficulties and increase productivity. The project provides farmers with information on crop care, seed selection, and disease management best practices, as well as access to tools for recognizing and treating crop diseases. Farmers can also contact the expert panel through text message, voice call, or video call to purchase fertilizer, seeds, and pesticides at low prices, as well as secure bank loans. The project's goal is to empower farmers and rural communities by providing them with the resources they need to increase crop yields. Additionally, the "SMARD" will not only help farmers and rural communities live better lives, but it will also have a good effect on the economy of the nation. Farmers are now able to recognize plant illnesses more quickly because of the application of machine learning techniques based on image processing categorization. Our experiments' results show that our system "SMARD" outperforms the cutting-edge web applications by attaining 97.3% classification accuracy and 96% F1-score in crop disease classification. Overall, our project is an important endeavor for the nation's agricultural sector because its main goal is to give farmers the information, resources, and tools they need to increase crop yields, improve economic outcomes, and improve livelihoods.
- [79] arXiv:2405.10545 [pdf, ps, html, other]
-
Title: Dynamic Cluster Analysis to Detect and Track Novelty in Network TelescopesSubjects: Networking and Internet Architecture (cs.NI)
In the context of cybersecurity, tracking the activities of coordinated hosts over time is a daunting task because both participants and their behaviours evolve at a fast pace. We address this scenario by solving a dynamic novelty discovery problem with the aim of both re-identifying patterns seen in the past and highlighting new patterns. We focus on traffic collected by Network Telescopes, a primary and noisy source for cybersecurity analysis. We propose a 3-stage pipeline: (i) we learn compact representations (embeddings) of hosts through their traffic in a self-supervised fashion; (ii) via clustering, we distinguish groups of hosts performing similar activities; (iii) we track the cluster temporal evolution to highlight novel patterns. We apply our methodology to 20 days of telescope traffic during which we observe more than 8 thousand active hosts. Our results show that we efficiently identify 50-70 well-shaped clusters per day, 60-70% of which we associate with already analysed cases, while we pinpoint 10-20 previously unseen clusters per day. These correspond to activity changes and new incidents, of which we document some. In short, our novelty discovery methodology enormously simplifies the manual analysis the security analysts have to conduct to gain insights to interpret novel coordinated activities.
- [80] arXiv:2405.10546 [pdf, ps, html, other]
-
Title: You Can't Solve These Super Mario Bros. Levels: Undecidable Mario GamesSubjects: Computational Complexity (cs.CC)
We prove RE-completeness (and thus undecidability) of several 2D games in the Super Mario Bros. platform video game series: the New Super Mario Bros. series (original, Wii, U, and 2), and both Super Mario Maker games in all five game styles (Super Mario Bros. 1 and 3, Super Mario World, New Super Mario Bros. U, and Super Mario 3D World). These results hold even when we restrict to constant-size levels and screens, but they do require generalizing to allow arbitrarily many enemies at each location and onscreen, as well as allowing for exponentially large (or no) timer. Our New Super Mario Bros. constructions fit within one standard screen size. In our Super Mario Maker reductions, we work within the standard screen size and use the property that the game engine remembers offscreen objects that are global because they are supported by "global ground". To prove these Mario results, we build a new theory of counter gadgets in the motion-planning-through-gadgets framework, and provide a suite of simple gadgets for which reachability is RE-complete.
- [81] arXiv:2405.10547 [pdf, ps, html, other]
-
Title: GPTs Window Shopping: An analysis of the Landscape of Custom ChatGPT ModelsComments: 9 pagesSubjects: Social and Information Networks (cs.SI)
OpenAI's ChatGPT initiated a wave of technical iterations in the space of Large Language Models (LLMs) by demonstrating the capability and disruptive power of LLMs. OpenAI has prompted large organizations to respond with their own advancements and models to push the LLM performance envelope. OpenAI has prompted large organizations to respond with their own advancements and models to push the LLM performance envelope. OpenAI's success in spotlighting AI can be partially attributed to decreased barriers to entry, enabling any individual with an internet-enabled device to interact with LLMs. What was previously relegated to a few researchers and developers with necessary computing resources is now available to all. A desire to customize LLMs to better accommodate individual needs prompted OpenAI's creation of the GPT Store, a central platform where users can create and share custom GPT models. Customization comes in the form of prompt-tuning, analysis of reference resources, browsing, and external API interactions, alongside a promise of revenue sharing for created custom GPTs. In this work, we peer into the window of the GPT Store and measure its impact. Our analysis constitutes a large-scale overview of the store exploring community perception, GPT details, and the GPT authors, in addition to a deep-dive into a 3rd party storefront indexing user-submitted GPTs, exploring if creators seek to monetize their creations in the absence of OpenAI's revenue sharing.
- [82] arXiv:2405.10548 [pdf, ps, html, other]
-
Title: Language Models can Exploit Cross-Task In-context Learning for Data-Scarce Novel TasksComments: Accepted at ACL 2024 MainSubjects: Computation and Language (cs.CL)
Large Language Models (LLMs) have transformed NLP with their remarkable In-context Learning (ICL) capabilities. Automated assistants based on LLMs are gaining popularity; however, adapting them to novel tasks is still challenging. While colossal models excel in zero-shot performance, their computational demands limit widespread use, and smaller language models struggle without context. This paper investigates whether LLMs can generalize from labeled examples of predefined tasks to novel tasks. Drawing inspiration from biological neurons and the mechanistic interpretation of the Transformer architecture, we explore the potential for information sharing across tasks. We design a cross-task prompting setup with three LLMs and show that LLMs achieve significant performance improvements despite no examples from the target task in the context. Cross-task prompting leads to a remarkable performance boost of 107% for LLaMA-2 7B, 18.6% for LLaMA-2 13B, and 3.2% for GPT 3.5 on average over zero-shot prompting, and performs comparable to standard in-context learning. The effectiveness of generating pseudo-labels for in-task examples is demonstrated, and our analyses reveal a strong correlation between the effect of cross-task examples and model activation similarities in source and target input tokens. This paper offers a first-of-its-kind exploration of LLMs' ability to solve novel tasks based on contextual signals from different task examples.
- [83] arXiv:2405.10554 [pdf, ps, html, other]
-
Title: NeRO: Neural Road Surface ReconstructionSubjects: Computer Vision and Pattern Recognition (cs.CV)
In computer vision and graphics, the accurate reconstruction of road surfaces is pivotal for various applications, especially in autonomous driving. This paper introduces a novel method leveraging the Multi-Layer Perceptrons (MLPs) framework to reconstruct road surfaces in height, color, and semantic information by input world coordinates x and y. Our approach NeRO uses encoding techniques based on MLPs, significantly improving the performance of the complex details, speeding up the training speed, and reducing neural network size. The effectiveness of this method is demonstrated through its superior performance, which indicates a promising direction for rendering road surfaces with semantics applications, particularly in applications demanding visualization of road conditions, 4D labeling, and semantic groupings.
- [84] arXiv:2405.10556 [pdf, ps, html, other]
-
Title: Parameterized Complexity of Dominating Set Variants in Almost Cluster and Split GraphsComments: Some of the results appeared in proceedings of CSR 2018Subjects: Data Structures and Algorithms (cs.DS)
We consider structural parameterizations of the fundamental Dominating Set problem and its variants in the parameter ecology program. We give improved FPT algorithms and lower bounds under well-known conjectures for dominating set in graphs that are k vertices away from a cluster graph or a split graph. These are graphs in which there is a set of k vertices (called the modulator) whose deletion results in a cluster graph or a split graph. We also call k as the deletion distance (to the appropriate class of graphs). When parameterized by the deletion distance k to cluster graphs - we can find a minimum dominating set (DS) in 3^k n^{O(1)}-time. Within the same time, we can also find a minimum independent dominating set (IDS) or a minimum dominating clique (DC) or a minimum efficient dominating set (EDS) or a minimum total dominating set (TDS). We also show that most of these variants of dominating set do not have polynomial sized kernel. Additionally, we show that when parameterized by the deletion distance k to split graphs - IDS can be solved in 2^k n^{O(1)}-time and EDS can be solved in 3^{k/2}n^{O(1)}.
- [85] arXiv:2405.10557 [pdf, ps, html, other]
-
Title: Resolving Symmetry Ambiguity in Correspondence-based Methods for Instance-level Object Pose EstimationYongliang Lin, Yongzhi Su, Sandeep Inuganti, Yan Di, Naeem Ajilforoushan, Hanqing Yang, Yu Zhang, Jason RambachComments: 8 pages,10 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV)
Estimating the 6D pose of an object from a single RGB image is a critical task that becomes additionally challenging when dealing with symmetric objects. Recent approaches typically establish one-to-one correspondences between image pixels and 3D object surface vertices. However, the utilization of one-to-one correspondences introduces ambiguity for symmetric objects. To address this, we propose SymCode, a symmetry-aware surface encoding that encodes the object surface vertices based on one-to-many correspondences, eliminating the problem of one-to-one correspondence ambiguity. We also introduce SymNet, a fast end-to-end network that directly regresses the 6D pose parameters without solving a PnP problem. We demonstrate faster runtime and comparable accuracy achieved by our method on the T-LESS and IC-BIN benchmarks of mostly symmetric objects. Our source code will be released upon acceptance.
- [86] arXiv:2405.10558 [pdf, ps, html, other]
-
Title: CACL: Community-Aware Heterogeneous Graph Contrastive Learning for Social Media Bot DetectionComments: Accepted by ACL 2024 findingsSubjects: Social and Information Networks (cs.SI)
Social media bot detection is increasingly crucial with the rise of social media platforms. Existing methods predominantly construct social networks as graph and utilize graph neural networks (GNNs) for bot detection. However, most of these methods focus on how to improve the performance of GNNs while neglecting the community structure within social networks. Moreover, GNNs based methods still face problems such as poor model generalization due to the relatively small scale of the dataset and over-smoothness caused by information propagation mechanism. To address these problems, we propose a Community-Aware Heterogeneous Graph Contrastive Learning framework (CACL), which constructs social network as heterogeneous graph with multiple node types and edge types, and then utilizes community-aware module to dynamically mine both hard positive samples and hard negative samples for supervised graph contrastive learning with adaptive graph enhancement algorithms. Extensive experiments demonstrate that our framework addresses the previously mentioned challenges and outperforms competitive baselines on three social media bot benchmarks.
- [87] arXiv:2405.10563 [pdf, ps, html, other]
-
Title: Function Extrapolation with Neural Networks and Its Application for ManifoldsComments: 32 pages, 11 figuresSubjects: Machine Learning (cs.LG); Numerical Analysis (math.NA)
This paper addresses the problem of accurately estimating a function on one domain when only its discrete samples are available on another domain. To answer this challenge, we utilize a neural network, which we train to incorporate prior knowledge of the function. In addition, by carefully analyzing the problem, we obtain a bound on the error over the extrapolation domain and define a condition number for this problem that quantifies the level of difficulty of the setup. Compared to other machine learning methods that provide time series prediction, such as transformers, our approach is suitable for setups where the interpolation and extrapolation regions are general subdomains and, in particular, manifolds. In addition, our construction leads to an improved loss function that helps us boost the accuracy and robustness of our neural network. We conduct comprehensive numerical tests and comparisons of our extrapolation versus standard methods. The results illustrate the effectiveness of our approach in various scenarios.
- [88] arXiv:2405.10565 [pdf, ps, other]
-
Title: Real-time Level-of-Detail Strand-based Hair RenderingComments: 12 pages, 10 figures, 1 performance plotSubjects: Graphics (cs.GR)
Strand-based hair rendering has become increasingly popular in production for its realistic appearance. However, the prevailing level-of-detail solution employing hair cards for distant hair models introduces a significant discontinuity in dynamics and appearance during the transition from strands to cards. We introduce an innovative real-time framework for strand-based hair rendering that ensures seamless transitions between different levels of detail (LOD) while maintaining a consistent hair appearance. Our method uses elliptical thick hairs that contain multiple hair strands at each LOD to maintain the shapes of hair clusters. In addition to geometric fitting, we formulate an elliptical Bidirectional Curve Scattering Distribution Functions (BCSDF) model for a thick hair, accurately capturing single scattering and multiple scattering within the hair cluster, accommodating a spectrum from sparse to dense hair distributions. Our framework, tested on various hairstyles with dynamics as well as knits, shows that it can produce highly similar appearances to full hair geometries at different viewing distances with seamless LOD transitions, while achieving up to a 3x speedup.
- [89] arXiv:2405.10567 [pdf, ps, html, other]
-
Title: Team Samsung-RAL: Technical Report for 2024 RoboDrive Challenge-Robust Map Segmentation TrackComments: ICRA 2024 RoboDrive Challenge Robust Map Segmentation Track 3rd Place Technical Report. arXiv admin note: text overlap with arXiv:2205.09743 by other authorsSubjects: Computer Vision and Pattern Recognition (cs.CV)
In this report, we describe the technical details of our submission to the 2024 RoboDrive Challenge Robust Map Segmentation Track. The Robust Map Segmentation track focuses on the segmentation of complex driving scene elements in BEV maps under varied driving conditions. Semantic map segmentation provides abundant and precise static environmental information crucial for autonomous driving systems' planning and navigation. While current methods excel in ideal circumstances, e.g., clear daytime conditions and fully functional sensors, their resilience to real-world challenges like adverse weather and sensor failures remains unclear, raising concerns about system safety. In this paper, we explored several methods to improve the robustness of the map segmentation task. The details are as follows: 1) Robustness analysis of utilizing temporal information; 2) Robustness analysis of utilizing different backbones; and 3) Data Augmentation to boost corruption robustness. Based on the evaluation results, we draw several important findings including 1) The temporal fusion module is effective in improving the robustness of the map segmentation model; 2) A strong backbone is effective for improving the corruption robustness; and 3) Some data augmentation methods are effective in improving the robustness of map segmentation models. These novel findings allowed us to achieve promising results in the 2024 RoboDrive Challenge-Robust Map Segmentation Track.
- [90] arXiv:2405.10572 [pdf, ps, html, other]
-
Title: Resonances as a computational toolSubjects: Numerical Analysis (math.NA)
A large toolbox of numerical schemes for dispersive equations has been established, based on different discretization techniques such as discretizing the variation-of-constants formula (e.g., exponential integrators) or splitting the full equation into a series of simpler subproblems (e.g., splitting methods). In many situations these classical schemes allow a precise and efficient approximation. This, however, drastically changes whenever non-smooth phenomena enter the scene such as for problems at low regularity and high oscillations. Classical schemes fail to capture the oscillatory nature of the solution, and this may lead to severe instabilities and loss of convergence. In this article we review a new class of resonance-based schemes. The key idea in the construction of the new schemes is to tackle and deeply embed the underlying nonlinear structure of resonances into the numerical discretization. As in the continuous case, these terms are central to structure preservation and offer the new schemes strong properties at low regularity.
- [91] arXiv:2405.10575 [pdf, ps, other]
-
Title: Accurate Training Data for Occupancy Map Prediction in Automated Driving Using Evidence TheorySubjects: Computer Vision and Pattern Recognition (cs.CV)
Automated driving fundamentally requires knowledge about the surrounding geometry of the scene. Modern approaches use only captured images to predict occupancy maps that represent the geometry. Training these approaches requires accurate data that may be acquired with the help of LiDAR scanners. We show that the techniques used for current benchmarks and training datasets to convert LiDAR scans into occupancy grid maps yield very low quality, and subsequently present a novel approach using evidence theory that yields more accurate reconstructions. We demonstrate that these are superior by a large margin, both qualitatively and quantitatively, and that we additionally obtain meaningful uncertainty estimates. When converting the occupancy maps back to depth estimates and comparing them with the raw LiDAR measurements, our method yields a MAE improvement of 30% to 52% on nuScenes and 53% on Waymo over other occupancy ground-truth data. Finally, we use the improved occupancy maps to train a state-of-the-art occupancy prediction method and demonstrate that it improves the MAE by 25% on nuScenes.
- [92] arXiv:2405.10576 [pdf, ps, html, other]
-
Title: An Efficient Learning Control Framework With Sim-to-Real for String-Type Artificial Muscle-Driven Robotic SystemsSubjects: Robotics (cs.RO)
Robotic systems driven by artificial muscles present unique challenges due to the nonlinear dynamics of actuators and the complex designs of mechanical structures. Traditional model-based controllers often struggle to achieve desired control performance in such systems. Deep reinforcement learning (DRL), a trending machine learning technique widely adopted in robot control, offers a promising alternative. However, integrating DRL into these robotic systems faces significant challenges, including the requirement for large amounts of training data and the inevitable sim-to-real gap when deployed to real-world robots. This paper proposes an efficient reinforcement learning control framework with sim-to-real transfer to address these challenges. Bootstrap and augmentation enhancements are designed to improve the data efficiency of baseline DRL algorithms, while a sim-to-real transfer technique, namely randomization of muscle dynamics, is adopted to bridge the gap between simulation and real-world deployment. Extensive experiments and ablation studies are conducted utilizing two string-type artificial muscle-driven robotic systems including a two degree-of-freedom robotic eye and a parallel robotic wrist, the results of which demonstrate the effectiveness of the proposed learning control strategy.
- [93] arXiv:2405.10577 [pdf, ps, html, other]
-
Title: DuoSpaceNet: Leveraging Both Bird's-Eye-View and Perspective View Representations for 3D Object DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV); Robotics (cs.RO)
Recent advances in multi-view camera-only 3D object detection either rely on an accurate reconstruction of bird's-eye-view (BEV) 3D features or on traditional 2D perspective view (PV) image features. While both have their own pros and cons, few have found a way to stitch them together in order to benefit from "the best of both worlds". To this end, we explore a duo space (i.e., BEV and PV) 3D perception framework, in conjunction with some useful duo space fusion strategies that allow effective aggregation of the two feature representations. To the best of our knowledge, our proposed method, DuoSpaceNet, is the first to leverage two distinct feature spaces and achieves the state-of-the-art 3D object detection and BEV map segmentation results on nuScenes dataset.
- [94] arXiv:2405.10578 [pdf, ps, html, other]
-
Title: Jacobi Stability Analysis for Systems of ODEs Using Symbolic ComputationComments: To appear in Proceedings of the 2024 International Symposium on Symbolic and Algebraic Computation (ISSAC 2024)Subjects: Symbolic Computation (cs.SC)
The classical theory of Kosambi-Cartan-Chern (KCC) developed in differential geometry provides a powerful method for analyzing the behaviors of dynamical systems. In the KCC theory, the properties of a dynamical system are described in terms of five geometrical invariants, of which the second corresponds to the so-called Jacobi stability of the system. Different from that of the Lyapunov stability that has been studied extensively in the literature, the analysis of the Jacobi stability has been investigated more recently using geometrical concepts and tools. It turns out that the existing work on the Jacobi stability analysis remains theoretical and the problem of algorithmic and symbolic treatment of Jacobi stability analysis has yet to be addressed. In this paper, we initiate our study on the problem for a class of ODE systems of arbitrary dimension and propose two algorithmic schemes using symbolic computation to check whether a nonlinear dynamical system may exhibit Jacobi stability. The first scheme, based on the construction of the complex root structure of a characteristic polynomial and on the method of quantifier elimination, is capable of detecting the existence of the Jacobi stability of the given dynamical system. The second algorithmic scheme exploits the method of semi-algebraic system solving and allows one to determine conditions on the parameters for a given dynamical system to have a prescribed number of Jacobi stable fixed points. Several examples are presented to demonstrate the effectiveness of the proposed algorithmic schemes.
- [95] arXiv:2405.10579 [pdf, ps, html, other]
-
Title: A Hard Nut to Crack: Idiom Detection with Conversational Large Language ModelsSubjects: Computation and Language (cs.CL)
In this work, we explore idiomatic language processing with Large Language Models (LLMs). We introduce the Idiomatic language Test Suite IdioTS, a new dataset of difficult examples specifically designed by language experts to assess the capabilities of LLMs to process figurative language at sentence level. We propose a comprehensive evaluation methodology based on an idiom detection task, where LLMs are prompted with detecting an idiomatic expression in a given English sentence. We present a thorough automatic and manual evaluation of the results and an extensive error analysis.
- [96] arXiv:2405.10581 [pdf, ps, other]
-
Title: Future Aware Safe Active Learning of Time Varying Systems using Gaussian ProcessesSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Optimization and Control (math.OC); Probability (math.PR)
Experimental exploration of high-cost systems with safety constraints, common in engineering applications, is a challenging endeavor. Data-driven models offer a promising solution, but acquiring the requisite data remains expensive and is potentially unsafe. Safe active learning techniques prove essential, enabling the learning of high-quality models with minimal expensive data points and high safety. This paper introduces a safe active learning framework tailored for time-varying systems, addressing drift, seasonal changes, and complexities due to dynamic behavior. The proposed Time-aware Integrated Mean Squared Prediction Error (T-IMSPE) method minimizes posterior variance over current and future states, optimizing information gathering also in the time domain. Empirical results highlight T-IMSPE's advantages in model quality through toy and real-world examples. State of the art Gaussian processes are compatible with T-IMSPE. Our theoretical contributions include a clear delineation which Gaussian process kernels, domains, and weighting measures are suitable for T-IMSPE and even beyond for its non-time aware predecessor IMSPE.
- [97] arXiv:2405.10584 [pdf, ps, other]
-
Title: A Hybrid Deep Learning Framework for Stock Price Prediction Considering the Investor Sentiment of Online Forum Enhanced by PopularitySubjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
Stock price prediction has always been a difficult task for forecasters. Using cutting-edge deep learning techniques, stock price prediction based on investor sentiment extracted from online forums has become feasible. We propose a novel hybrid deep learning framework for predicting stock prices. The framework leverages the XLNET model to analyze the sentiment conveyed in user posts on online forums, combines these sentiments with the post popularity factor to compute daily group sentiments, and integrates this information with stock technical indicators into an improved BiLSTM-highway model for stock price prediction. Through a series of comparative experiments involving four stocks on the Chinese stock market, it is demonstrated that the hybrid framework effectively predicts stock prices. This study reveals the necessity of analyzing investors' textual views for stock price prediction.
- [98] arXiv:2405.10587 [pdf, ps, other]
-
Title: RDRec: Rationale Distillation for LLM-based RecommendationComments: 10 pages. Accepted to ACL 2024 Main as a short paperSubjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Large language model (LLM)-based recommender models that bridge users and items through textual prompts for effective semantic reasoning have gained considerable attention. However, few methods consider the underlying rationales behind interactions, such as user preferences and item attributes, limiting the reasoning capability of LLMs for recommendations. This paper proposes a rationale distillation recommender (RDRec), a compact model designed to learn rationales generated by a larger language model (LM). By leveraging rationales from reviews related to users and items, RDRec remarkably specifies their profiles for recommendations. Experiments show that RDRec achieves state-of-the-art (SOTA) performance in both top-N and sequential recommendations. Our source code is released at this https URL.
- [99] arXiv:2405.10589 [pdf, ps, html, other]
-
Title: Improving Point-based Crowd Counting and Localization Based on Auxiliary Point GuidanceSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Image and Video Processing (eess.IV)
Crowd counting and localization have become increasingly important in computer vision due to their wide-ranging applications. While point-based strategies have been widely used in crowd counting methods, they face a significant challenge, i.e., the lack of an effective learning strategy to guide the matching process. This deficiency leads to instability in matching point proposals to target points, adversely affecting overall performance. To address this issue, we introduce an effective approach to stabilize the proposal-target matching in point-based methods. We propose Auxiliary Point Guidance (APG) to provide clear and effective guidance for proposal selection and optimization, addressing the core issue of matching uncertainty. Additionally, we develop Implicit Feature Interpolation (IFI) to enable adaptive feature extraction in diverse crowd scenarios, further enhancing the model's robustness and accuracy. Extensive experiments demonstrate the effectiveness of our approach, showing significant improvements in crowd counting and localization performance, particularly under challenging conditions. The source codes and trained models will be made publicly available.
- [100] arXiv:2405.10591 [pdf, ps, html, other]
-
Title: GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-SupervisionSubjects: Computer Vision and Pattern Recognition (cs.CV)
3D occupancy perception holds a pivotal role in recent vision-centric autonomous driving systems by converting surround-view images into integrated geometric and semantic representations within dense 3D grids. Nevertheless, current models still encounter two main challenges: modeling depth accurately in the 2D-3D view transformation stage, and overcoming the lack of generalizability issues due to sparse LiDAR supervision. To address these issues, this paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception. Our approach is three-fold: 1) Integration of explicit lift-based depth prediction and implicit projection-based transformers for depth modeling, enhancing the density and robustness of view transformation. 2) Utilization of mask-based encoder-decoder architecture for fine-grained semantic predictions; 3) Adoption of context-aware self-training loss functions in the pertaining stage to complement LiDAR supervision, involving the re-rendering of 2D depth maps from 3D occupancy features and leveraging image reconstruction loss to obtain denser depth supervision besides sparse LiDAR ground-truths. Our approach achieves State-Of-The-Art performance on the Occ3D-nuScenes dataset with the least image resolution needed and the most weightless image backbone compared with current models, marking an improvement of 3.3% due to our proposed contributions. Comprehensive experimentation also demonstrates the consistent superiority of our method over baselines and alternative approaches.
- [101] arXiv:2405.10596 [pdf, ps, html, other]
-
Title: CELA: Cost-Efficient Language Model Alignment for CTR PredictionXingmei Wang, Weiwen Liu, Xiaolong Chen, Qi Liu, Xu Huang, Defu Lian, Xiangyang Li, Yasheng Wang, Zhenhua Dong, Ruiming TangComments: 10 pages, 5 figuresSubjects: Information Retrieval (cs.IR)
Click-Through Rate (CTR) prediction holds a paramount position in recommender systems. The prevailing ID-based paradigm underperforms in cold-start scenarios due to the skewed distribution of feature frequency. Additionally, the utilization of a single modality fails to exploit the knowledge contained within textual features. Recent efforts have sought to mitigate these challenges by integrating Pre-trained Language Models (PLMs). They design hard prompts to structure raw features into text for each interaction and then apply PLMs for text processing. With external knowledge and reasoning capabilities, PLMs extract valuable information even in cases of sparse interactions. Nevertheless, compared to ID-based models, pure text modeling degrades the efficacy of collaborative filtering, as well as feature scalability and efficiency during both training and inference. To address these issues, we propose \textbf{C}ost-\textbf{E}fficient \textbf{L}anguage Model \textbf{A}lignment (\textbf{CELA}) for CTR prediction. CELA incorporates textual features and language models while preserving the collaborative filtering capabilities of ID-based models. This model-agnostic framework can be equipped with plug-and-play textual features, with item-level alignment enhancing the utilization of external information while maintaining training and inference efficiency. Through extensive offline experiments, CELA demonstrates superior performance compared to state-of-the-art methods. Furthermore, an online A/B test conducted on an industrial App recommender system showcases its practical effectiveness, solidifying the potential for real-world applications of CELA.
- [102] arXiv:2405.10597 [pdf, ps, html, other]
-
Title: UniCL: A Universal Contrastive Learning Framework for Large Time Series ModelsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Time-series analysis plays a pivotal role across a range of critical applications, from finance to healthcare, which involves various tasks, such as forecasting and classification. To handle the inherent complexities of time-series data, such as high dimensionality and noise, traditional supervised learning methods first annotate extensive labels for time-series data in each task, which is very costly and impractical in real-world applications. In contrast, pre-trained foundation models offer a promising alternative by leveraging unlabeled data to capture general time series patterns, which can then be fine-tuned for specific tasks. However, existing approaches to pre-training such models typically suffer from high-bias and low-generality issues due to the use of predefined and rigid augmentation operations and domain-specific data training. To overcome these limitations, this paper introduces UniCL, a universal and scalable contrastive learning framework designed for pretraining time-series foundation models across cross-domain datasets. Specifically, we propose a unified and trainable time-series augmentation operation to generate pattern-preserved, diverse, and low-bias time-series data by leveraging spectral information. Besides, we introduce a scalable augmentation algorithm capable of handling datasets with varying lengths, facilitating cross-domain pretraining. Extensive experiments on two benchmark datasets across eleven domains validate the effectiveness of UniCL, demonstrating its high generalization on time-series analysis across various fields.
- [103] arXiv:2405.10598 [pdf, ps, html, other]
-
Title: Learning Object-Centric Representation via Reverse Hierarchy GuidanceSubjects: Computer Vision and Pattern Recognition (cs.CV)
Object-Centric Learning (OCL) seeks to enable Neural Networks to identify individual objects in visual scenes, which is crucial for interpretable visual comprehension and reasoning. Most existing OCL models adopt auto-encoding structures and learn to decompose visual scenes through specially designed inductive bias, which causes the model to miss small objects during reconstruction. Reverse hierarchy theory proposes that human vision corrects perception errors through a top-down visual pathway that returns to bottom-level neurons and acquires more detailed information, inspired by which we propose Reverse Hierarchy Guided Network (RHGNet) that introduces a top-down pathway that works in different ways in the training and inference processes. This pathway allows for guiding bottom-level features with top-level object representations during training, as well as encompassing information from bottom-level features into perception during inference. Our model achieves SOTA performance on several commonly used datasets including CLEVR, CLEVRTex and MOVi-C. We demonstrate with experiments that our method promotes the discovery of small objects and also generalizes well on complex real-world scenes. Code will be available at https://anonymous.4open.science/r/RHGNet-6CEF.
- [104] arXiv:2405.10608 [pdf, ps, html, other]
-
Title: ECATS: Explainable-by-design concept-based anomaly detection for time seriesComments: 14 pages, 8 figures, submitted to 18th International Conference on Neural-Symbolic Learning and Reasoning (NeSy 2024)Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Deep learning methods for time series have already reached excellent performances in both prediction and classification tasks, including anomaly detection. However, the complexity inherent in Cyber Physical Systems (CPS) creates a challenge when it comes to explainability methods. To overcome this inherent lack of interpretability, we propose ECATS, a concept-based neuro-symbolic architecture where concepts are represented as Signal Temporal Logic (STL) formulae. Leveraging kernel-based methods for STL, concept embeddings are learnt in an unsupervised manner through a cross-attention mechanism. The network makes class predictions through these concept embeddings, allowing for a meaningful explanation to be naturally extracted for each input. Our preliminary experiments with a simple CPS-based dataset show that our model is able to achieve great classification performance while ensuring local interpretability.
- [105] arXiv:2405.10610 [pdf, ps, html, other]
-
Title: Driving Referring Video Object Segmentation with Vision-Language Pre-trained ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV)
The crux of Referring Video Object Segmentation (RVOS) lies in modeling dense text-video relations to associate abstract linguistic concepts with dynamic visual contents at pixel-level. Current RVOS methods typically use vision and language models pre-trained independently as backbones. As images and texts are mapped to uncoupled feature spaces, they face the arduous task of learning Vision-Language~(VL) relation modeling from scratch. Witnessing the success of Vision-Language Pre-trained (VLP) models, we propose to learn relation modeling for RVOS based on their aligned VL feature space. Nevertheless, transferring VLP models to RVOS is a deceptively challenging task due to the substantial gap between the pre-training task (image/region-level prediction) and the RVOS task (pixel-level prediction in videos). In this work, we introduce a framework named VLP-RVOS to address this transfer challenge. We first propose a temporal-aware prompt-tuning method, which not only adapts pre-trained representations for pixel-level prediction but also empowers the vision encoder to model temporal clues. We further propose to perform multi-stage VL relation modeling while and after feature extraction for comprehensive VL understanding. Besides, we customize a cube-frame attention mechanism for spatial-temporal reasoning. Extensive experiments demonstrate that our method outperforms state-of-the-art algorithms and exhibits strong generalization abilities.
- [106] arXiv:2405.10611 [pdf, ps, html, other]
-
Title: A Certified Proof Checker for Deep Neural Network VerificationSubjects: Logic in Computer Science (cs.LO); Artificial Intelligence (cs.AI); Programming Languages (cs.PL)
Recent advances in the verification of deep neural networks (DNNs) have opened the way for broader usage of DNN verification technology in many application areas, including safety-critical ones. DNN verifiers are themselves complex programs that have been shown to be susceptible to errors and imprecisions; this in turn has raised the question of trust in DNN verifiers. One prominent attempt to address this issue is enhancing DNN verifiers with the capability of producing proofs of their results that are subject to independent algorithmic certification (proof checking). Formulations of proof production and proof checking already exist on top of the state-of-the-art Marabou DNN verifier. The native implementation of the proof checking algorithm for Marabou was done in C++ and itself raised the question of trust in the code (e.g., in the precision of floating point calculations or guarantees for implementation soundness). Here, we present an alternative implementation of the Marabou proof checking algorithm in Imandra -- an industrial functional programming language and prover -- that allows us to obtain an implementation with formal guarantees, including proofs of mathematical results underlying the algorithm, such as the use of the Farkas lemma.
- [107] arXiv:2405.10612 [pdf, ps, html, other]
-
Title: Not All Prompts Are Secure: A Switchable Backdoor Attack Against Pre-trained Vision TransformersSubjects: Computer Vision and Pattern Recognition (cs.CV); Cryptography and Security (cs.CR); Machine Learning (cs.LG)
Given the power of vision transformers, a new learning paradigm, pre-training and then prompting, makes it more efficient and effective to address downstream visual recognition tasks. In this paper, we identify a novel security threat towards such a paradigm from the perspective of backdoor attacks. Specifically, an extra prompt token, called the switch token in this work, can turn the backdoor mode on, i.e., converting a benign model into a backdoored one. Once under the backdoor mode, a specific trigger can force the model to predict a target class. It poses a severe risk to the users of cloud API, since the malicious behavior can not be activated and detected under the benign mode, thus making the attack very stealthy. To attack a pre-trained model, our proposed attack, named SWARM, learns a trigger and prompt tokens including a switch token. They are optimized with the clean loss which encourages the model always behaves normally even the trigger presents, and the backdoor loss that ensures the backdoor can be activated by the trigger when the switch is on. Besides, we utilize the cross-mode feature distillation to reduce the effect of the switch token on clean samples. The experiments on diverse visual recognition tasks confirm the success of our switchable backdoor attack, i.e., achieving 95%+ attack success rate, and also being hard to be detected and removed. Our code is available at this https URL.
- [108] arXiv:2405.10616 [pdf, ps, html, other]
-
Title: Feature-based Low-Rank Compression of Large Language Models via Bayesian OptimizationComments: Accepted by 2024 ACL findingsSubjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
In recent years, large language models (LLMs) have driven advances in natural language processing. Still, their growing scale has increased the computational burden, necessitating a balance between efficiency and performance. Low-rank compression, a promising technique, reduces non-essential parameters by decomposing weight matrices into products of two low-rank matrices. Yet, its application in LLMs has not been extensively studied. The key to low-rank compression lies in low-rank factorization and low-rank dimensions allocation. To address the challenges of low-rank compression in LLMs, we conduct empirical research on the low-rank characteristics of large models. We propose a low-rank compression method suitable for LLMs. This approach involves precise estimation of feature distributions through pooled covariance matrices and a Bayesian optimization strategy for allocating low-rank dimensions. Experiments on the LLaMA-2 models demonstrate that our method outperforms existing strong structured pruning and low-rank compression techniques in maintaining model performance at the same compression ratio.
- [109] arXiv:2405.10618 [pdf, ps, other]
-
Title: Distributed Event-Based Learning via ADMMComments: 29 pages, 12 figuresSubjects: Machine Learning (cs.LG); Optimization and Control (math.OC); Machine Learning (stat.ML)
We consider a distributed learning problem, where agents minimize a global objective function by exchanging information over a network. Our approach has two distinct features: (i) It substantially reduces communication by triggering communication only when necessary, and (ii) it is agnostic to the data-distribution among the different agents. We can therefore guarantee convergence even if the local data-distributions of the agents are arbitrarily distinct. We analyze the convergence rate of the algorithm and derive accelerated convergence rates in a convex setting. We also characterize the effect of communication drops and demonstrate that our algorithm is robust to communication failures. The article concludes by presenting numerical results from a distributed LASSO problem, and distributed learning tasks on MNIST and CIFAR-10 datasets. The experiments underline communication savings of 50% or more due to the event-based communication strategy, show resilience towards heterogeneous data-distributions, and highlight that our approach outperforms common baselines such as FedAvg, FedProx, and FedADMM.
- [110] arXiv:2405.10620 [pdf, ps, html, other]
-
Title: MC-GPT: Empowering Vision-and-Language Navigation with Memory Map and Reasoning ChainsSubjects: Artificial Intelligence (cs.AI)
In the Vision-and-Language Navigation (VLN) task, the agent is required to navigate to a destination following a natural language instruction. While learning-based approaches have been a major solution to the task, they suffer from high training costs and lack of interpretability. Recently, Large Language Models (LLMs) have emerged as a promising tool for VLN due to their strong generalization capabilities. However, existing LLM-based methods face limitations in memory construction and diversity of navigation strategies. To address these challenges, we propose a suite of techniques. Firstly, we introduce a method to maintain a topological map that stores navigation history, retaining information about viewpoints, objects, and their spatial relationships. This map also serves as a global action space. Additionally, we present a Navigation Chain of Thoughts module, leveraging human navigation examples to enrich navigation strategy diversity. Finally, we establish a pipeline that integrates navigational memory and strategies with perception and action prediction modules. Experimental results on the REVERIE and R2R datasets show that our method effectively enhances the navigation ability of the LLM and improves the interpretability of navigation reasoning.
- [111] arXiv:2405.10621 [pdf, ps, html, other]
-
Title: Historically Relevant Event Structuring for Temporal Knowledge Graph ReasoningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Temporal Knowledge Graph (TKG) reasoning focuses on predicting events through historical information within snapshots distributed on a timeline. Existing studies mainly concentrate on two perspectives of leveraging the history of TKGs, including capturing evolution of each recent snapshot or correlations among global historical facts. Despite the achieved significant accomplishments, these models still fall short of (1) investigating the influences of multi-granularity interactions across recent snapshots and (2) harnessing the expressive semantics of significant links accorded with queries throughout the entire history, especially events exerting a profound impact on the future. These inadequacies restrict representation ability to reflect historical dependencies and future trends thoroughly. To overcome these drawbacks, we propose an innovative TKG reasoning approach towards \textbf{His}torically \textbf{R}elevant \textbf{E}vents \textbf{S}tructuring ($\mathsf{HisRES}$). Concretely, $\mathsf{HisRES}$ comprises two distinctive modules excelling in structuring historically relevant events within TKGs, including a multi-granularity evolutionary encoder that captures structural and temporal dependencies of the most recent snapshots, and a global relevance encoder that concentrates on crucial correlations among events relevant to queries from the entire history. Furthermore, $\mathsf{HisRES}$ incorporates a self-gating mechanism for adaptively merging multi-granularity recent and historically relevant structuring representations. Extensive experiments on four event-based benchmarks demonstrate the state-of-the-art performance of $\mathsf{HisRES}$ and indicate the superiority and effectiveness of structuring historical relevance for TKG reasoning.
- [112] arXiv:2405.10622 [pdf, ps, html, other]
-
Title: Differentially Private Machine Learning-powered Combinatorial Auction DesignSubjects: Computer Science and Game Theory (cs.GT); Information Theory (cs.IT)
We present a new approach to machine learning-powered combinatorial auctions, which is based on the principles of Differential Privacy. Our methodology guarantees that the auction mechanism is truthful, meaning that rational bidders have the incentive to reveal their true valuation functions. We achieve this by inducing truthfulness in the auction dynamics, ensuring that bidders consistently provide accurate information about their valuation functions.
Our method not only ensures truthfulness but also preserves the efficiency of the original auction. This means that if the initial auction outputs an allocation with high social welfare, our modified truthful version of the auction will also achieve high social welfare. We use techniques from Differential Privacy, such as the Exponential Mechanism, to achieve these results. Additionally, we examine the application of differential privacy in auctions across both asymptotic and non-asymptotic regimes. - [113] arXiv:2405.10623 [pdf, ps, html, other]
-
Title: Model-free fast charging of lithium-ion batteries by online gradient descentSubjects: Systems and Control (eess.SY)
A data-driven solution is provided for the fast-charging problem of lithium-ion batteries with multiple safety and aging constraints. The proposed method optimizes the charging current based on the observed history of measurable battery quantities, such as the input current, terminal voltage, and temperature. The proposed method does not need any detailed battery model or full-charging training episodes. The theoretical convergence is proven under mild conditions and is validated numerically on several linear and nonlinear battery models, including single-particle and equivalent-circuit models.
- [114] arXiv:2405.10624 [pdf, ps, html, other]
-
Title: Sample-Efficient Constrained Reinforcement Learning with General ParameterizationSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
We consider a constrained Markov Decision Problem (CMDP) where the goal of an agent is to maximize the expected discounted sum of rewards over an infinite horizon while ensuring that the expected discounted sum of costs exceeds a certain threshold. Building on the idea of momentum-based acceleration, we develop the Primal-Dual Accelerated Natural Policy Gradient (PD-ANPG) algorithm that guarantees an $\epsilon$ global optimality gap and $\epsilon$ constraint violation with $\mathcal{O}(\epsilon^{-3})$ sample complexity. This improves the state-of-the-art sample complexity in CMDP by a factor of $\mathcal{O}(\epsilon^{-1})$.
- [115] arXiv:2405.10625 [pdf, ps, html, other]
-
Title: Specialising and Analysing Instruction-Tuned and Byte-Level Language Models for Organic Reaction PredictionComments: PreprintSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Biomolecules (q-bio.BM)
Transformer-based encoder-decoder models have demonstrated impressive results in chemical reaction prediction tasks. However, these models typically rely on pretraining using tens of millions of unlabelled molecules, which can be time-consuming and GPU-intensive. One of the central questions we aim to answer in this work is: Can FlanT5 and ByT5, the encode-decoder models pretrained solely on language data, be effectively specialised for organic reaction prediction through task-specific fine-tuning? We conduct a systematic empirical study on several key issues of the process, including tokenisation, the impact of (SMILES-oriented) pretraining, fine-tuning sample efficiency, and decoding algorithms at inference. Our key findings indicate that although being pretrained only on language tasks, FlanT5 and ByT5 provide a solid foundation to fine-tune for reaction prediction, and thus become `chemistry domain compatible' in the process. This suggests that GPU-intensive and expensive pretraining on a large dataset of unlabelled molecules may be useful yet not essential to leverage the power of language models for chemistry. All our models achieve comparable Top-1 and Top-5 accuracy although some variation across different models does exist. Notably, tokenisation and vocabulary trimming slightly affect final performance but can speed up training and inference; The most efficient greedy decoding strategy is very competitive while only marginal gains can be achieved from more sophisticated decoding algorithms. In summary, we evaluate FlanT5 and ByT5 across several dimensions and benchmark their impact on organic reaction prediction, which may guide more effective use of these state-of-the-art language models for chemistry-related tasks in the future.
- [116] arXiv:2405.10626 [pdf, ps, html, other]
-
Title: Dynamic data sampler for cross-language transfer learning in large language modelsComments: Accepted by ICASSP 2024Subjects: Computation and Language (cs.CL)
Large Language Models (LLMs) have gained significant attention in the field of natural language processing (NLP) due to their wide range of applications. However, training LLMs for languages other than English poses significant challenges, due to the difficulty in acquiring large-scale corpus and the requisite computing resources. In this paper, we propose ChatFlow, a cross-language transfer-based LLM, to address these challenges and train large Chinese language models in a cost-effective manner. We employ a mix of Chinese, English, and parallel corpus to continuously train the LLaMA2 model, aiming to align cross-language representations and facilitate the knowledge transfer specifically to the Chinese language model. In addition, we use a dynamic data sampler to progressively transition the model from unsupervised pre-training to supervised fine-tuning. Experimental results demonstrate that our approach accelerates model convergence and achieves superior performance. We evaluate ChatFlow on popular Chinese and English benchmarks, the results indicate that it outperforms other Chinese models post-trained on LLaMA-2-7B.
- [117] arXiv:2405.10629 [pdf, ps, html, other]
-
Title: DeepPavlov at SemEval-2024 Task 8: Leveraging Transfer Learning for Detecting Boundaries of Machine-Generated TextsComments: New best score from the leaderboard, to appear in SemEval-2024 Workshop proceedingsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
The Multigenerator, Multidomain, and Multilingual Black-Box Machine-Generated Text Detection shared task in the SemEval-2024 competition aims to tackle the problem of misusing collaborative human-AI writing. Although there are a lot of existing detectors of AI content, they are often designed to give a binary answer and thus may not be suitable for more nuanced problem of finding the boundaries between human-written and machine-generated texts, while hybrid human-AI writing becomes more and more popular. In this paper, we address the boundary detection problem. Particularly, we present a pipeline for augmenting data for supervised fine-tuning of DeBERTaV3. We receive new best MAE score, according to the leaderboard of the competition, with this pipeline.
- [118] arXiv:2405.10630 [pdf, ps, html, other]
-
Title: Medical Dialogue: A Survey of Categories, Methods, Evaluation and ChallengesXiaoming Shi, Zeming Liu, Li Du, Yuxuan Wang, Hongru Wang, Yuhang Guo, Tong Ruan, Jie Xu, Shaoting ZhangSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
This paper surveys and organizes research works on medical dialog systems, which is an important yet challenging task. Although these systems have been surveyed in the medical community from an application perspective, a systematic review from a rigorous technical perspective has to date remained noticeably absent. As a result, an overview of the categories, methods, and evaluation of medical dialogue systems remain limited and underspecified, hindering the further improvement of this area. To fill this gap, we investigate an initial pool of 325 papers from well-known computer science, and natural language processing conferences and journals, and make an overview. Recently, large language models have shown strong model capacity on downstream tasks, which also reshaped medical dialog systems' foundation. Despite the alluring practical application value, current medical dialogue systems still suffer from problems. To this end, this paper lists the grand challenges of medical dialog systems, especially of large language models.
- [119] arXiv:2405.10632 [pdf, ps, html, other]
-
Title: Beyond static AI evaluations: advancing human interaction evaluations for LLM harms and risksComments: 15 pages, 1 figureSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
Model evaluations are central to understanding the safety, risks, and societal impacts of AI systems. While most real-world AI applications involve human-AI interaction, most current evaluations (e.g., common benchmarks) of AI models do not. Instead, they incorporate human factors in limited ways, assessing the safety of models in isolation, thereby falling short of capturing the complexity of human-model interactions. In this paper, we discuss and operationalize a definition of an emerging category of evaluations -- "human interaction evaluations" (HIEs) -- which focus on the assessment of human-model interactions or the process and the outcomes of humans using models. First, we argue that HIEs can be used to increase the validity of safety evaluations, assess direct human impact and interaction-specific harms, and guide future assessments of models' societal impact. Second, we propose a safety-focused HIE design framework -- containing a human-LLM interaction taxonomy -- with three stages: (1) identifying the risk or harm area, (2) characterizing the use context, and (3) choosing the evaluation parameters. Third, we apply our framework to two potential evaluations for overreliance and persuasion risks. Finally, we conclude with tangible recommendations for addressing concerns over costs, replicability, and unrepresentativeness of HIEs.
- [120] arXiv:2405.10633 [pdf, ps, html, other]
-
Title: Harnessing Collective Structure Knowledge in Data Augmentation for Graph Neural NetworksSubjects: Machine Learning (cs.LG)
Graph neural networks (GNNs) have achieved state-of-the-art performance in graph representation learning. Message passing neural networks, which learn representations through recursively aggregating information from each node and its neighbors, are among the most commonly-used GNNs. However, a wealth of structural information of individual nodes and full graphs is often ignored in such process, which restricts the expressive power of GNNs. Various graph data augmentation methods that enable the message passing with richer structure knowledge have been introduced as one main way to tackle this issue, but they are often focused on individual structure features and difficult to scale up with more structure features. In this work we propose a novel approach, namely collective structure knowledge-augmented graph neural network (CoS-GNN), in which a new message passing method is introduced to allow GNNs to harness a diverse set of node- and graph-level structure features, together with original node features/attributes, in augmented graphs. In doing so, our approach largely improves the structural knowledge modeling of GNNs in both node and graph levels, resulting in substantially improved graph representations. This is justified by extensive empirical results where CoS-GNN outperforms state-of-the-art models in various graph-level learning tasks, including graph classification, anomaly detection, and out-of-distribution generalization.
- [121] arXiv:2405.10635 [pdf, ps, html, other]
-
Title: Implementation of OpenAPI Wireshark Dissectors to Validate SBI Messages of 5G Core NetworksComments: Author's version of a paper accepted for publication in Proceedings of the 28th ITG-Symposium Mobile Communication - Technologies and ApplicationsSubjects: Networking and Internet Architecture (cs.NI)
This paper introduces a novel Wireshark dissector designed to facilitate the analysis of Service-Based Interface (SBI) communication in 5G Core Networks. Our approach involves parsing the OpenAPI schemes provided by the 5G specification to automatically generate the dissector code. Our tool enables the validation of 5G Core Network traces to ensure compliance with the specifications. Through testing against three open-source 5G Core Network projects, we identified several issues where messages deviate from specification standards, highlighting the significance of our implementation in ensuring protocol conformity and network reliability.
- [122] arXiv:2405.10637 [pdf, ps, html, other]
-
Title: Layer-Condensed KV Cache for Efficient Inference of Large Language ModelsComments: Accepted to ACL2024 main conferenceSubjects: Computation and Language (cs.CL)
Huge memory consumption has been a major bottleneck for deploying high-throughput large language models in real-world applications. In addition to the large number of parameters, the key-value (KV) cache for the attention mechanism in the transformer architecture consumes a significant amount of memory, especially when the number of layers is large for deep language models. In this paper, we propose a novel method that only computes and caches the KVs of a small number of layers, thus significantly saving memory consumption and improving inference throughput. Our experiments on large language models show that our method achieves up to 26$\times$ higher throughput than standard transformers and competitive performance in language modeling and downstream tasks. In addition, our method is orthogonal to existing transformer memory-saving techniques, so it is straightforward to integrate them with our model, achieving further improvement in inference efficiency. Our code is available at this https URL.
- [123] arXiv:2405.10640 [pdf, ps, html, other]
-
Title: COMET: NFT Price Prediction with Wallet ProfilingComments: Accepted by KDD 2024 (ADS Track)Subjects: Social and Information Networks (cs.SI)
As the non-fungible token (NFT) market flourishes, price prediction emerges as a pivotal direction for investors gaining valuable insight to maximize returns. However, existing works suffer from a lack of practical definitions and standardized evaluations, limiting their practical application. Moreover, the influence of users' multi-behaviour transactions that are publicly accessible on NFT price is still not explored and exhibits challenges. In this paper, we address these gaps by presenting a practical and hierarchical problem definition. This approach unifies both collection-level and token-level task and evaluation methods, which cater to varied practical requirements of investors. To further understand the impact of user behaviours on the variation of NFT price, we propose a general wallet profiling framework and develop a COmmunity enhanced Multi-bEhavior Transaction graph model, named COMET. COMET profiles wallets with a comprehensive view and considers the impact of diverse relations and interactions within the NFT ecosystem on NFT price variations, thereby improving prediction performance. Extensive experiments conducted in our deployed system demonstrate the superiority of COMET, underscoring its potential in the insight toolkit for NFT investors.
- [124] arXiv:2405.10642 [pdf, ps, html, other]
-
Title: Hi-GMAE: Hierarchical Graph Masked AutoencodersComments: 10 pages, 6 figures, 3 tablesSubjects: Machine Learning (cs.LG)
Graph Masked Autoencoders (GMAEs) have emerged as a notable self-supervised learning approach for graph-structured data. Existing GMAE models primarily focus on reconstructing node-level information, categorizing them as single-scale GMAEs. This methodology, while effective in certain contexts, tends to overlook the complex hierarchical structures inherent in many real-world graphs. For instance, molecular graphs exhibit a clear hierarchical organization in the form of the atoms-functional groups-molecules structure. Hence, the inability of single-scale GMAE models to incorporate these hierarchical relationships often leads to their inadequate capture of crucial high-level graph information, resulting in a noticeable decline in performance. To address this limitation, we propose Hierarchical Graph Masked AutoEncoders (Hi-GMAE), a novel multi-scale GMAE framework designed to handle the hierarchical structures within graphs. First, Hi-GMAE constructs a multi-scale graph hierarchy through graph pooling, enabling the exploration of graph structures across different granularity levels. To ensure masking uniformity of subgraphs across these scales, we propose a novel coarse-to-fine strategy that initiates masking at the coarsest scale and progressively back-projects the mask to the finer scales. Furthermore, we integrate a gradual recovery strategy with the masking process to mitigate the learning challenges posed by completely masked subgraphs. Diverging from the standard graph neural network (GNN) used in GMAE models, Hi-GMAE modifies its encoder and decoder into hierarchical structures. This entails using GNN at the finer scales for detailed local graph analysis and employing a graph transformer at coarser scales to capture global information. Our experiments on 15 graph datasets consistently demonstrate that Hi-GMAE outperforms 17 state-of-the-art self-supervised competitors.
- [125] arXiv:2405.10645 [pdf, ps, other]
-
Title: ChatGPT in Classrooms: Transforming Challenges into Opportunities in EducationComments: 5 pages, 1 figure, 1 tableSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI); Human-Computer Interaction (cs.HC)
In the era of exponential technology growth, one unexpected guest has claimed a seat in classrooms worldwide, Artificial Intelligence. Generative AI, such as ChatGPT, promises a revolution in education, yet it arrives with a double-edged sword. Its potential for personalized learning is offset by issues of cheating, inaccuracies, and educators struggling to incorporate it effectively into their lesson design. We are standing on the brink of this educational frontier, and it is clear that we need to navigate this terrain with a lot of care. This is a major challenge that could undermine the integrity and value of our educational process. So, how can we turn these challenges into opportunities? When used inappropriately, AI tools can become the perfect tool for the cut copy paste mentality, and quickly begin to corrode critical thinking, creativity, and deep understanding, the most important skills in our rapidly changing world. Teachers feel that they are not equipped to leverage this technology, widening the digital divide among educators and institutions. Addressing these concerns calls for an in depth research approach. We will employ empirical research, drawing on the Technology Acceptance Model, to assess the attitudes toward generative AI among educators and students. Understanding their perceptions, usage patterns, and hurdles is the first crucial step in creating an effective solution. The present study will be used as a process manual for future researchers to apply, running their own data, based on the steps explained here
- [126] arXiv:2405.10647 [pdf, ps, html, other]
-
Title: Cyclical Weight Consolidation: Towards Solving Catastrophic Forgetting in Serial Federated LearningComments: 12 pages, 8 figuresSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Federated Learning (FL) has gained attention for addressing data scarcity and privacy concerns. While parallel FL algorithms like FedAvg exhibit remarkable performance, they face challenges in scenarios with diverse network speeds and concerns about centralized control, especially in multi-institutional collaborations like the medical domain. Serial FL presents an alternative solution, circumventing these challenges by transferring model updates serially between devices in a cyclical manner. Nevertheless, it is deemed inferior to parallel FL in that (1) its performance shows undesirable fluctuations, and (2) it converges to a lower plateau, particularly when dealing with non-IID data. The observed phenomenon is attributed to catastrophic forgetting due to knowledge loss from previous sites. In this paper, to overcome fluctuation and low efficiency in the iterative learning and forgetting process, we introduce cyclical weight consolidation (CWC), a straightforward yet potent approach specifically tailored for serial FL. CWC employs a consolidation matrix to regulate local optimization. This matrix tracks the significance of each parameter on the overall federation throughout the entire training trajectory, preventing abrupt changes in significant weights. During revisitation, to maintain adaptability, old memory undergoes decay to incorporate new information. Our comprehensive evaluations demonstrate that in various non-IID settings, CWC mitigates the fluctuation behavior of the original serial FL approach and enhances the converged performance consistently and significantly. The improved performance is either comparable to or better than the parallel vanilla.
- [127] arXiv:2405.10648 [pdf, ps, html, other]
-
Title: Optimal Service Placement, Request Routing and CPU Sizing in Cooperative Mobile Edge Computing Networks for Delay-Sensitive ApplicationsSubjects: Networking and Internet Architecture (cs.NI); Information Theory (cs.IT)
We study joint optimization of service placement, request routing, and CPU sizing in a cooperative MEC system. The problem is considered from the perspective of the service provider (SP), which delivers heterogeneous MEC-enabled delay-sensitive services, and needs to pay for the used resources to the mobile network operators and the cloud provider, while earning revenue from the served requests. We formulate the problem of maximizing the SP's total profit subject to the computation, storage, and communication constraints of each edge node and end-to-end delay requirements of the services as a mixed-integer non-convex optimization problem, and prove it to be NP-hard.
To tackle the challenges in solving the problem, we first introduce a design trade-off parameter for different delay requirements of each service, which maintains flexibility in prioritizing them, and transform the original optimization problem by the new delay constraints. Then, by exploiting a hidden convexity, we reformulate the delay constraints into an equivalent form. Next, to handle the challenge of the complicating (integer) variables, using primal decomposition, we decompose the problem into an equivalent form of master and inner sub-problems over the mixed and real variables, respectively. We then employ a cutting-plane approach for building up adequate representations of the extremal value of the inner problem as a function of the complicating variables and the set of values of the complicating variables for which the inner problem is feasible. Finally, we propose a solution strategy based on generalized Benders decomposition and prove its convergence to the optimal solution within a limited number of iterations. Extensive simulation results demonstrate that the proposed scheme significantly outperforms the existing mechanisms in terms of the SP's profit, cache hit ratio, running time, and end-to-end delay. - [128] arXiv:2405.10650 [pdf, ps, html, other]
-
Title: SPOR: A Comprehensive and Practical Evaluation Method for Compositional Generalization in Data-to-Text GenerationSubjects: Computation and Language (cs.CL)
Compositional generalization is an important ability of language models and has many different manifestations. For data-to-text generation, previous research on this ability is limited to a single manifestation called Systematicity and lacks consideration of large language models (LLMs), which cannot fully cover practical application scenarios. In this work, we propose SPOR, a comprehensive and practical evaluation method for compositional generalization in data-to-text generation. SPOR includes four aspects of manifestations (Systematicity, Productivity, Order invariance, and Rule learnability) and allows high-quality evaluation without additional manual annotations based on existing datasets. We demonstrate SPOR on two different datasets and evaluate some existing language models including LLMs. We find that the models are deficient in various aspects of the evaluation and need further improvement. Our work shows the necessity for comprehensive research on different manifestations of compositional generalization in data-to-text generation and provides a framework for evaluation.
- [129] arXiv:2405.10658 [pdf, ps, html, other]
-
Title: Cost-Effective Fault Tolerance for CNNs Using Parameter Vulnerability Based Hardening and PruningComments: 7 pages, 7 figures, 2 tables, 32 references, the paper is accepted at IOLTS 2024Subjects: Machine Learning (cs.LG)
Convolutional Neural Networks (CNNs) have become integral in safety-critical applications, thus raising concerns about their fault tolerance. Conventional hardware-dependent fault tolerance methods, such as Triple Modular Redundancy (TMR), are computationally expensive, imposing a remarkable overhead on CNNs. Whereas fault tolerance techniques can be applied either at the hardware level or at the model levels, the latter provides more flexibility without sacrificing generality. This paper introduces a model-level hardening approach for CNNs by integrating error correction directly into the neural networks. The approach is hardware-agnostic and does not require any changes to the underlying accelerator device. Analyzing the vulnerability of parameters enables the duplication of selective filters/neurons so that their output channels are effectively corrected with an efficient and robust correction layer. The proposed method demonstrates fault resilience nearly equivalent to TMR-based correction but with significantly reduced overhead. Nevertheless, there exists an inherent overhead to the baseline CNNs. To tackle this issue, a cost-effective parameter vulnerability based pruning technique is proposed that outperforms the conventional pruning method, yielding smaller networks with a negligible accuracy loss. Remarkably, the hardened pruned CNNs perform up to 24\% faster than the hardened un-pruned ones.
- [130] arXiv:2405.10659 [pdf, ps, html, other]
-
Title: Realistic Evaluation of Toxicity in Large Language ModelsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Large language models (LLMs) have become integral to our professional workflows and daily lives. Nevertheless, these machine companions of ours have a critical flaw: the huge amount of data which endows them with vast and diverse knowledge, also exposes them to the inevitable toxicity and bias. While most LLMs incorporate defense mechanisms to prevent the generation of harmful content, these safeguards can be easily bypassed with minimal prompt engineering. In this paper, we introduce the new Thoroughly Engineered Toxicity (TET) dataset, comprising manually crafted prompts designed to nullify the protective layers of such models. Through extensive evaluations, we demonstrate the pivotal role of TET in providing a rigorous benchmark for evaluation of toxicity awareness in several popular LLMs: it highlights the toxicity in the LLMs that might remain hidden when using normal prompts, thus revealing subtler issues in their behavior.
- [131] arXiv:2405.10661 [pdf, ps, html, other]
-
Title: Verification Algorithms for Automated Separation Logic VerifiersSubjects: Programming Languages (cs.PL)
Most automated program verifiers for separation logic use either symbolic execution or verification condition generation to extract proof obligations, which are then handed over to an SMT solver. Existing verification algorithms are designed to be sound, but differ in performance and completeness. These characteristics may also depend on the programs and properties to be verified. Consequently, developers and users of program verifiers have to select a verification algorithm carefully for their application domain. Taking an informed decision requires a systematic comparison of the performance and completeness characteristics of the verification algorithms used by modern separation logic verifiers, but such a comparison does not exist.
This paper describes five verification algorithms for separation logic, three that are used in existing tools and two novel algorithms that combine characteristics of existing symbolic execution and verification condition generation algorithms. A detailed evaluation of implementations of these five algorithms in the Viper infrastructure assesses their performance and completeness for different classes of input programs. Based on the experimental results, we identify candidate portfolios of algorithms that maximize completeness and performance. - [132] arXiv:2405.10672 [pdf, ps, other]
-
Title: Pragmatic Communication for Remote Control of Finite-State Markov ProcessesPietro Talli, Edoardo David Santi, Federico Chiariotti, Touraj Soleymani, Federico Mason, Andrea Zanella, Deniz GündüzComments: Submitted for publication in the IEEE Journal on Selected Areas in CommunicationsSubjects: Multiagent Systems (cs.MA); Networking and Internet Architecture (cs.NI)
Pragmatic or goal-oriented communication can optimize communication decisions beyond the reliable transmission of data, instead aiming at directly affecting application performance with the minimum channel utilization. In this paper, we develop a general theoretical framework for the remote control of finite-state Markov processes, using pragmatic communication over a costly zero-delay communication channel. To that end, we model a cyber-physical system composed of an encoder, which observes and transmits the states of a process in real-time, and a decoder, which receives that information and controls the behavior of the process. The encoder and the decoder should cooperatively optimize the trade-off between the control performance (i.e., reward) and the communication cost (i.e., channel use). This scenario underscores a pragmatic (i.e., goal-oriented) communication problem, where the purpose is to convey only the data that is most valuable for the underlying task, taking into account the state of the decoder (hence, the pragmatic aspect). We investigate two different decision-making architectures: in pull-based remote control, the decoder is the only decision-maker, while in push-based remote control, the encoder and the decoder constitute two independent decision-makers, leading to a multi-agent scenario. We propose three algorithms to optimize our system (i.e., design the encoder and the decoder policies), discuss the optimality guarantees ofs the algorithms, and shed light on their computational complexity and fundamental limits.
- [133] arXiv:2405.10674 [pdf, ps, html, other]
-
Title: From Sora What We Can See: A Survey of Text-to-Video GenerationRui Sun, Yumin Zhang, Tejal Shah, Jiahao Sun, Shuoying Zhang, Wenqi Li, Haoran Duan, Bo Wei, Rajiv RanjanComments: A comprehensive list of text-to-video generation studies in this survey is available at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
With impressive achievements made, artificial intelligence is on the path forward to artificial general intelligence. Sora, developed by OpenAI, which is capable of minute-level world-simulative abilities can be considered as a milestone on this developmental path. However, despite its notable successes, Sora still encounters various obstacles that need to be resolved. In this survey, we embark from the perspective of disassembling Sora in text-to-video generation, and conducting a comprehensive review of literature, trying to answer the question, \textit{From Sora What We Can See}. Specifically, after basic preliminaries regarding the general algorithms are introduced, the literature is categorized from three mutually perpendicular dimensions: evolutionary generators, excellent pursuit, and realistic panorama. Subsequently, the widely used datasets and metrics are organized in detail. Last but more importantly, we identify several challenges and open problems in this domain and propose potential future directions for research and development.
- [134] arXiv:2405.10678 [pdf, ps, other]
-
Title: IT Strategic alignment in the decentralized finance (DeFi): CBDC and digital currenciesComments: Keywords: IT Strategic alignment, Decentralized Finance (DeFi), Cryptocurrency, Digital EconomySubjects: Computers and Society (cs.CY); Cryptography and Security (cs.CR)
Cryptocurrency can be understood as a digital asset transacted among participants in the crypto economy. Every cryptocurrency must have an associated Blockchain. Blockchain is a Distributed Ledger Technology (DLT) which supports cryptocurrencies, this may be considered as the most promising disruptive technology in the industry 4.0 context. Decentralized finance (DeFi) is a Blockchain-based financial infrastructure, the term generally refers to an open, permissionless, and highly interoperable protocol stack built on public smart contract platforms, such as the Ethereum Blockchain. It replicates existing financial services in a more open and transparent way. DeFi does not rely on intermediaries and centralized institutions. Instead, it is based on open protocols and decentralized applications (Dapps). Considering that there are many digital coins, stablecoins and central bank digital currencies (CBDCs), these currencies should interact among each other sometime. For this interaction the Information Technology elements play an important whole as enablers and IT strategic alignment. This paper considers the strategic alignment model proposed by Henderson and Venkatraman (1993) and Luftman (1996). This paper seeks to answer two main questions 1) What are the common IT elements in the DeFi? And 2) How the elements connect to the IT strategic alignment in DeFi? Through a Systematic Literature Review (SLR). Results point out that there are many IT elements already mentioned by literature, however there is a lack in the literature about the connection between IT elements and IT strategic alignment in a Decentralized Finance (DeFi) architectural network. After final considerations, limitations and future research agenda are presented. Keywords: IT Strategic alignment, Decentralized Finance (DeFi), Cryptocurrency, Digital Economy.
- [135] arXiv:2405.10679 [pdf, ps, other]
-
Title: Off-the-Shelf Neural Network Architectures for Forex Time Series Prediction come at a CostSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computational Engineering, Finance, and Science (cs.CE); Mathematical Finance (q-fin.MF)
Our study focuses on comparing the performance and resource requirements between different Long Short-Term Memory (LSTM) neural network architectures and an ANN specialized architecture for forex market prediction. We analyze the execution time of the models as well as the resources consumed, such as memory and computational power. Our aim is to demonstrate that the specialized architecture not only achieves better results in forex market prediction but also executes using fewer resources and in a shorter time frame compared to LSTM architectures. This comparative analysis will provide significant insights into the suitability of these two types of architectures for time series prediction in the forex market environment.
- [136] arXiv:2405.10681 [pdf, ps, html, other]
-
Title: Know in AdVance: Linear-Complexity Forecasting of Ad Campaign Performance with Evolving User InterestXiaoYu Wang, YongHui Guo, Hui Sheng, Peili Lv, Chi Zhou, Wei Huang, ShiQin Ta, Dongbo Huang, XiuJin Yang, Lan Xu, Hao Zhou, Yusheng JiComments: 12 pages, 4 figures, accepted at ACM SIGKDD 2024Subjects: Information Retrieval (cs.IR)
Real-time Bidding (RTB) advertisers wish to \textit{know in advance} the expected cost and yield of ad campaigns to avoid trial-and-error expenses. However, Campaign Performance Forecasting (CPF), a sequence modeling task involving tens of thousands of ad auctions, poses challenges of evolving user interest, auction representation, and long context, making coarse-grained and static-modeling methods sub-optimal. We propose \textit{AdVance}, a time-aware framework that integrates local auction-level and global campaign-level modeling. User preference and fatigue are disentangled using a time-positioned sequence of clicked items and a concise vector of all displayed items. Cross-attention, conditioned on the fatigue vector, captures the dynamics of user interest toward each candidate ad. Bidders compete with each other, presenting a complete graph similar to the self-attention mechanism. Hence, we employ a Transformer Encoder to compress each auction into embedding by solving auxiliary tasks. These sequential embeddings are then summarized by a conditional state space model (SSM) to comprehend long-range dependencies while maintaining global linear complexity. Considering the irregular time intervals between auctions, we make SSM's parameters dependent on the current auction embedding and the time interval. We further condition SSM's global predictions on the accumulation of local results. Extensive evaluations and ablation studies demonstrate its superiority over state-of-the-art methods. AdVance has been deployed on the Tencent Advertising platform, and A/B tests show a remarkable 4.5\% uplift in Average Revenue per User (ARPU).
- [137] arXiv:2405.10689 [pdf, ps, other]
-
Title: Revolutionizing Process Mining: A Novel Architecture for ChatGPT Integration and Enhanced User Experience through Optimized Prompt EngineeringSubjects: Computation and Language (cs.CL)
In the rapidly evolving field of business process management, there is a growing need for analytical tools that can transform complex data into actionable insights. This research introduces a novel approach by integrating Large Language Models (LLMs), such as ChatGPT, into process mining tools, making process analytics more accessible to a wider audience. The study aims to investigate how ChatGPT enhances analytical capabilities, improves user experience, increases accessibility, and optimizes the architectural frameworks of process mining tools. The key innovation of this research lies in developing a tailored prompt engineering strategy for each process mining submodule, ensuring that the AI-generated outputs are accurate and relevant to the context. The integration architecture follows an Extract, Transform, Load (ETL) process, which includes various process mining engine modules and utilizes zero-shot and optimized prompt engineering techniques. ChatGPT is connected via APIs and receives structured outputs from the process mining modules, enabling conversational interactions. To validate the effectiveness of this approach, the researchers used data from 17 companies that employ BehfaLab's Process Mining Tool. The results showed significant improvements in user experience, with an expert panel rating 72% of the results as "Good". This research contributes to the advancement of business process analysis methodologies by combining process mining with artificial intelligence. Future research directions include further optimization of prompt engineering, exploration of integration with other AI technologies, and assessment of scalability across various business environments. This study paves the way for continuous innovation at the intersection of process mining and artificial intelligence, promising to revolutionize the way businesses analyze and optimize their processes.
- [138] arXiv:2405.10690 [pdf, ps, html, other]
-
Title: CoLeaF: A Contrastive-Collaborative Learning Framework for Weakly Supervised Audio-Visual Video ParsingSubjects: Computer Vision and Pattern Recognition (cs.CV)
Weakly supervised audio-visual video parsing (AVVP) methods aim to detect audible-only, visible-only, and audible-visible events using only video-level labels. Existing approaches tackle this by leveraging unimodal and cross-modal contexts. However, we argue that while cross-modal learning is beneficial for detecting audible-visible events, in the weakly supervised scenario, it negatively impacts unaligned audible or visible events by introducing irrelevant modality information. In this paper, we propose CoLeaF, a novel learning framework that optimizes the integration of cross-modal context in the embedding space such that the network explicitly learns to combine cross-modal information for audible-visible events while filtering them out for unaligned events. Additionally, as videos often involve complex class relationships, modelling them improves performance. However, this introduces extra computational costs into the network. Our framework is designed to leverage cross-class relationships during training without incurring additional computations at inference. Furthermore, we propose new metrics to better evaluate a method's capabilities in performing AVVP. Our extensive experiments demonstrate that CoLeaF significantly improves the state-of-the-art results by an average of 1.9% and 2.4% F-score on the LLP and UnAV-100 datasets, respectively.
- [139] arXiv:2405.10695 [pdf, ps, other]
-
Title: On the Design of Super ConstellationsThrassos K. Oikonomou, Dimitrios Tyrovolas, Sotiris A. Tegos, Panagiotis D. Diamantoulakis, Panagiotis Sarigiannidis, George K. KaragiannidisSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
In the evolving landscape of sixth-generation (6G) wireless networks, which demand ultra high data rates, this study introduces the concept of super constellation communications. Also, we present super amplitude phase shift keying (SAPSK), an innovative modulation technique designed to achieve these ultra high data rate demands. SAPSK is complemented by the generalized polar distance detector (GPD-D), which approximates the optimal maximum likelihood detector in channels with Gaussian phase noise (GPN). By leveraging the decision regions formulated by GPD-D, a tight closed-form approximation for the symbol error probability (SEP) of SAPSK constellations is derived, while a detection algorithm with O(1) time complexity is developed to ensure fast and efficient SAPSK symbol detection. Finally, the theoretical performance of SAPSK and the efficiency of the proposed O(1) algorithm are validated by numerical simulations, highlighting both its superiority in terms of SEP compared to various constellations and its practical advantages in terms of fast and accurate symbol detection.
- [140] arXiv:2405.10696 [pdf, ps, html, other]
-
Title: Autonomous AI-enabled Industrial Sorting Pipeline for Advanced Textile RecyclingYannis Spyridis, Vasileios Argyriou, Antonios Sarigiannidis, Panagiotis Radoglou, Panagiotis SarigiannidisSubjects: Computer Vision and Pattern Recognition (cs.CV)
The escalating volumes of textile waste globally necessitate innovative waste management solutions to mitigate the environmental impact and promote sustainability in the fashion industry. This paper addresses the inefficiencies of traditional textile sorting methods by introducing an autonomous textile analysis pipeline. Utilising robotics, spectral imaging, and AI-driven classification, our system enhances the accuracy, efficiency, and scalability of textile sorting processes, contributing to a more sustainable and circular approach to waste management. The integration of a Digital Twin system further allows critical evaluation of technical and economic feasibility, providing valuable insights into the sorting system's accuracy and reliability. The proposed framework, inspired by Industry 4.0 principles, comprises five interconnected layers facilitating seamless data exchange and coordination within the system. Preliminary results highlight the potential of our holistic approach to mitigate environmental impact and foster a positive shift towards recycling in the textile industry.
- [141] arXiv:2405.10700 [pdf, ps, html, other]
-
Title: SynDy: Synthetic Dynamic Dataset Generation Framework for Misinformation TasksJournal-ref: Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '24), July 14--18, 2024, Washington, DC, USASubjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computers and Society (cs.CY)
Diaspora communities are disproportionately impacted by off-the-radar misinformation and often neglected by mainstream fact-checking efforts, creating a critical need to scale-up efforts of nascent fact-checking initiatives. In this paper we present SynDy, a framework for Synthetic Dynamic Dataset Generation to leverage the capabilities of the largest frontier Large Language Models (LLMs) to train local, specialized language models. To the best of our knowledge, SynDy is the first paper utilizing LLMs to create fine-grained synthetic labels for tasks of direct relevance to misinformation mitigation, namely Claim Matching, Topical Clustering, and Claim Relationship Classification. SynDy utilizes LLMs and social media queries to automatically generate distantly-supervised, topically-focused datasets with synthetic labels on these three tasks, providing essential tools to scale up human-led fact-checking at a fraction of the cost of human-annotated data. Training on SynDy's generated labels shows improvement over a standard baseline and is not significantly worse compared to training on human labels (which may be infeasible to acquire). SynDy is being integrated into Meedan's chatbot tiplines that are used by over 50 organizations, serve over 230K users annually, and automatically distribute human-written fact-checks via messaging apps such as WhatsApp. SynDy will also be integrated into our deployed Co-Insights toolkit, enabling low-resource organizations to launch tiplines for their communities. Finally, we envision SynDy enabling additional fact-checking tools such as matching new misinformation claims to high-quality explainers on common misinformation topics.
- [142] arXiv:2405.10702 [pdf, ps, html, other]
-
Title: Empowering Prior to Court Legal Analysis: A Transparent and Accessible Dataset for Defensive Statement Classification and InterpretationSubjects: Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
The classification of statements provided by individuals during police interviews is a complex and significant task within the domain of natural language processing (NLP) and legal informatics. The lack of extensive domain-specific datasets raises challenges to the advancement of NLP methods in the field. This paper aims to address some of the present challenges by introducing a novel dataset tailored for classification of statements made during police interviews, prior to court proceedings. Utilising the curated dataset for training and evaluation, we introduce a fine-tuned DistilBERT model that achieves state-of-the-art performance in distinguishing truthful from deceptive statements. To enhance interpretability, we employ explainable artificial intelligence (XAI) methods to offer explainability through saliency maps, that interpret the model's decision-making process. Lastly, we present an XAI interface that empowers both legal professionals and non-specialists to interact with and benefit from our system. Our model achieves an accuracy of 86%, and is shown to outperform a custom transformer architecture in a comparative study. This holistic approach advances the accessibility, transparency, and effectiveness of statement analysis, with promising implications for both legal practice and research.
- [143] arXiv:2405.10703 [pdf, ps, html, other]
-
Title: Safe Control using Occupancy Grid Map-based Control Barrier Function (OGM-CBF)Subjects: Robotics (cs.RO)
Safe navigation in unknown environments stands as a significant challenge in the field of robotics. Control Barrier Function (CBF) is a strong mathematical tool to guarantee safety requirements. However, a common assumption in many works is that the CBF is already known and obstacles have predefined shapes. In this letter, we present a novel method called Occupancy Grid Map-based Control Barrier Function (OGM-CBF), which defines Control Barrier Function based on Occupancy Grid Maps. This enables generalization to unknown environments while generating online local or global maps of the environment using onboard perception sensors such as LiDAR or camera. With this method, the system guarantees safety via a single, continuously differentiable CBF per time step, which can be represented as one constraint in the CBF-QP optimization formulation while having an arbitrary number of obstacles with unknown shapes in the environment. This enables practical real-time implementation of CBF in both unknown and known environments. The efficacy of OGM-CBF is demonstrated in the safe control of an autonomous car in the CARLA simulator and a real-world industrial mobile robot.
- [144] arXiv:2405.10706 [pdf, ps, html, other]
-
Title: Challenging the Human-in-the-loop in Algorithmic Decision-makingSubjects: Machine Learning (cs.LG)
We discuss the role of humans in algorithmic decision-making (ADM) for socially relevant problems from a technical and philosophical perspective. In particular, we illustrate tensions arising from diverse expectations, values, and constraints by and on the humans involved. To this end, we assume that a strategic decision-maker (SDM) introduces ADM to optimize strategic and societal goals while the algorithms' recommended actions are overseen by a practical decision-maker (PDM) - a specific human-in-the-loop - who makes the final decisions. While the PDM is typically assumed to be a corrective, it can counteract the realization of the SDM's desired goals and societal values not least because of a misalignment of these values and unmet information needs of the PDM. This has significant implications for the distribution of power between the stakeholders in ADM, their constraints, and information needs. In particular, we emphasize the overseeing PDM's role as a potential political and ethical decision maker, who acts expected to balance strategic, value-driven objectives and on-the-ground individual decisions and constraints. We demonstrate empirically, on a machine learning benchmark dataset, the significant impact an overseeing PDM's decisions can have even if the PDM is constrained to performing only a limited amount of actions differing from the algorithms' recommendations. To ensure that the SDM's intended values are realized, the PDM needs to be provided with appropriate information conveyed through tailored explanations and its role must be characterized clearly. Our findings emphasize the need for an in-depth discussion of the role and power of the PDM and challenge the often-taken view that just including a human-in-the-loop in ADM ensures the 'correct' and 'ethical' functioning of the system.
- [145] arXiv:2405.10707 [pdf, ps, html, other]
-
Title: HARIS: Human-Like Attention for Reference Image SegmentationSubjects: Computer Vision and Pattern Recognition (cs.CV)
Referring image segmentation (RIS) aims to locate the particular region corresponding to the language expression. Existing methods incorporate features from different modalities in a \emph{bottom-up} manner. This design may get some unnecessary image-text pairs, which leads to an inaccurate segmentation mask. In this paper, we propose a referring image segmentation method called HARIS, which introduces the Human-Like Attention mechanism and uses the parameter-efficient fine-tuning (PEFT) framework. To be specific, the Human-Like Attention gets a \emph{feedback} signal from multi-modal features, which makes the network center on the specific objects and discard the irrelevant image-text pairs. Besides, we introduce the PEFT framework to preserve the zero-shot ability of pre-trained encoders. Extensive experiments on three widely used RIS benchmarks and the PhraseCut dataset demonstrate that our method achieves state-of-the-art performance and great zero-shot ability.
- [146] arXiv:2405.10708 [pdf, ps, html, other]
-
Title: Numerical Recovery of the Diffusion Coefficient in Diffusion Equations from Terminal MeasurementComments: 22 pages, 2 figuresSubjects: Numerical Analysis (math.NA)
In this work, we investigate a numerical procedure for recovering a space-dependent diffusion coefficient in a (sub)diffusion model from the given terminal data, and provide a rigorous numerical analysis of the procedure. By exploiting decay behavior of the observation in time, we establish a novel H{ö}lder type stability estimate for a large terminal time $T$. This is achieved by novel decay estimates of the (fractional) time derivative of the solution. To numerically recover the diffusion coefficient, we employ the standard output least-squares formulation with an $H^1(\Omega)$-seminorm penalty, and discretize the regularized problem by the Galerkin finite element method with continuous piecewise linear finite elements in space and backward Euler convolution quadrature in time. Further, we provide an error analysis of discrete approximations, and prove a convergence rate that matches the stability estimate. The derived $L^2(\Omega)$ error bound depends explicitly on the noise level, regularization parameter and discretization parameter(s), which gives a useful guideline of the \textsl{a priori} choice of discretization parameters with respect to the noise level in practical implementation. The error analysis is achieved using the conditional stability argument and discrete maximum-norm resolvent estimates. Several numerical experiments are also given to illustrate and complement the theoretical analysis.
- [147] arXiv:2405.10713 [pdf, ps, other]
-
Title: Development of Semantics-Based Distributed Middleware for Heterogeneous Data Integration and its Application for DroughtComments: 286 PagesSubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Drought is a complex environmental phenomenon that affects millions of people and communities all over the globe and is too elusive to be accurately predicted. This is mostly due to the scalability and variability of the web of environmental parameters that directly/indirectly causes the onset of different categories of drought. Since the dawn of man, efforts have been made to uniquely understand the natural indicators that provide signs of likely environmental events. These indicators/signs in the form of indigenous knowledge system have been used for generations. The intricate complexity of drought has, however, always been a major stumbling block for accurate drought prediction and forecasting systems. Recently, scientists in the field of agriculture and environmental monitoring have been discussing the integration of indigenous knowledge and scientific knowledge for a more accurate environmental forecasting system in order to incorporate diverse environmental information for a reliable drought forecast. Hence, in this research, the core objective is the development of a semantics-based data integration middleware that encompasses and integrates heterogeneous data models of local indigenous knowledge and sensor data towards an accurate drought forecasting system for the study areas. The local indigenous knowledge on drought gathered from the domain experts is transformed into rules to be used for performing deductive inference in conjunction with sensors data for determining the onset of drought through an automated inference generation module of the middleware. The semantic middleware incorporates, inter alia, a distributed architecture that consists of a streaming data processing engine based on Apache Kafka for real-time stream processing; a rule-based reasoning module; an ontology module for semantic representation of the knowledge bases.
- [148] arXiv:2405.10714 [pdf, ps, other]
-
Title: Persian Pronoun Resolution: Leveraging Neural Networks and Language ModelsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Coreference resolution, critical for identifying textual entities referencing the same entity, faces challenges in pronoun resolution, particularly identifying pronoun antecedents. Existing methods often treat pronoun resolution as a separate task from mention detection, potentially missing valuable information. This study proposes the first end-to-end neural network system for Persian pronoun resolution, leveraging pre-trained Transformer models like ParsBERT. Our system jointly optimizes both mention detection and antecedent linking, achieving a 3.37 F1 score improvement over the previous state-of-the-art system (which relied on rule-based and statistical methods) on the Mehr corpus. This significant improvement demonstrates the effectiveness of combining neural networks with linguistic models, potentially marking a significant advancement in Persian pronoun resolution and paving the way for further research in this under-explored area.
- [149] arXiv:2405.10718 [pdf, ps, html, other]
-
Title: SignLLM: Sign Languages Production Large Language ModelsComments: 33 pages, website at this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
In this paper, we introduce the first comprehensive multilingual sign language dataset named Prompt2Sign, which builds from public data including American Sign Language (ASL) and seven others. Our dataset transforms a vast array of videos into a streamlined, model-friendly format, optimized for training with translation models like seq2seq and text2text. Building on this new dataset, we propose SignLLM, the first multilingual Sign Language Production (SLP) model, which includes two novel multilingual SLP modes that allow for the generation of sign language gestures from input text or prompt. Both of the modes can use a new loss and a module based on reinforcement learning, which accelerates the training by enhancing the model's capability to autonomously sample high-quality data. We present benchmark results of SignLLM, which demonstrate that our model achieves state-of-the-art performance on SLP tasks across eight sign languages.
- [150] arXiv:2405.10722 [pdf, ps, html, other]
-
Title: About the Burton-Miller factor in the low frequency regionComments: 12 pages, 9 figuresSubjects: Numerical Analysis (math.NA)
The Burton-Miller method is a widely used approach in acoustics to enhance the stability of the boundary element method for exterior Helmholtz problems at so-called critical frequencies. This method depends on a coupling parameter $\eta$ and it can be shown that as long as $\eta$ has an imaginary part different from 0, the boundary integral formulation for the Helmholtz equation has a unique solution at all frequencies. A popular choice for this parameter is $\eta = \frac{\mathrm{i}}{k}$, where $k$ is the wavenumber. It can be shown that this choice is quasi optimal. However, especially in the low frequency region, where the critical frequencies are still sparsely distributed, different choices for this factor result in a smaller condition number and a smaller error of the solution. In this work, alternative choices for this factor are compared based on numerical experiments. Additionally, a way to enhance the Burton-Miller solution with $\eta = \frac{\mathrm{i}}{k}$ for a sound hard scatterer in the low frequency region by an additional step of a modified Richardson iteration is introduced.
- [151] arXiv:2405.10725 [pdf, ps, html, other]
-
Title: INDUS: Effective and Efficient Language Models for Scientific ApplicationsBishwaranjan Bhattacharjee, Aashka Trivedi, Masayasu Muraoka, Muthukumaran Ramasubramanian, Takuma Udagawa, Iksha Gurung, Rong Zhang, Bharath Dandala, Rahul Ramachandran, Manil Maskey, Kayleen Bugbee, Mike Little, Elizabeth Fancher, Lauren Sanders, Sylvain Costes, Sergi Blanco-Cuaresma, Kelly Lockhart, Thomas Allen, Felix Grazes, Megan Ansdel, Alberto Accomazzi, Yousef El-Kurdi, Davis Wertheimer, Birgit Pfitzmann, Cesar Berrospi Ramis, Michele Dolfi, Rafael Teixeira de Lima, Panos Vegenas, S. Karthik Mukkavilli, Peter Staar, Sanaz Vahidinia, Ryan McGranaghan, Armin Mehrabian, Tsendgar LeeSubjects: Computation and Language (cs.CL); Information Retrieval (cs.IR)
Large language models (LLMs) trained on general domain corpora showed remarkable results on natural language processing (NLP) tasks. However, previous research demonstrated LLMs trained using domain-focused corpora perform better on specialized tasks. Inspired by this pivotal insight, we developed INDUS, a comprehensive suite of LLMs tailored for the Earth science, biology, physics, heliophysics, planetary sciences and astrophysics domains and trained using curated scientific corpora drawn from diverse data sources. The suite of models include: (1) an encoder model trained using domain-specific vocabulary and corpora to address natural language understanding tasks, (2) a contrastive-learning-based general text embedding model trained using a diverse set of datasets drawn from multiple sources to address information retrieval tasks and (3) smaller versions of these models created using knowledge distillation techniques to address applications which have latency or resource constraints. We also created three new scientific benchmark datasets namely, CLIMATE-CHANGE-NER (entity-recognition), NASA-QA (extractive QA) and NASA-IR (IR) to accelerate research in these multi-disciplinary fields. Finally, we show that our models outperform both general-purpose encoders (RoBERTa) and existing domain-specific encoders (SciBERT) on these new tasks as well as existing benchmark tasks in the domains of interest.
- [152] arXiv:2405.10729 [pdf, ps, html, other]
-
Title: Contestable AI needs Computational ArgumentationFrancesco Leofante, Hamed Ayoobi, Adam Dejl, Gabriel Freedman, Deniz Gorur, Junqi Jiang, Guilherme Paulino-Passos, Antonio Rago, Anna Rapberger, Fabrizio Russo, Xiang Yin, Dekai Zhang, Francesca ToniSubjects: Artificial Intelligence (cs.AI)
AI has become pervasive in recent years, but state-of-the-art approaches predominantly neglect the need for AI systems to be contestable. Instead, contestability is advocated by AI guidelines (e.g. by the OECD) and regulation of automated decision-making (e.g. GDPR). In this position paper we explore how contestability can be achieved computationally in and for AI. We argue that contestable AI requires dynamic (human-machine and/or machine-machine) explainability and decision-making processes, whereby machines can (i) interact with humans and/or other machines to progressively explain their outputs and/or their reasoning as well as assess grounds for contestation provided by these humans and/or other machines, and (ii) revise their decision-making processes to redress any issues successfully raised during contestation. Given that much of the current AI landscape is tailored to static AIs, the need to accommodate contestability will require a radical rethinking, that, we argue, computational argumentation is ideally suited to support.
- [153] arXiv:2405.10736 [pdf, ps, html, other]
-
Title: StackOverflowVQA: Stack Overflow Visual Question Answering DatasetSubjects: Computer Vision and Pattern Recognition (cs.CV)
In recent years, people have increasingly used AI to help them with their problems by asking questions on different topics. One of these topics can be software-related and programming questions. In this work, we focus on the questions which need the understanding of images in addition to the question itself. We introduce the StackOverflowVQA dataset, which includes questions from StackOverflow that have one or more accompanying images. This is the first VQA dataset that focuses on software-related questions and contains multiple human-generated full-sentence answers. Additionally, we provide a baseline for answering the questions with respect to images in the introduced dataset using the GIT model. All versions of the dataset are available at this https URL.
- [154] arXiv:2405.10738 [pdf, ps, html, other]
-
Title: Feature-Adaptive and Data-Scalable In-Context LearningComments: Accepted at ACL 2024 main conferenceSubjects: Computation and Language (cs.CL)
In-context learning (ICL), which promotes inference with several demonstrations, has become a widespread paradigm to stimulate LLM capabilities for downstream tasks. Due to context length constraints, it cannot be further improved in spite of more training data, and general features directly from LLMs in ICL are not adaptive to the specific downstream task. In this paper, we propose a feature-adaptive and data-scalable in-context learning framework (FADS-ICL), which can leverage task-adaptive features to promote inference on the downstream task, with the supervision of beyond-context samples. Specifically, it first extracts general features of beyond-context samples via the LLM with ICL input form one by one, and introduces a task-specific modulator to perform feature refinement and prediction after fitting a specific downstream task. We conduct extensive experiments on FADS-ICL under varying data settings (4$\sim$128 shots) and LLM scale (0.8$\sim$70B) settings. Experimental results show that FADS-ICL consistently outperforms previous state-of-the-art methods by a significant margin under all settings, verifying the effectiveness and superiority of FADS-ICL. For example, under the 1.5B and 32 shots setting, FADS-ICL can achieve \textbf{+14.3} average accuracy from feature adaptation over vanilla ICL on 10 datasets, with \textbf{+6.2} average accuracy over the previous state-of-the-art method, and the performance can further improve with increasing training data. Code and data are publicly available at \url{this https URL}.
- [155] arXiv:2405.10739 [pdf, ps, html, other]
-
Title: Efficient Multimodal Large Language Models: A SurveyYizhang Jin, Jian Li, Yexin Liu, Tianjun Gu, Kai Wu, Zhengkai Jiang, Muyang He, Bo Zhao, Xin Tan, Zhenye Gan, Yabiao Wang, Chengjie Wang, Lizhuang MaSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
In the past year, Multimodal Large Language Models (MLLMs) have demonstrated remarkable performance in tasks such as visual question answering, visual understanding and reasoning. However, the extensive model size and high training and inference costs have hindered the widespread application of MLLMs in academia and industry. Thus, studying efficient and lightweight MLLMs has enormous potential, especially in edge computing scenarios. In this survey, we provide a comprehensive and systematic review of the current state of efficient MLLMs. Specifically, we summarize the timeline of representative efficient MLLMs, research state of efficient structures and strategies, and the applications. Finally, we discuss the limitations of current efficient MLLM research and promising future directions. Please refer to our GitHub repository for more details: this https URL.
- [156] arXiv:2405.10741 [pdf, ps, html, other]
-
Title: SBAAM! Eliminating Transcript Dependency in Automatic SubtitlingComments: Accepted to ACL 2024 main conferenceSubjects: Computation and Language (cs.CL)
Subtitling plays a crucial role in enhancing the accessibility of audiovisual content and encompasses three primary subtasks: translating spoken dialogue, segmenting translations into concise textual units, and estimating timestamps that govern their on-screen duration. Past attempts to automate this process rely, to varying degrees, on automatic transcripts, employed diversely for the three subtasks. In response to the acknowledged limitations associated with this reliance on transcripts, recent research has shifted towards transcription-free solutions for translation and segmentation, leaving the direct generation of timestamps as uncharted territory. To fill this gap, we introduce the first direct model capable of producing automatic subtitles, entirely eliminating any dependence on intermediate transcripts also for timestamp prediction. Experimental results, backed by manual evaluation, showcase our solution's new state-of-the-art performance across multiple language pairs and diverse conditions.
- [157] arXiv:2405.10743 [pdf, ps, html, other]
-
Title: Occupancy-SLAM: Simultaneously Optimizing Robot Poses and Continuous Occupancy MapComments: This paper has been accpeted by Robotics: Science and Systems 2022Journal-ref: Robotics: Science and Systems 2022Subjects: Robotics (cs.RO)
In this paper, we propose an optimization based SLAM approach to simultaneously optimize the robot trajectory and the occupancy map using 2D laser scans (and odometry) information. The key novelty is that the robot poses and the occupancy map are optimized together, which is significantly different from existing occupancy mapping strategies where the robot poses need to be obtained first before the map can be estimated. In our formulation, the map is represented as a continuous occupancy map where each 2D point in the environment has a corresponding evidence value. The Occupancy-SLAM problem is formulated as an optimization problem where the variables include all the robot poses and the occupancy values at the selected discrete grid cell nodes. We propose a variation of Gauss-Newton method to solve this new formulated problem, obtaining the optimized occupancy map and robot trajectory together with their uncertainties. Our algorithm is an offline approach since it is based on batch optimization and the number of variables involved is large. Evaluations using simulations and publicly available practical 2D laser datasets demonstrate that the proposed approach can estimate the maps and robot trajectories more accurately than the state-of-the-art techniques, when a relatively accurate initial guess is provided to our algorithm. The video shows the convergence process of the proposed Occupancy-SLAM and comparison of results to Cartographer can be found at \url{this https URL}.
- [158] arXiv:2405.10745 [pdf, ps, html, other]
-
Title: Empowering Small-Scale Knowledge Graphs: A Strategy of Leveraging General-Purpose Knowledge Graphs for Enriched EmbeddingsComments: Accepted for LREC-COLING 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Knowledge-intensive tasks pose a significant challenge for Machine Learning (ML) techniques. Commonly adopted methods, such as Large Language Models (LLMs), often exhibit limitations when applied to such tasks. Nevertheless, there have been notable endeavours to mitigate these challenges, with a significant emphasis on augmenting LLMs through Knowledge Graphs (KGs). While KGs provide many advantages for representing knowledge, their development costs can deter extensive research and applications. Addressing this limitation, we introduce a framework for enriching embeddings of small-scale domain-specific Knowledge Graphs with well-established general-purpose KGs. Adopting our method, a modest domain-specific KG can benefit from a performance boost in downstream tasks when linked to a substantial general-purpose KG. Experimental evaluations demonstrate a notable enhancement, with up to a 44% increase observed in the Hits@10 metric. This relatively unexplored research direction can catalyze more frequent incorporation of KGs in knowledge-intensive tasks, resulting in more robust, reliable ML implementations, which hallucinates less than prevalent LLM solutions.
Keywords: knowledge graph, knowledge graph completion, entity alignment, representation learning, machine learning - [159] arXiv:2405.10746 [pdf, ps, html, other]
-
Title: Causality in the Can: Diet Coke's Impact on FatnessComments: 9 pages, 4 figures, uses aaai24.stySubjects: Computers and Society (cs.CY); Artificial Intelligence (cs.AI)
Artificially sweetened beverages like Diet Coke are often considered healthier alternatives, but the debate over their impact on obesity persists. Previous research has predominantly relied on observational data or randomized controlled trials (RCTs), which may not accurately capture the causal relationship between Diet Coke consumption and obesity. This study uses causal inference methods, employing data from the National Health and Nutrition Examination Survey (NHANES) to examine this relationship across diverse demographics. Instead of relying on RCT data, we constructed a causal graph and applied the back-door criterion with its adjustment formula to estimate the RCT distributions. We then calculated the counterfactual quantity, the Probability of Necessity and Sufficiency (PNS), using both NHANES data and estimated RCT data. We propose that PNS is the essential metric for assessing the impact of Diet Coke on obesity. Our results indicate that between 20% to 50% of individuals, especially those with poor dietary habits, are more likely to gain weight from Diet Coke. Conversely, in groups like young females with healthier diets, only a small proportion experience weight gain due to Diet Coke. These findings highlight the influence of individual lifestyle and potential hormonal factors on the varied effects of Diet Coke, providing a new framework for understanding its nutritional impacts on health.
- [160] arXiv:2405.10748 [pdf, ps, html, other]
-
Title: Deep Data Consistency: a Fast and Robust Diffusion Model-based Solver for Inverse ProblemsComments: Codes: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
Diffusion models have become a successful approach for solving various image inverse problems by providing a powerful diffusion prior. Many studies tried to combine the measurement into diffusion by score function replacement, matrix decomposition, or optimization algorithms, but it is hard to balance the data consistency and realness. The slow sampling speed is also a main obstacle to its wide application. To address the challenges, we propose Deep Data Consistency (DDC) to update the data consistency step with a deep learning model when solving inverse problems with diffusion models. By analyzing existing methods, the variational bound training objective is used to maximize the conditional posterior and reduce its impact on the diffusion process. In comparison with state-of-the-art methods in linear and non-linear tasks, DDC demonstrates its outstanding performance of both similarity and realness metrics in generating high-quality solutions with only 5 inference steps in 0.77 seconds on average. In addition, the robustness of DDC is well illustrated in the experiments across datasets, with large noise and the capacity to solve multiple tasks in only one pre-trained model.
- [161] arXiv:2405.10750 [pdf, ps, html, other]
-
Title: Parameter Identification for Electrochemical Models of Lithium-Ion Batteries Using Bayesian OptimizationComments: 6 pagesSubjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Efficient parameter identification of electrochemical models is crucial for accurate monitoring and control of lithium-ion cells. This process becomes challenging when applied to complex models that rely on a considerable number of interdependent parameters that affect the output response. Gradient-based and metaheuristic optimization techniques, although previously employed for this task, are limited by their lack of robustness, high computational costs, and susceptibility to local minima. In this study, Bayesian Optimization is used for tuning the dynamic parameters of an electrochemical equivalent circuit battery model (E-ECM) for a nickel-manganese-cobalt (NMC)-graphite cell. The performance of the Bayesian Optimization is compared with baseline methods based on gradient-based and metaheuristic approaches. The robustness of the parameter optimization method is tested by performing verification using an experimental drive cycle. The results indicate that Bayesian Optimization outperforms Gradient Descent and PSO optimization techniques, achieving reductions on average testing loss by 28.8% and 5.8%, respectively. Moreover, Bayesian optimization significantly reduces the variance in testing loss by 95.8% and 72.7%, respectively.
- [162] arXiv:2405.10751 [pdf, ps, html, other]
-
Title: Some remarks on a mathematical model for water flow in porous media with competition between transport and diffusionComments: 7 pagesSubjects: Numerical Analysis (math.NA); Analysis of PDEs (math.AP)
The contribution deals with the mathematical modelling of fluid flow in porous media, in particular water flow in soils. The motivation is to describe the competition between gravity and capillarity, or, in other words, between transport and diffusion. The analysis is based on a mathematical model developed by B. Detmann, C. Gavioli, and P. Krejčí, in which the effects of gravity are included in a novel way. The model consists of a nonlinear partial differential equation describing both the gravitational transport and the capillary diffusion of water. Although analytical solutions can be obtained for some special cases, only numerical solutions are available in more general situations. The solving algorithm is based on a time discretisation and the finite element method, and is written in Matlab. The results of the numerical simulations are shown and the behaviour of the model is discussed.
- [163] arXiv:2405.10757 [pdf, ps, html, other]
-
Title: Rethinking Graph Backdoor Attacks: A Distribution-Preserving PerspectiveSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Graph Neural Networks (GNNs) have shown remarkable performance in various tasks. However, recent works reveal that GNNs are vulnerable to backdoor attacks. Generally, backdoor attack poisons the graph by attaching backdoor triggers and the target class label to a set of nodes in the training graph. A GNN trained on the poisoned graph will then be misled to predict test nodes attached with trigger to the target class. Despite their effectiveness, our empirical analysis shows that triggers generated by existing methods tend to be out-of-distribution (OOD), which significantly differ from the clean data. Hence, these injected triggers can be easily detected and pruned with widely used outlier detection methods in real-world applications. Therefore, in this paper, we study a novel problem of unnoticeable graph backdoor attacks with in-distribution (ID) triggers. To generate ID triggers, we introduce an OOD detector in conjunction with an adversarial learning strategy to generate the attributes of the triggers within distribution. To ensure a high attack success rate with ID triggers, we introduce novel modules designed to enhance trigger memorization by the victim model trained on poisoned graph. Extensive experiments on real-world datasets demonstrate the effectiveness of the proposed method in generating in distribution triggers that can by-pass various defense strategies while maintaining a high attack success rate.
- [164] arXiv:2405.10758 [pdf, ps, html, other]
-
Title: Seeing is (Not) Believing: Practical Phishing Attacks Targeting Social Media Sharing CardsSubjects: Cryptography and Security (cs.CR)
In the digital era, Online Social Networks (OSNs) play a crucial role in information dissemination, with sharing cards for link previews emerging as a key feature. These cards offer snapshots of shared content, including titles, descriptions, and images. In this study, we investigate the construction and dissemination mechanisms of these cards, focusing on two primary server-side generation methods based on Share-SDK and HTML meta tags. Our investigation reveals a novel type of attack, i.e., Sharing Card Forgery (SCF) attack that can be exploited to create forged benign sharing cards for malicious links. We demonstrate the feasibility of these attacks through practical implementations and evaluate their effectiveness across 13 various online social networks. Our findings indicate a significant risk, as the deceptive cards can evade detection and persist on social platforms, thus posing a substantial threat to user security. We also delve into countermeasures and discuss the challenges in effectively mitigating these types of attacks. This study not only sheds light on a novel phishing technique but also calls for heightened awareness and improved defensive strategies in the OSN ecosystem.
- [165] arXiv:2405.10765 [pdf, ps, html, other]
-
Title: Fast Collision Probability Estimation for Automated Driving using Multi-circular Shape ApproximationsComments: Accepted for the 2024 Intelligent Vehicles Symposium, 8 pagesSubjects: Robotics (cs.RO); Probability (math.PR)
Many state-of-the-art methods for safety assessment and motion planning for automated driving require estimation of the probability of collision (POC). To estimate the POC, a shape approximation of the colliding actors and probability density functions of the associated uncertain kinematic variables are required. Even with such information available, the derivation of the POC is in general, i.e., for any shape and density, only possible with Monte Carlo sampling (MCS). Random sampling of the POC, however, is challenging as computational resources are limited in real-world applications. We present expressions for the POC in the presence of Gaussian uncertainties, based on multi-circular shape approximations. In addition, we show that the proposed approach is computationally more efficient than MCS. Lastly, we provide a method for upper and lower bounding the estimation error for the POC induced by the used shape approximations.
- [166] arXiv:2405.10767 [pdf, ps, html, other]
-
Title: Evaluating Saliency Explanations in NLP by CrowdsourcingComments: 13 pages, 4 figures, Accepted for LREC-Coling 2024 (Oral)Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
Deep learning models have performed well on many NLP tasks. However, their internal mechanisms are typically difficult for humans to understand. The development of methods to explain models has become a key issue in the reliability of deep learning models in many important applications. Various saliency explanation methods, which give each feature of input a score proportional to the contribution of output, have been proposed to determine the part of the input which a model values most. Despite a considerable body of work on the evaluation of saliency methods, whether the results of various evaluation metrics agree with human cognition remains an open question. In this study, we propose a new human-based method to evaluate saliency methods in NLP by crowdsourcing. We recruited 800 crowd workers and empirically evaluated seven saliency methods on two datasets with the proposed method. We analyzed the performance of saliency methods, compared our results with existing automated evaluation methods, and identified notable differences between NLP and computer vision (CV) fields when using saliency methods. The instance-level data of our crowdsourced experiments and the code to reproduce the explanations are available at this https URL.
- [167] arXiv:2405.10768 [pdf, ps, html, other]
-
Title: What should be observed for optimal reward in POMDPs?Subjects: Artificial Intelligence (cs.AI)
Partially observable Markov Decision Processes (POMDPs) are a standard model for agents making decisions in uncertain environments. Most work on POMDPs focuses on synthesizing strategies based on the available capabilities. However, system designers can often control an agent's observation capabilities, e.g. by placing or selecting sensors. This raises the question of how one should select an agent's sensors cost-effectively such that it achieves the desired goals. In this paper, we study the novel optimal observability problem OOP: Given a POMDP M, how should one change M's observation capabilities within a fixed budget such that its (minimal) expected reward remains below a given threshold? We show that the problem is undecidable in general and decidable when considering positional strategies only. We present two algorithms for a decidable fragment of the OOP: one based on optimal strategies of M's underlying Markov decision process and one based on parameter synthesis with SMT. We report promising results for variants of typical examples from the POMDP literature.
- [168] arXiv:2405.10774 [pdf, ps, other]
-
Title: Injective hardness condition for PCSPsComments: To be published in LICS'24Subjects: Computational Complexity (cs.CC)
We present a template for the Promise Constraint Satisfaction Problem (PCSP) which is NP-hard but does not satisfy the current state-of-the-art hardness condition [ACMTCT'21]. We introduce a new "injective" condition based on the smooth version of the layered PCP Theorem and use this new condition to confirm that the problem is indeed NP-hard. In the second part of the article, we establish a dichotomy for Boolean PCSPs defined by templates with polymorphisms in the set of linear threshold functions. The reasoning relies on the new injective condition.
- [169] arXiv:2405.10779 [pdf, ps, html, other]
-
Title: Baseline Results for Selected Nonlinear System Identification BenchmarksMax D. Champneys, Gerben I. Beintema, Roland Tóth, Maarten Schoukens, Maarten Schoukens, Timothy J. RogersSubjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Nonlinear system identification remains an important open challenge across research and academia. Large numbers of novel approaches are seen published each year, each presenting improvements or extensions to existing methods. It is natural, therefore, to consider how one might choose between these competing models. Benchmark datasets provide one clear way to approach this question. However, to make meaningful inference based on benchmark performance it is important to understand how well a new method performs comparatively to results available with well-established methods. This paper presents a set of ten baseline techniques and their relative performances on five popular benchmarks. The aim of this contribution is to stimulate thought and discussion regarding objective comparison of identification methodologies.
- [170] arXiv:2405.10787 [pdf, ps, html, other]
-
Title: On the Application of Reliability Theory to Cellular Network Mobility Performance AnalysisComments: 6 pages, 4 figures. Submitted to IEEE Wireless Communication Letters for possible publicationSubjects: Networking and Internet Architecture (cs.NI)
Achieving connectivity reliability is one of the significant challenges for 5G and beyond 5G cellular networks. The present understanding of reliability in the context of mobile communication does not adequately cover the stochastic temporal aspects of the network, such as the duration and spread of packet errors that an outage session may cause. Rather, it simply confines the definition to the percentage of successful packet delivery. In this letter, we offer an elaborate modeling of the outage for a cellular mobile network by showcasing the different types of outages and their contiguity characteristic. Thereafter, using the outage metrics, we define two new key performance indicators (KPIs), namely mean outage time and mean time between outages as counterparts to akin KPIs that already exist in classical reliability theory, i.e., mean down time and mean time between failures. Using a system-level simulation where user mobility is a crucial component, it is shown that these newly defined KPIs can be used to quantify the reliability requirements of different user applications in cellular services.
- [171] arXiv:2405.10788 [pdf, ps, html, other]
-
Title: QEdgeProxy: QoS-Aware Load Balancing for IoT Services in the Computing ContinuumComments: 7 pages, accepted for publication at IEEE EDGE 2024Subjects: Distributed, Parallel, and Cluster Computing (cs.DC)
While various service orchestration aspects within Computing Continuum (CC) systems have been extensively addressed, including service placement, replication, and scheduling, an open challenge lies in ensuring uninterrupted data delivery from IoT devices to running service instances in this dynamic environment, while adhering to specific Quality of Service (QoS) requirements and balancing the load on service instances. To address this challenge, we introduce QEdgeProxy, an adaptive and QoS-aware load balancing framework specifically designed for routing client requests to appropriate IoT service instances in the CC. QEdgeProxy integrates naturally within Kubernetes, adapts to changes in dynamic environments, and manages to seamlessly deliver data to IoT service instances while consistently meeting QoS requirements and effectively distributing load across them. This is verified by extensive experiments over a realistic K3s cluster with instance failures and network variability, where QEdgeProxy outperforms both Kubernetes built-in mechanisms and a state-of-the-art solution, while introducing minimal computational overhead.
- [172] arXiv:2405.10793 [pdf, ps, other]
-
Title: CCTNet: A Circular Convolutional Transformer Network for LiDAR-based Place Recognition Handling Movable Objects OcclusionSubjects: Robotics (cs.RO)
Place recognition is a fundamental task for robotic application, allowing robots to perform loop closure detection within simultaneous localization and mapping (SLAM), and achieve relocalization on prior maps. Current range image-based networks use single-column convolution to maintain feature invariance to shifts in image columns caused by LiDAR viewpoint change.However, this raises the issues such as "restricted receptive fields" and "excessive focus on local regions", degrading the performance of networks. To address the aforementioned issues, we propose a lightweight circular convolutional Transformer network denoted as CCTNet, which boosts performance by capturing structural information in point clouds and facilitating crossdimensional interaction of spatial and channel information. Initially, a Circular Convolution Module (CCM) is introduced, expanding the network's perceptual field while maintaining feature consistency across varying LiDAR perspectives. Then, a Range Transformer Module (RTM) is proposed, which enhances place recognition accuracy in scenarios with movable objects by employing a combination of channel and spatial attention mechanisms. Furthermore, we propose an Overlap-based loss function, transforming the place recognition task from a binary loop closure classification into a regression problem linked to the overlap between LiDAR frames. Through extensive experiments on the KITTI and Ford Campus datasets, CCTNet surpasses comparable methods, achieving Recall@1 of 0.924 and 0.965, and Recall@1% of 0.990 and 0.993 on the test set, showcasing a superior performance. Results on the selfcollected dataset further demonstrate the proposed method's potential for practical implementation in complex scenarios to handle movable objects, showing improved generalization in various datasets.
- [173] arXiv:2405.10799 [pdf, ps, other]
-
Title: Training Compute Thresholds: Features and Functions in AI GovernanceComments: Working paperSubjects: Computers and Society (cs.CY); Machine Learning (cs.LG)
This paper examines the use of training compute thresholds as a tool for governing artificial intelligence (AI) systems. We argue that compute thresholds serve as a valuable trigger for further evaluation of AI models, rather than being the sole determinant of the regulation. Key advantages of compute thresholds include their correlation with model capabilities and risks, quantifiability, ease of measurement, robustness to circumvention, knowability before model development and deployment, potential for external verification, and targeted scope. Compute thresholds provide a practical starting point for identifying potentially high-risk models and can be used as an initial filter in AI governance frameworks alongside other sector-specific regulations and broader governance measures.
- [174] arXiv:2405.10800 [pdf, ps, html, other]
-
Title: Heterogeneity-Informed Meta-Parameter Learning for Spatiotemporal Time Series ForecastingComments: Accepted by KDD'24 Research TrackSubjects: Machine Learning (cs.LG)
Spatiotemporal time series forecasting plays a key role in a wide range of real-world applications. While significant progress has been made in this area, fully capturing and leveraging spatiotemporal heterogeneity remains a fundamental challenge. Therefore, we propose a novel Heterogeneity-Informed Meta-Parameter Learning scheme. Specifically, our approach implicitly captures spatiotemporal heterogeneity through learning spatial and temporal embeddings, which can be viewed as a clustering process. Then, a novel spatiotemporal meta-parameter learning paradigm is proposed to learn spatiotemporal-specific parameters from meta-parameter pools, which is informed by the captured heterogeneity. Based on these ideas, we develop a Heterogeneity-Informed Spatiotemporal Meta-Network (HimNet) for spatiotemporal time series forecasting. Extensive experiments on five widely-used benchmarks demonstrate our method achieves state-of-the-art performance while exhibiting superior interpretability. Our code is available at this https URL.
- [175] arXiv:2405.10801 [pdf, ps, other]
-
Title: The Relational Machine CalculusComments: LICS paper, 15 pages excluding referencesSubjects: Programming Languages (cs.PL)
This paper presents the Relational Machine Calculus (RMC): a simple, foundational model of first-order relational programming. The RMC originates from the Functional Machine Calculus (FMC), which generalizes the lambda-calculus and its standard call-by-name stack machine in two directions. One, "locations", introduces multiple stacks, which enable effect operators to be encoded into the abstraction and application constructs. The second, "sequencing", introduces the imperative notions of "skip" and "sequence", similar to kappa-calculus and concatenative programming languages. The key observation of the RMC is that the first-order fragment of the FMC exhibits a latent duality which, given a simple decomposition of the relevant constructors, can be concretely expressed as an involution on syntax. Semantically, this gives rise to a sound and complete calculus for string diagrams of Frobenius monoids. We consider unification as the corresponding symmetric generalization of beta-reduction. By further including standard operators of Kleene algebra, the RMC embeds a range of computational models: the kappa-calculus, logic programming, automata, Interaction Nets, and Petri Nets, among others. These embeddings preserve operational semantics, which for the RMC is again given by a generalization of the standard stack machine for the lambda-calculus. The equational theory of the RMC (which supports reasoning about its operational semantics) is conservative over both the first-order lambda-calculus and Kleene algebra, and can be oriented to give a confluent reduction relation.
- [176] arXiv:2405.10802 [pdf, ps, html, other]
-
Title: Reduced storage direct tensor ring decomposition for convolutional neural networks compressionSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Convolutional neural networks (CNNs) are among the most widely used machine learning models for computer vision tasks, such as image classification. To improve the efficiency of CNNs, many CNNs compressing approaches have been developed. Low-rank methods approximate the original convolutional kernel with a sequence of smaller convolutional kernels, which leads to reduced storage and time complexities. In this study, we propose a novel low-rank CNNs compression method that is based on reduced storage direct tensor ring decomposition (RSDTR). The proposed method offers a higher circular mode permutation flexibility, and it is characterized by large parameter and FLOPS compression rates, while preserving a good classification accuracy of the compressed network. The experiments, performed on the CIFAR-10 and ImageNet datasets, clearly demonstrate the efficiency of RSDTR in comparison to other state-of-the-art CNNs compression approaches.
- [177] arXiv:2405.10808 [pdf, ps, html, other]
-
Title: ActiveLLM: Large Language Model-based Active Learning for Textual Few-Shot ScenariosComments: 18 pages, 7 figures, 4 tablesSubjects: Computation and Language (cs.CL)
Active learning is designed to minimize annotation efforts by prioritizing instances that most enhance learning. However, many active learning strategies struggle with a 'cold start' problem, needing substantial initial data to be effective. This limitation often reduces their utility for pre-trained models, which already perform well in few-shot scenarios. To address this, we introduce ActiveLLM, a novel active learning approach that leverages large language models such as GPT-4, Llama 3, and Mistral Large for selecting instances. We demonstrate that ActiveLLM significantly enhances the classification performance of BERT classifiers in few-shot scenarios, outperforming both traditional active learning methods and the few-shot learning method SetFit. Additionally, ActiveLLM can be extended to non-few-shot scenarios, allowing for iterative selections. In this way, ActiveLLM can even help other active learning strategies to overcome their cold start problem. Our results suggest that ActiveLLM offers a promising solution for improving model performance across various learning setups.
- [178] arXiv:2405.10814 [pdf, ps, html, other]
-
Title: Data-Driven Symbol Detection for Intersymbol Interference Channels with Bursty Impulsive NoiseComments: This work has been submitted to the IEEE for possible publicationSubjects: Information Theory (cs.IT); Machine Learning (cs.LG); Signal Processing (eess.SP)
We developed machine learning approaches for data-driven trellis-based soft symbol detection in coded transmission over intersymbol interference (ISI) channels in presence of bursty impulsive noise (IN), for example encountered in wireless digital broadcasting systems and vehicular communications. This enabled us to obtain optimized detectors based on the Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm while circumventing the use of full channel state information (CSI) for computing likelihoods and trellis state transition probabilities. First, we extended the application of the neural network (NN)-aided BCJR, recently proposed for ISI channels with additive white Gaussian noise (AWGN). Although suitable for estimating likelihoods via labeling of transmission sequences, the BCJR-NN method does not provide a framework for learning the trellis state transitions. In addition to detection over the joint ISI and IN states we also focused on another scenario where trellis transitions are not trivial: detection for the ISI channel with AWGN with inaccurate knowledge of the channel memory at the receiver. Without access to the accurate state transition matrix, the BCJR- NN performance significantly degrades in both settings. To this end, we devised an alternative approach for data-driven BCJR detection based on the unsupervised learning of a hidden Markov model (HMM). The BCJR-HMM allowed us to optimize both the likelihood function and the state transition matrix without labeling. Moreover, we demonstrated the viability of a hybrid NN and HMM BCJR detection where NN is used for learning the likelihoods, while the state transitions are optimized via HMM. While reducing the required prior channel knowledge, the examined data-driven detectors with learned trellis state transitions achieve bit error rates close to the optimal full CSI-based BCJR, significantly outperforming detection with inaccurate CSI.
- [179] arXiv:2405.10818 [pdf, ps, other]
-
Title: Modeling Supply Chain Interaction and Disruption: Insights from Real-world Data and Complex Adaptive SystemComments: arXiv admin note: text overlap with arXiv:2304.10428 by other authorsSubjects: Social and Information Networks (cs.SI)
In the rapidly evolving automotive industry, Systems-on-Chips (SoCs) are playing an increasingly crucial role in enhancing vehicle intelligence, connectivity, and safety features. For enterprises whose business encompasses automotive SoCs, the sustained and stable provision and receipt of SoC relevant goods or services are essential. Considering the imperative for a resilient and adaptable supply network, enterprises are concentrating their efforts on formulating strategies to address risks stemming from supply chain disruptions caused by technological obsolescence, natural disasters, and geopolitical tensions. This study presents an open supply knowledge extraction and complement approach and build a supply chain network of automotive SoC enterprises in China, which incorporates cross-domain named entity recognition under limited information, fuzzy matching of firm entities, and supply relation inferring based on knowledge graph. Subsequently, we exhibit the degree and registered capital distribution across firms, and analyze the correlations between centrality metrics in the supply chain network. Finally, based on recovery capacity and risk transfer, two interaction disruption models (IDMs) are developed to elucidate the adaptive behaviors and effect of network disruptions under various business and attack strategies. This research not only aids in exploring the complexities of Chinese automotive SoC supply chain but also enriches our understanding of the dynamics of firm behavior in this crucial industry sector.
- [180] arXiv:2405.10822 [pdf, ps, html, other]
-
Title: Generative modeling through internal high-dimensional chaotic activitySubjects: Machine Learning (cs.LG); Disordered Systems and Neural Networks (cond-mat.dis-nn)
Generative modeling aims at producing new datapoints whose statistical properties resemble the ones in a training dataset. In recent years, there has been a burst of machine learning techniques and settings that can achieve this goal with remarkable performances. In most of these settings, one uses the training dataset in conjunction with noise, which is added as a source of statistical variability and is essential for the generative task. Here, we explore the idea of using internal chaotic dynamics in high-dimensional chaotic systems as a way to generate new datapoints from a training dataset. We show that simple learning rules can achieve this goal within a set of vanilla architectures and characterize the quality of the generated datapoints through standard accuracy measures.
- [181] arXiv:2405.10824 [pdf, ps, other]
-
Title: Real-World Graph Analysis: Techniques for Static, Dynamic, and Temporal CommunitiesSubjects: Data Structures and Algorithms (cs.DS)
Graphs are widely used in various fields of computer science. They have also found application in unrelated areas, leading to a diverse range of problems. These problems can be modeled as relationships between entities in various contexts, such as social networks, protein interactions in cells, and route maps. Therefore it is logical to analyze these data structures with diverse approaches, whether they are numerical or structural, global or local, approximate or exact. In particular, the concept of community plays an important role in local structural analysis, as it is able to highlight the composition of the underlying graph while providing insights into what the organization and importance of the nodes in a network look like. This thesis pursues the goal of extracting knowledge from different kinds of graphs, including static, dynamic, and temporal graphs, with a particular focus on their community substructures. To tackle this task we use combinatorial algorithms that can list all the communities in a graph according to different formalizations, such as cliques, $k$-graphlets, and $k$-cores. We first develop new algorithms to enumerate subgraphs, using traditional and novel techniques such as push-out amortization, and CPU cache analysis to boost their efficiency. We then extend these concepts to the analysis of real-world graphs across diverse domains, ranging from social networks to autonomous systems modeled as temporal graphs. In this field, there is currently no widely accepted adaptation, even for straightforward subgraphs like $k$-cores, and the available data is expanding both in terms of quantity and scale. As a result, our findings advance the state of the art both from a theoretical and a practical perspective and can be used in a static or dynamic setting to further speed up and refine graph analysis techniques.
- [182] arXiv:2405.10825 [pdf, ps, html, other]
-
Title: Large Language Model (LLM) for Telecommunications: A Comprehensive Survey on Principles, Key Techniques, and OpportunitiesHao Zhou, Chengming Hu, Ye Yuan, Yufei Cui, Yili Jin, Can Chen, Haolun Wu, Dun Yuan, Li Jiang, Di Wu, Xue Liu, Charlie Zhang, Xianbin Wang, Jiangchuan LiuSubjects: Systems and Control (eess.SY); Machine Learning (cs.LG)
Large language models (LLMs) have received considerable attention recently due to their outstanding comprehension and reasoning capabilities, leading to great progress in many fields. The advancement of LLM techniques also offers promising opportunities to automate many tasks in the telecommunication (telecom) field. After pre-training and fine-tuning, LLMs can perform diverse downstream tasks based on human instructions, paving the way to artificial general intelligence (AGI)-enabled 6G. Given the great potential of LLM technologies, this work aims to provide a comprehensive overview of LLM-enabled telecom networks. In particular, we first present LLM fundamentals, including model architecture, pre-training, fine-tuning, inference and utilization, model evaluation, and telecom deployment. Then, we introduce LLM-enabled key techniques and telecom applications in terms of generation, classification, optimization, and prediction problems. Specifically, the LLM-enabled generation applications include telecom domain knowledge, code, and network configuration generation. After that, the LLM-based classification applications involve network security, text, image, and traffic classification problems. Moreover, multiple LLM-enabled optimization techniques are introduced, such as automated reward function design for reinforcement learning and verbal reinforcement learning. Furthermore, for LLM-aided prediction problems, we discussed time-series prediction models and multi-modality prediction problems for telecom. Finally, we highlight the challenges and identify the future directions of LLM-enabled telecom networks.
- [183] arXiv:2405.10829 [pdf, ps, other]
-
Title: The participation of public in knowledge production: a citizen science projects overviewComments: 11 pages, 4 figures, 1 tableSubjects: Digital Libraries (cs.DL)
Citizen Science (CS) is related to public engagement in scientific research. The tasks in which the citizens can be involved are diverse and can range from data collection and tagging images to participation in the planning and research design. However, little is known about the involvement degree of the citizens to CS projects, and the contribution of those projects to the advancement of knowledge (e.g. scientific outcomes). This study aims to gain a better understanding by analysing the SciStarter database. A total of 2,346 CS projects were identified, mainly from Ecology and Environmental Sciences. Of these projects, 91% show low participation of the citizens (Level 1 "citizens as sensors" and 2 "citizens as interpreters", from Haklay's scale). In terms of scientific output, 918 papers indexed in the Web of Science (WoS) were identified. The most prolific projects were found to have lower levels of citizen involvement, specifically at Levels 1 and 2.
- [184] arXiv:2405.10830 [pdf, ps, html, other]
-
Title: Combining Teacher-Student with Representation Learning: A Concurrent Teacher-Student Reinforcement Learning Paradigm for Legged LocomotionComments: This paper presents a novel concurrent teacher-student reinforcement learning architecture for legged locomotion over challenging terrains, based only on proprioceptive measurements in real-world deployment. The effectiveness of the proposed architecture is demonstrated through extensive indoor and outdoor experiments on quadrupedal robots and a point-foot bipedal robotSubjects: Robotics (cs.RO)
Thanks to the explosive developments of data-driven learning methodologies recently, reinforcement learning (RL) emerges as a promising solution to address the legged locomotion problem in robotics. In this manuscript, we propose a novel concurrent teacher-student reinforcement learning architecture for legged locomotion over challenging terrains, based only on proprioceptive measurements in real-world deployment. Different from convectional teacher-student architecture that trains the teacher policy via RL and transfers the knowledge to the student policy through supervised learning, our proposed architecture trains teacher and student policy networks concurrently under the reinforcement learning paradigm. To achieve this, we develop a new training scheme based on conventional proximal policy gradient (PPO) method to accommodate the interaction between teacher policy network and student policy network. The effectiveness of the proposed architecture as well as the new training scheme is demonstrated through extensive indoor and outdoor experiments on quadrupedal robots and point-foot bipedal robot, showcasing robust locomotion over challenging terrains and improved performance compared to two-stage training methods.
- [185] arXiv:2405.10832 [pdf, ps, html, other]
-
Title: Open-Vocabulary Spatio-Temporal Action DetectionSubjects: Computer Vision and Pattern Recognition (cs.CV)
Spatio-temporal action detection (STAD) is an important fine-grained video understanding task. Current methods require box and label supervision for all action classes in advance. However, in real-world applications, it is very likely to come across new action classes not seen in training because the action category space is large and hard to enumerate. Also, the cost of data annotation and model training for new classes is extremely high for traditional methods, as we need to perform detailed box annotations and re-train the whole network from scratch. In this paper, we propose a new challenging setting by performing open-vocabulary STAD to better mimic the situation of action detection in an open world. Open-vocabulary spatio-temporal action detection (OV-STAD) requires training a model on a limited set of base classes with box and label supervision, which is expected to yield good generalization performance on novel action classes. For OV-STAD, we build two benchmarks based on the existing STAD datasets and propose a simple but effective method based on pretrained video-language models (VLM). To better adapt the holistic VLM for the fine-grained action detection task, we carefully fine-tune it on the localized video region-text pairs. This customized fine-tuning endows the VLM with better motion understanding, thus contributing to a more accurate alignment between video regions and texts. Local region feature and global video feature fusion before alignment is adopted to further improve the action detection performance by providing global context. Our method achieves a promising performance on novel classes.
- [186] arXiv:2405.10835 [pdf, ps, html, other]
-
Title: A Unified Search and Recommendation Framework Based on Multi-Scenario Learning for Ranking in E-commerceComments: Accepted by SIGIR 2024Subjects: Information Retrieval (cs.IR)
Search and recommendation (S&R) are the two most important scenarios in e-commerce. The majority of users typically interact with products in S&R scenarios, indicating the need and potential for joint modeling. Traditional multi-scenario models use shared parameters to learn the similarity of multiple tasks, and task-specific parameters to learn the divergence of individual tasks. This coarse-grained modeling approach does not effectively capture the differences between S&R scenarios. Furthermore, this approach does not sufficiently exploit the information across the global label space. These issues can result in the suboptimal performance of multi-scenario models in handling both S&R scenarios. To address these issues, we propose an effective and universal framework for Unified Search and Recommendation (USR), designed with S&R Views User Interest Extractor Layer (IE) and S&R Views Feature Generator Layer (FG) to separately generate user interests and scenario-agnostic feature representations for S&R. Next, we introduce a Global Label Space Multi-Task Layer (GLMT) that uses global labels as supervised signals of auxiliary tasks and jointly models the main task and auxiliary tasks using conditional probability. Extensive experimental evaluations on real-world industrial datasets show that USR can be applied to various multi-scenario models and significantly improve their performance. Online A/B testing also indicates substantial performance gains across multiple metrics. Currently, USR has been successfully deployed in the 7Fresh App.
- [187] arXiv:2405.10842 [pdf, ps, html, other]
-
Title: Automated Radiology Report Generation: A Review of Recent AdvancesComments: 24 pages, 8 figures, 6 tables. Submitted to IEEE Reviews in Biomedical EngineeringSubjects: Computer Vision and Pattern Recognition (cs.CV)
Increasing demands on medical imaging departments are taking a toll on the radiologist's ability to deliver timely and accurate reports. Recent technological advances in artificial intelligence have demonstrated great potential for automatic radiology report generation (ARRG), sparking an explosion of research. This survey paper conducts a methodological review of contemporary ARRG approaches by way of (i) assessing datasets based on characteristics, such as availability, size, and adoption rate, (ii) examining deep learning training methods, such as contrastive learning and reinforcement learning, (iii) exploring state-of-the-art model architectures, including variations of CNN and transformer models, (iv) outlining techniques integrating clinical knowledge through multimodal inputs and knowledge graphs, and (v) scrutinising current model evaluation techniques, including commonly applied NLP metrics and qualitative clinical reviews. Furthermore, the quantitative results of the reviewed models are analysed, where the top performing models are examined to seek further insights. Finally, potential new directions are highlighted, with the adoption of additional datasets from other radiological modalities and improved evaluation methods predicted as important areas of future development.
- [188] arXiv:2405.10845 [pdf, ps, html, other]
-
Title: Natural Language Processing for Requirements TraceabilityComments: Book Chapter in the Handbook of Natural Language Processing for Requirements EngineeringSubjects: Software Engineering (cs.SE)
Traceability, the ability to trace relevant software artifacts to support reasoning about the quality of the software and its development process, plays a crucial role in requirements and software engineering, particularly for safety-critical systems. In this chapter, we provide a comprehensive overview of the representative tasks in requirement traceability for which natural language processing (NLP) and related techniques have made considerable progress in the past decade. We first present the definition of traceability in the context of requirements and the overall engineering process, as well as other important concepts related to traceability tasks. Then, we discuss two tasks in detail, including trace link recovery and trace link maintenance. We also introduce two other related tasks concerning when trace links are used in practical contexts. For each task, we explain the characteristics of the task, how it can be approached through NLP techniques, and how to design and conduct the experiment to demonstrate the performance of the NLP techniques. We further discuss practical considerations on how to effectively apply NLP techniques and assess their effectiveness regarding the data set collection, the metrics selection, and the role of humans when evaluating the NLP approaches. Overall, this chapter prepares the readers with the fundamental knowledge of designing automated traceability solutions enabled by NLP in practice.
- [189] arXiv:2405.10847 [pdf, ps, html, other]
-
Title: Model Predictive Contouring Control for Vehicle Obstacle Avoidance at the Limit of Handling Using Torque VectoringAlberto Bertipaglia, Davide Tavernini, Umberto Montanaro, Mohsen Alirezaei, Riender Happee, Aldo Sorniotti, Barys ShyrokauComments: Accepted at IEEE/ASME International Conference on Advanced Intelligent Mechatronics, Boston, USA, 2024Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
This paper presents an original approach to vehicle obstacle avoidance. It involves the development of a nonlinear Model Predictive Contouring Control, which uses torque vectoring to stabilise and drive the vehicle in evasive manoeuvres at the limit of handling. The proposed algorithm combines motion planning, path tracking and vehicle stability objectives, prioritising collision avoidance in emergencies. The controller's prediction model is a nonlinear double-track vehicle model based on an extended Fiala tyre to capture the nonlinear coupled longitudinal and lateral dynamics. The controller computes the optimal steering angle and the longitudinal forces per each of the four wheels to minimise tracking error in safe situations and maximise the vehicle-to-obstacle distance in emergencies. Thanks to the optimisation of the longitudinal tyre forces, the proposed controller can produce an extra yaw moment, increasing the vehicle's lateral agility to avoid obstacles while keeping the vehicle stable. The optimal forces are constrained in the tyre friction circle not to exceed the tyres and vehicle capabilities. In a high-fidelity simulation environment, we demonstrate the benefits of torque vectoring, showing that our proposed approach is capable of successfully avoiding obstacles and keeping the vehicle stable while driving a double-lane change manoeuvre, in comparison to baselines lacking torque vectoring or collision avoidance prioritisation.
- [190] arXiv:2405.10849 [pdf, ps, html, other]
-
Title: Generative AI for Test Driven Development: Preliminary ResultsComments: Accepted to AI4ASE workshop at XP'24Subjects: Software Engineering (cs.SE)
Test Driven Development (TDD) is one of the major practices of Extreme Programming for which incremental testing and refactoring trigger the code development. TDD has limited adoption in the industry, as it requires more code to be developed and experienced developers. Generative AI (GenAI) may reduce the extra effort imposed by TDD. In this work, we introduce an approach to automatize TDD by embracing GenAI either in a collaborative interaction pattern in which developers create tests and supervise the AI generation during each iteration or a fully-automated pattern in which developers only supervise the AI generation at the end of the iterations. We run an exploratory experiment with ChatGPT in which the interaction patterns are compared with the non-AI TDD regarding test and code quality and development speed. Overall, we found that, for our experiment and settings, GenAI can be efficiently used in TDD, but it requires supervision of the quality of the produced code. In some cases, it can even mislead non-expert developers and propose solutions just for the sake of the query.
- [191] arXiv:2405.10852 [pdf, ps, html, other]
-
Title: KernelSHAP-IQ: Weighted Least-Square Optimization for Shapley InteractionsComments: Accepted Paper at ICML 2024. This version is not the Camera Ready VersionSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
The Shapley value (SV) is a prevalent approach of allocating credit to machine learning (ML) entities to understand black box ML models. Enriching such interpretations with higher-order interactions is inevitable for complex systems, where the Shapley Interaction Index (SII) is a direct axiomatic extension of the SV. While it is well-known that the SV yields an optimal approximation of any game via a weighted least square (WLS) objective, an extension of this result to SII has been a long-standing open problem, which even led to the proposal of an alternative index. In this work, we characterize higher-order SII as a solution to a WLS problem, which constructs an optimal approximation via SII and $k$-Shapley values ($k$-SII). We prove this representation for the SV and pairwise SII and give empirically validated conjectures for higher orders. As a result, we propose KernelSHAP-IQ, a direct extension of KernelSHAP for SII, and demonstrate state-of-the-art performance for feature interactions.
- [192] arXiv:2405.10853 [pdf, ps, html, other]
-
Title: The Future of Large Language Model Pre-training is FederatedLorenzo Sani, Alex Iacob, Zeyu Cao, Bill Marino, Yan Gao, Tomas Paulik, Wanru Zhao, William F. Shen, Preslav Aleksandrov, Xinchi Qiu, Nicholas D. LaneComments: 10 pages, 4 figures, pre-printSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Distributed, Parallel, and Cluster Computing (cs.DC)
Generative pre-trained large language models (LLMs) have demonstrated impressive performance over a wide range of tasks, thanks to the unprecedented amount of data they have been trained on. As established scaling laws indicate, LLMs' future performance improvement depends on the amount of computing and data sources we can leverage for pre-training. Federated learning (FL) has the potential to unleash the majority of the planet's data and computational resources, which are underutilized by the data-center-focused training methodology of current LLM practice. Our work presents a robust, flexible, reproducible FL approach that enables large-scale collaboration across institutions to train LLMs. This would mobilize more computational and data resources while matching or potentially exceeding centralized performance. We further show the effectiveness of the federated training scales with model size and present our approach for training a billion-scale federated LLM using limited resources. This will help data-rich actors to become the protagonists of LLMs pre-training instead of leaving the stage to compute-rich actors alone.
- [193] arXiv:2405.10859 [pdf, ps, html, other]
-
Title: A Nonlinear Model Predictive Control for Automated Drifting with a Standard Passenger VehicleComments: Accepted at IEEE/ASME International Conference on Advanced Intelligent Mechatronics, Boston, USA, 2024Subjects: Robotics (cs.RO); Systems and Control (eess.SY)
This paper presents a novel approach to automated drifting with a standard passenger vehicle, which involves a Nonlinear Model Predictive Control to stabilise and maintain the vehicle at high sideslip angle conditions. The proposed controller architecture is split into three components. The first part consists of the offline computed equilibrium maps, which provide the equilibrium points for each vehicle state given the desired sideslip angle and radius of the path. The second is the predictive controller minimising the errors between the equilibrium and actual vehicle states. The third is a path-following controller, which reduces the path error, altering the equilibrium curvature path. In a high-fidelity simulation environment, we validate the controller architecture capacity to stabilise the vehicle in automated drifting along a desired path, with a maximal lateral path deviation of 1 m. In the experiments with a standard passenger vehicle, we demonstrate that the proposed approach is capable of bringing and maintaining the vehicle at the desired 30 deg sideslip angle in both high and low friction conditions.
- [194] arXiv:2405.10860 [pdf, ps, html, other]
-
Title: ECR-Chain: Advancing Generative Language Models to Better Emotion-Cause Reasoners through Reasoning ChainsComments: Accepted by IJCAI 2024Subjects: Computation and Language (cs.CL)
Understanding the process of emotion generation is crucial for analyzing the causes behind emotions. Causal Emotion Entailment (CEE), an emotion-understanding task, aims to identify the causal utterances in a conversation that stimulate the emotions expressed in a target utterance. However, current works in CEE mainly focus on modeling semantic and emotional interactions in conversations, neglecting the exploration of the emotion-generation process. This hinders the models from deeply understanding emotions, restricting their ability to produce explainable predictions. In this work, inspired by the emotion generation process of "stimulus-appraisal-emotion" in the cognitive appraisal theory, we introduce a step-by-step reasoning method, Emotion-Cause Reasoning Chain (ECR-Chain), to infer the stimulus from the target emotional expressions in conversations. Specifically, we first introduce the ECR-Chain to ChatGPT via few-shot prompting, which significantly improves its performance on the CEE task. We further propose an automated construction process to utilize ChatGPT in building an ECR-Chain set, which can enhance the reasoning abilities of smaller models through supervised training and assist the Vicuna-7B model in achieving state-of-the-art CEE performance. Moreover, our methods can enable these generative language models to effectively perform emotion-cause reasoning in an explainable manner. Our code, data and more details are at this https URL.
- [195] arXiv:2405.10861 [pdf, ps, html, other]
-
Title: Tailoring Vaccine Messaging with Common-Ground OpinionsRickard Stureborg, Sanxing Chen, Ruoyu Xie, Aayushi Patel, Christopher Li, Chloe Qinyu Zhu, Tingnan Hu, Jun Yang, Bhuwan DhingraComments: NAACL Findings 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
One way to personalize chatbot interactions is by establishing common ground with the intended reader. A domain where establishing mutual understanding could be particularly impactful is vaccine concerns and misinformation. Vaccine interventions are forms of messaging which aim to answer concerns expressed about vaccination. Tailoring responses in this domain is difficult, since opinions often have seemingly little ideological overlap. We define the task of tailoring vaccine interventions to a Common-Ground Opinion (CGO). Tailoring responses to a CGO involves meaningfully improving the answer by relating it to an opinion or belief the reader holds. In this paper we introduce TAILOR-CGO, a dataset for evaluating how well responses are tailored to provided CGOs. We benchmark several major LLMs on this task; finding GPT-4-Turbo performs significantly better than others. We also build automatic evaluation metrics, including an efficient and accurate BERT model that outperforms finetuned LLMs, investigate how to successfully tailor vaccine messaging to CGOs, and provide actionable recommendations from this investigation.
Code and model weights: this https URL Dataset: this https URL - [196] arXiv:2405.10864 [pdf, ps, html, other]
-
Title: Improving face generation quality and prompt following with synthetic captionsSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Recent advancements in text-to-image generation using diffusion models have significantly improved the quality of generated images and expanded the ability to depict a wide range of objects. However, ensuring that these models adhere closely to the text prompts remains a considerable challenge. This issue is particularly pronounced when trying to generate photorealistic images of humans. Without significant prompt engineering efforts models often produce unrealistic images and typically fail to incorporate the full extent of the prompt information. This limitation can be largely attributed to the nature of captions accompanying the images used in training large scale diffusion models, which typically prioritize contextual information over details related to the person's appearance. In this paper we address this issue by introducing a training-free pipeline designed to generate accurate appearance descriptions from images of people. We apply this method to create approximately 250,000 captions for publicly available face datasets. We then use these synthetic captions to fine-tune a text-to-image diffusion model. Our results demonstrate that this approach significantly improves the model's ability to generate high-quality, realistic human faces and enhances adherence to the given prompts, compared to the baseline model. We share our synthetic captions, pretrained checkpoints and training code.
- [197] arXiv:2405.10868 [pdf, ps, html, other]
-
Title: Air Signing and Privacy-Preserving Signature Verification for Digital DocumentsSubjects: Computer Vision and Pattern Recognition (cs.CV); Human-Computer Interaction (cs.HC)
This paper presents a novel approach to the digital signing of electronic documents through the use of a camera-based interaction system, single-finger tracking for sign recognition, and multi commands executing hand gestures. The proposed solution, referred to as "Air Signature," involves writing the signature in front of the camera, rather than relying on traditional methods such as mouse drawing or physically signing on paper and showing it to a web camera. The goal is to develop a state-of-the-art method for detecting and tracking gestures and objects in real-time. The proposed methods include applying existing gesture recognition and object tracking systems, improving accuracy through smoothing and line drawing, and maintaining continuity during fast finger movements. An evaluation of the fingertip detection, sketching, and overall signing process is performed to assess the effectiveness of the proposed solution. The secondary objective of this research is to develop a model that can effectively recognize the unique signature of a user. This type of signature can be verified by neural cores that analyze the movement, speed, and stroke pixels of the signing in real time. The neural cores use machine learning algorithms to match air signatures to the individual's stored signatures, providing a secure and efficient method of verification. Our proposed System does not require sensors or any hardware other than the camera.
- [198] arXiv:2405.10871 [pdf, ps, html, other]
-
Title: BraTS-Path Challenge: Assessing Heterogeneous Histopathologic Brain Tumor Sub-regionsSpyridon Bakas, Siddhesh P. Thakur, Shahriar Faghani, Mana Moassefi, Ujjwal Baid, Verena Chung, Sarthak Pati, Shubham Innani, Bhakti Baheti, Jake Albrecht, Alexandros Karargyris, Hasan Kassem, MacLean P. Nasrallah, Jared T. Ahrendsen, Valeria Barresi, Maria A. Gubbiotti, Giselle Y. López, Calixto-Hope G. Lucas, Michael L. Miller, Lee A. D. Cooper, Jason T. Huse, William R. BellSubjects: Computer Vision and Pattern Recognition (cs.CV)
Glioblastoma is the most common primary adult brain tumor, with a grim prognosis - median survival of 12-18 months following treatment, and 4 months otherwise. Glioblastoma is widely infiltrative in the cerebral hemispheres and well-defined by heterogeneous molecular and micro-environmental histopathologic profiles, which pose a major obstacle in treatment. Correctly diagnosing these tumors and assessing their heterogeneity is crucial for choosing the precise treatment and potentially enhancing patient survival rates. In the gold-standard histopathology-based approach to tumor diagnosis, detecting various morpho-pathological features of distinct histology throughout digitized tissue sections is crucial. Such "features" include the presence of cellular tumor, geographic necrosis, pseudopalisading necrosis, areas abundant in microvascular proliferation, infiltration into the cortex, wide extension in subcortical white matter, leptomeningeal infiltration, regions dense with macrophages, and the presence of perivascular or scattered lymphocytes. With these features in mind and building upon the main aim of the BraTS Cluster of Challenges this https URL, the goal of the BraTS-Path challenge is to provide a systematically prepared comprehensive dataset and a benchmarking environment to develop and fairly compare deep-learning models capable of identifying tumor sub-regions of distinct histologic profile. These models aim to further our understanding of the disease and assist in the diagnosis and grading of conditions in a consistent manner.
- [199] arXiv:2405.10874 [pdf, ps, html, other]
-
Title: Square-Root Inverse Filter-based GNSS-Visual-Inertial NavigationSubjects: Robotics (cs.RO)
While Global Navigation Satellite System (GNSS) is often used to provide global positioning if available, its intermittency and/or inaccuracy calls for fusion with other sensors. In this paper, we develop a novel GNSS-Visual-Inertial Navigation System (GVINS) that fuses visual, inertial, and raw GNSS measurements within the square-root inverse sliding window filtering (SRI-SWF) framework in a tightly coupled fashion, which thus is termed SRI-GVINS. In particular, for the first time, we deeply fuse the GNSS pseudorange, Doppler shift, single-differenced pseudorange, and double-differenced carrier phase measurements, along with the visual-inertial measurements. Inherited from the SRI-SWF, the proposed SRI-GVINS gains significant numerical stability and computational efficiency over the start-of-the-art methods. Additionally, we propose to use a filter to sequentially initialize the reference frame transformation till converges, rather than collecting measurements for batch optimization. We also perform online calibration of GNSS-IMU extrinsic parameters to mitigate the possible extrinsic parameter degradation. The proposed SRI-GVINS is extensively evaluated on our own collected UAV datasets and the results demonstrate that the proposed method is able to suppress VIO drift in real-time and also show the effectiveness of online GNSS-IMU extrinsic calibration. The experimental validation on the public datasets further reveals that the proposed SRI-GVINS outperforms the state-of-the-art methods in terms of both accuracy and efficiency.
- [200] arXiv:2405.10875 [pdf, ps, html, other]
-
Title: Recursively Feasible Shrinking-Horizon MPC in Dynamic Environments with Conformal Prediction GuaranteesSubjects: Systems and Control (eess.SY); Machine Learning (stat.ML)
In this paper, we focus on the problem of shrinking-horizon Model Predictive Control (MPC) in uncertain dynamic environments. We consider controlling a deterministic autonomous system that interacts with uncontrollable stochastic agents during its mission. Employing tools from conformal prediction, existing works derive high-confidence prediction regions for the unknown agent trajectories, and integrate these regions in the design of suitable safety constraints for MPC. Despite guaranteeing probabilistic safety of the closed-loop trajectories, these constraints do not ensure feasibility of the respective MPC schemes for the entire duration of the mission. We propose a shrinking-horizon MPC that guarantees recursive feasibility via a gradual relaxation of the safety constraints as new prediction regions become available online. This relaxation enforces the safety constraints to hold over the least restrictive prediction region from the set of all available prediction regions. In a comparative case study with the state of the art, we empirically show that our approach results in tighter prediction regions and verify recursive feasibility of our MPC scheme.
- [201] arXiv:2405.10877 [pdf, ps, html, other]
-
Title: WEITS: A Wavelet-enhanced residual framework for interpretable time series forecastingComments: arXiv admin note: text overlap with arXiv:2310.09488 by other authorsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Time series (TS) forecasting has been an unprecedentedly popular problem in recent years, with ubiquitous applications in both scientific and business fields. Various approaches have been introduced to time series analysis, including both statistical approaches and deep neural networks. Although neural network approaches have illustrated stronger ability of representation than statistical methods, they struggle to provide sufficient interpretablility, and can be too complicated to optimize. In this paper, we present WEITS, a frequency-aware deep learning framework that is highly interpretable and computationally efficient. Through multi-level wavelet decomposition, WEITS novelly infuses frequency analysis into a highly deep learning framework. Combined with a forward-backward residual architecture, it enjoys both high representation capability and statistical interpretability. Extensive experiments on real-world datasets have demonstrated competitive performance of our model, along with its additional advantage of high computation efficiency. Furthermore, WEITS provides a general framework that can always seamlessly integrate with state-of-the-art approaches for time series forecast.
- [202] arXiv:2405.10879 [pdf, ps, html, other]
-
Title: One registration is worth two segmentationsComments: Early Accepted by MICCAI2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
The goal of image registration is to establish spatial correspondence between two or more images, traditionally through dense displacement fields (DDFs) or parametric transformations (e.g., rigid, affine, and splines). Rethinking the existing paradigms of achieving alignment via spatial transformations, we uncover an alternative but more intuitive correspondence representation: a set of corresponding regions-of-interest (ROI) pairs, which we demonstrate to have sufficient representational capability as other correspondence representation methods.Further, it is neither necessary nor sufficient for these ROIs to hold specific anatomical or semantic significance. In turn, we formulate image registration as searching for the same set of corresponding ROIs from both moving and fixed images - in other words, two multi-class segmentation tasks on a pair of images. For a general-purpose and practical implementation, we integrate the segment anything model (SAM) into our proposed algorithms, resulting in a SAM-enabled registration (SAMReg) that does not require any training data, gradient-based fine-tuning or engineered prompts. We experimentally show that the proposed SAMReg is capable of segmenting and matching multiple ROI pairs, which establish sufficiently accurate correspondences, in three clinical applications of registering prostate MR, cardiac MR and abdominal CT images. Based on metrics including Dice and target registration errors on anatomical structures, the proposed registration outperforms both intensity-based iterative algorithms and DDF-predicting learning-based networks, even yielding competitive performance with weakly-supervised registration which requires fully-segmented training data.
- [203] arXiv:2405.10880 [pdf, ps, other]
-
Title: The MESA Security Model 2.0: A Dynamic Framework for Mitigating Stealth Data ExfiltrationJournal-ref: International Journal of Network Security & Its Applications (IJNSA) 2024Subjects: Cryptography and Security (cs.CR)
The rising complexity of cyber threats calls for a comprehensive reassessment of current security frameworks in business environments. This research focuses on Stealth Data Exfiltration, a significant cyber threat characterized by covert infiltration, extended undetectability, and unauthorized dissemination of confidential data. Our findings reveal that conventional defense-in-depth strategies often fall short in combating these sophisticated threats, highlighting the immediate need for a shift in information risk management across businesses. The evolving nature of cyber threats, driven by advancements in techniques such as social engineering, multi-vector attacks, and Generative AI, underscores the need for robust, adaptable, and comprehensive security strategies. As we navigate this complex landscape, it is crucial to anticipate potential threats and continually update our defenses. We propose a shift from traditional perimeter-based, prevention-focused models, which depend on a static attack surface, to a more dynamic framework that prepares for inevitable breaches. This suggested model, known as MESA 2.0 Security Model, prioritizes swift detection, immediate response, and ongoing resilience, thereby enhancing an organizations ability to promptly identify and neutralize threats, significantly reducing the consequences of security breaches. This study suggests that businesses adopt a forward-thinking and adaptable approach to security management to stay ahead of the ever-changing cyber threat landscape.
- [204] arXiv:2405.10883 [pdf, ps, other]
-
Title: Application of Artificial Intelligence in Schizophrenia Rehabilitation Management: Systematic Literature ReviewSubjects: Artificial Intelligence (cs.AI)
This review aims to systematically assess the current status and prospects of artificial intelligence (AI) in the rehabilitation management of patients with schizophrenia and their impact on the rehabilitation process. We selected 70 studies from 2012 to the present, focusing on application, technology categories, products, and data types of machine learning, deep learning, reinforcement learning, and other technologies in mental health interventions and management. The results indicate that AI can be widely used in symptom monitoring, relapse risk prediction, and rehabilitation treatment by analyzing ecological momentary assessment, behavioral, and speech data. This review further explores the potential challenges and future directions of emerging products, technologies, and analytical methods based on AI, such as social media analysis, serious games, and large language models in rehabilitation. In summary, this study systematically reviews the application status of AI in schizophrenia rehabilitation management and provides valuable insights and recommendations for future research paths.
- [205] arXiv:2405.10885 [pdf, ps, html, other]
-
Title: FA-Depth: Toward Fast and Accurate Self-supervised Monocular Depth EstimationSubjects: Computer Vision and Pattern Recognition (cs.CV)
Most existing methods often rely on complex models to predict scene depth with high accuracy, resulting in slow inference that is not conducive to deployment. To better balance precision and speed, we first designed SmallDepth based on sparsity. Second, to enhance the feature representation ability of SmallDepth during training under the condition of equal complexity during inference, we propose an equivalent transformation module(ETM). Third, to improve the ability of each layer in the case of a fixed SmallDepth to perceive different context information and improve the robustness of SmallDepth to the left-right direction and illumination changes, we propose pyramid loss. Fourth, to further improve the accuracy of SmallDepth, we utilized the proposed function approximation loss (APX) to transfer knowledge in the pretrained HQDecv2, obtained by optimizing the previous HQDec to address grid artifacts in some regions, to SmallDepth. Extensive experiments demonstrate that each proposed component improves the precision of SmallDepth without changing the complexity of SmallDepth during inference, and the developed approach achieves state-of-the-art results on KITTI at an inference speed of more than 500 frames per second and with approximately 2 M parameters. The code and models will be publicly available at this https URL.
- [206] arXiv:2405.10887 [pdf, ps, html, other]
-
Title: Preservation theorems on sparse classes revisitedComments: 16 pagesSubjects: Logic in Computer Science (cs.LO); Logic (math.LO)
We revisit the work studying homomorphism preservation for first-order logic in sparse classes of structures initiated in [Atserias et al., JACM 2006] and [Dawar, JCSS 2010]. These established that first-order logic has the homomorphism preservation property in any sparse class that is monotone and addable. It turns out that the assumption of addability is not strong enough for the proofs given. We demonstrate this by constructing classes of graphs of bounded treewidth which are monotone and addable but fail to have homomorphism preservation. We also show that homomorphism preservation fails on the class of planar graphs. On the other hand, the proofs of homomorphism preservation can be recovered by replacing addability by a stronger condition of amalgamation over bottlenecks. This is analogous to a similar condition formulated for extension preservation in [Ateserias et al., SiCOMP 2008].
- [207] arXiv:2405.10891 [pdf, ps, html, other]
-
Title: Prioritising GitHub Priority LabelsComments: 4 pages, 5 tables, 2 figures, appearing in PROMISE 2024Subjects: Software Engineering (cs.SE)
Communities on GitHub often use issue labels as a way of triaging issues by assigning them priority ratings based on how urgently they should be addressed. The labels used are determined by the repository contributors and not standardised by GitHub. This makes it difficult for priority-related reasoning across repositories for both researchers and contributors. Previous work shows interest in how issues are labelled and what the consequences for those labels are. For instance, some previous work has used clustering models and natural language processing to categorise labels without a particular emphasis on priority. With this publication, we introduce a unique data set of 812 manually categorised labels pertaining to priority; normalised and ranked as low-, medium-, or high-priority. To provide an example of how this data set could be used, we have created a tool for GitHub contributors that will create a list of the highest priority issues from the repositories to which they contribute. We have released the data set and the tool for anyone to use on Zenodo because we hope that this will help the open source community address high-priority issues more effectively and inspire other uses.
- [208] arXiv:2405.10892 [pdf, ps, html, other]
-
Title: Neuroscheduling for Remote EstimationComments: Submitted for presentation at the 2024 Asilomar Conference on Signals, Systems, and ComputersSubjects: Systems and Control (eess.SY)
Many modern distributed systems consist of devices that generate more data than what can be transmitted via a communication link in near real time with high-fidelity. We consider the scheduling problem in which a device has access to multiple data sources, but at any moment, only one of them is revealed in real-time to a remote receiver. Even when the sources are Gaussian, and the fidelity criterion is the mean squared error, the globally optimal data selection strategy is not known. We propose a data-driven methodology to search for the elusive optimal solution using linear function approximation approach called neuroscheduling and establish necessary and sufficient conditions for the optimal scheduler to not over fit training data. Additionally, we present several numerical results that show that the globally optimal scheduler and estimator pair to the Gaussian case are nonlinear.
- [209] arXiv:2405.10893 [pdf, ps, html, other]
-
Title: COGNET-MD, an evaluation framework and dataset for Large Language Model benchmarks in the medical domainDimitrios P. Panagoulias, Persephone Papatheodosiou, Anastasios P. Palamidas, Mattheos Sanoudos, Evridiki Tsoureli-Nikita, Maria Virvou, George A. TsihrintzisComments: Technical PaperSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Large Language Models (LLMs) constitute a breakthrough state-of-the-art Artificial Intelligence (AI) technology which is rapidly evolving and promises to aid in medical diagnosis either by assisting doctors or by simulating a doctor's workflow in more advanced and complex implementations. In this technical paper, we outline Cognitive Network Evaluation Toolkit for Medical Domains (COGNET-MD), which constitutes a novel benchmark for LLM evaluation in the medical domain. Specifically, we propose a scoring-framework with increased difficulty to assess the ability of LLMs in interpreting medical text. The proposed framework is accompanied with a database of Multiple Choice Quizzes (MCQs). To ensure alignment with current medical trends and enhance safety, usefulness, and applicability, these MCQs have been constructed in collaboration with several associated medical experts in various medical domains and are characterized by varying degrees of difficulty. The current (first) version of the database includes the medical domains of Psychiatry, Dentistry, Pulmonology, Dermatology and Endocrinology, but it will be continuously extended and expanded to include additional medical domains.
- [210] arXiv:2405.10894 [pdf, ps, other]
-
Title: Labelled Well Quasi Ordered Classes of Bounded Linear Clique WidthComments: 24 pagesSubjects: Logic in Computer Science (cs.LO)
We provide an algorithm to decide whether a class of finite graphs that has bounded linear clique width is well-quasi-ordered by the induced subgraph relation in the presence of a labelling of the vertices, where the class is given by an $\mathsf{MSO}$-transduction from finite words. This study leverages tools from automata theory, and the proof scheme allows to derive a weak version of the Pouzet conjecture for classes of bounded linear clique-width. We also provide an automata based characterization of which classes of $\mathsf{NLC}$ graphs are labelled-well-quasi-ordered by the induced subgraph relation, where we recover the results of Daligault Rao and Thomassé by encoding the models into trees with the gap embedding relation of Dershowitz and Tzameret.
- [211] arXiv:2405.10902 [pdf, ps, html, other]
-
Title: Where do developers admit their security-related concerns?Comments: Accepted as an extended abstract for XP'24Subjects: Software Engineering (cs.SE)
Developers use different means to document the security concerns of their code. Because of all of these opportunities, they may forget where the information is stored, or others may not be aware of it, and leave it unmaintained for so long that it becomes obsolete, if not useless. In this work, we analyzed different sources of code documentation from four large-scale, real-world, open-source projects in an industrial setting to understand where developers report their security concerns. In particular, we manually inspected 2.559 instances taken from source code comments, commit messages, and issue trackers. Overall, we found that developers prefer to document security concerns in source code comments and issue trackers. We also found that the longer the comments stay unfixed, the more likely they remain unfixed. Thus, to create awareness among developers, we implemented a pipeline to remind them about the introduction or removal of comments pointing to a security problem.
- [212] arXiv:2405.10904 [pdf, ps, html, other]
-
Title: Broadening Privacy and Surveillance: Eliciting Interconnected Values with a Scenarios Workbook on Smart Home CamerasComments: Proceedings of the 2023 ACM Designing Interactive Systems Conference (DIS '23)Subjects: Human-Computer Interaction (cs.HC); Computers and Society (cs.CY)
We use a design workbook of speculative scenarios as a values elicitation activity with 14 participants. The workbook depicts use case scenarios with smart home camera technologies that involve surveillance and uneven power relations. The scenarios were initially designed by the researchers to explore scenarios of privacy and surveillance within three social relationships involving "primary" and "non-primary" users: Parents-Children, Landlords-Tenants, and Residents-Domestic Workers. When the scenarios were utilized as part of a values elicitation activity with participants, we found that they reflected on a broader set of interconnected social values beyond privacy and surveillance, including autonomy and agency, physical safety, property rights, trust and accountability, and fairness. The paper suggests that future research about ethical issues in smart homes should conceptualize privacy as interconnected with a broader set of social values (which can align or be in tension with privacy), and reflects on considerations for doing research with non-primary users.
- [213] arXiv:2405.10906 [pdf, ps, html, other]
-
Title: POSTER: Testing network-based RTK for GNSS receiver securityComments: To appear in the 17th ACM Conference on Security and Privacy in Wireless and Mobile NetworksSubjects: Cryptography and Security (cs.CR)
Global Navigation Satellite Systems (GNSS) provide precise location, while Real Time Kinematics (RTK) allow mobile receivers (termed rovers), leveraging fixed stations, to correct errors in their Position Navigation and Timing (PNT) solution. This allows compensating for multi-path effects, ionospheric errors, and observation biases, enabling consumer receivers to achieve centimeter-level accuracy. While network distribution of correction streams can be protected with common secure networking practices, the reference stations can still be attacked by GNSS spoofing or jamming. This work investigates (i) the effect RTK reference station spoofing has on the rover's PNT solution quality and (ii) the potential countermeasures towards hardening the RTK infrastructure.
- [214] arXiv:2405.10912 [pdf, ps, html, other]
-
Title: Synthesis of Temporal CausalityComments: 36th International Conference on Computer Aided Verification (CAV 2024)Subjects: Logic in Computer Science (cs.LO)
We present an automata-based algorithm to synthesize omega-regular causes for omega-regular effects on executions of a reactive system, such as counterexamples uncovered by a model checker. Our theory is a generalization of temporal causality, which has recently been proposed as a framework for drawing causal relationships between trace properties on a given trace. So far, algorithms exist only for verifying a single causal relationship and, as an extension, cause synthesis through enumeration, which is complete only for a small fragment of effect properties. This work presents the first complete cause-synthesis algorithm for the class of omega-regular effects. We show that in this case, causes are guaranteed to be omega-regular themselves and can be computed as, e.g., nondeterministic Büchi automata. We demonstrate the practical feasibility of this algorithm with a prototype tool and evaluate its performance for cause synthesis and cause checking.
- [215] arXiv:2405.10913 [pdf, ps, html, other]
-
Title: Blackbox Adaptation for Medical Image SegmentationComments: Accepted early at MICCAI 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
In recent years, various large foundation models have been proposed for image segmentation. There models are often trained on large amounts of data corresponding to general computer vision tasks. Hence, these models do not perform well on medical data. There have been some attempts in the literature to perform parameter-efficient finetuning of such foundation models for medical image segmentation. However, these approaches assume that all the parameters of the model are available for adaptation. But, in many cases, these models are released as APIs or blackboxes, with no or limited access to the model parameters and data. In addition, finetuning methods also require a significant amount of compute, which may not be available for the downstream task. At the same time, medical data can't be shared with third-party agents for finetuning due to privacy reasons. To tackle these challenges, we pioneer a blackbox adaptation technique for prompted medical image segmentation, called BAPS. BAPS has two components - (i) An Image-Prompt decoder (IP decoder) module that generates visual prompts given an image and a prompt, and (ii) A Zero Order Optimization (ZOO) Method, called SPSA-GC that is used to update the IP decoder without the need for backpropagating through the foundation model. Thus, our method does not require any knowledge about the foundation model's weights or gradients. We test BAPS on four different modalities and show that our method can improve the original model's performance by around 4%.
- [216] arXiv:2405.10918 [pdf, ps, html, other]
-
Title: GenToC: Leveraging Partially-Labeled Data for Product Attribute-Value IdentificationSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG)
In the e-commerce domain, the accurate extraction of attribute-value pairs from product listings (e.g., Brand: Apple) is crucial for enhancing search and recommendation systems. The automation of this extraction process is challenging due to the vast diversity of product categories and their respective attributes, compounded by the lack of extensive, accurately annotated training datasets and the demand for low latency to meet the real-time needs of e-commerce platforms. To address these challenges, we introduce GenToC, a novel two-stage model for extracting attribute-value pairs from product titles. GenToC is designed to train with partially-labeled data, leveraging incomplete attribute-value pairs and obviating the need for a fully annotated dataset. Moreover, we introduce a bootstrapping method that enables GenToC to progressively refine and expand its training dataset. This enhancement substantially improves the quality of data available for training other neural network models that are typically faster but are inherently less capable than GenToC in terms of their capacity to handle partially-labeled data. By supplying an enriched dataset for training, GenToC significantly advances the performance of these alternative models, making them more suitable for real-time deployment. Our results highlight the unique capability of GenToC to learn from a limited set of labeled data and to contribute to the training of more efficient models, marking a significant leap forward in the automated extraction of attribute-value pairs from product titles. GenToC has been successfully integrated into India's largest B2B e-commerce platform, this http URL, achieving a significant increase of 21.1% in recall over the existing deployed system while maintaining a high precision of 89.5% in this challenging task.
- [217] arXiv:2405.10923 [pdf, ps, html, other]
-
Title: Randomized Householder QRSubjects: Numerical Analysis (math.NA)
This paper introduces a randomized Householder QR factorization (RHQR). This factorization can be used to obtain a well conditioned basis of a set of vectors and thus can be employed in a variety of applications. The RHQR factorization of the input matrix $W$ is equivalent to the standard Householder QR factorization of matrix $\Psi W$, where $\Psi$ is a sketching matrix that can be obtained from any subspace embedding technique. For this reason, the RHQR algorithm can also be reconstructed from the Householder QR factorization of the sketched problem, yielding a single-synchronization randomized QR factorization (reconstructRHQR). In most contexts, left-looking RHQR requires a single synchronization per iteration, with half the computational cost of Householder QR, and a similar cost to Randomized Gram-Schmidt (RGS) overall. We discuss the usage of RHQR factorization in the Arnoldi process and then in GMRES, showing thus how it can be used in Krylov subspace methods to solve systems of linear equations. Based on Charles Sheffield's connection between Householder QR and Modified Gram-Schmidt (MGS), a BLAS2-RGS is also derived.
Numerical experiments show that RHQR produces a well conditioned basis whose sketch is numerically orthogonal even for the most difficult inputs, and an accurate factorization. The same results were observed with the high-dimensional operations made in half-precision. The reconstructed RHQR from the HQR factorization of the sketch was stabler than the standard Randomized Cholesky QR.
The first version of this work was made available on HAL on the 7th of July 2023 and can be found at https://hal.science/hal-04156310/ - [218] arXiv:2405.10924 [pdf, ps, html, other]
-
Title: Boosting Few-Pixel Robustness Verification via Covering Verification DesignsSubjects: Machine Learning (cs.LG); Logic in Computer Science (cs.LO); Programming Languages (cs.PL)
Proving local robustness is crucial to increase the reliability of neural networks. While many verifiers prove robustness in $L_\infty$ $\epsilon$-balls, very little work deals with robustness verification in $L_0$ $\epsilon$-balls, capturing robustness to few pixel attacks. This verification introduces a combinatorial challenge, because the space of pixels to perturb is discrete and of exponential size. A previous work relies on covering designs to identify sets for defining $L_\infty$ neighborhoods, which if proven robust imply that the $L_0$ $\epsilon$-ball is robust. However, the number of neighborhoods to verify remains very high, leading to a high analysis time. We propose covering verification designs, a combinatorial design that tailors effective but analysis-incompatible coverings to $L_0$ robustness verification. The challenge is that computing a covering verification design introduces a high time and memory overhead, which is intensified in our setting, where multiple candidate coverings are required to identify how to reduce the overall analysis time. We introduce CoVerD, an $L_0$ robustness verifier that selects between different candidate coverings without constructing them, but by predicting their block size distribution. This prediction relies on a theorem providing closed-form expressions for the mean and variance of this distribution. CoVerD constructs the chosen covering verification design on-the-fly, while keeping the memory consumption minimal and enabling to parallelize the analysis. The experimental results show that CoVerD reduces the verification time on average by up to 5.1x compared to prior work and that it scales to larger $L_0$ $\epsilon$-balls.
- [219] arXiv:2405.10927 [pdf, ps, html, other]
-
Title: Using Degeneracy in the Loss Landscape for Mechanistic InterpretabilityLucius Bushnaq, Jake Mendel, Stefan Heimersheim, Dan Braun, Nicholas Goldowsky-Dill, Kaarel Hänni, Cindy Wu, Marius HobbhahnSubjects: Machine Learning (cs.LG)
Mechanistic Interpretability aims to reverse engineer the algorithms implemented by neural networks by studying their weights and activations. An obstacle to reverse engineering neural networks is that many of the parameters inside a network are not involved in the computation being implemented by the network. These degenerate parameters may obfuscate internal structure. Singular learning theory teaches us that neural network parameterizations are biased towards being more degenerate, and parameterizations with more degeneracy are likely to generalize further. We identify 3 ways that network parameters can be degenerate: linear dependence between activations in a layer; linear dependence between gradients passed back to a layer; ReLUs which fire on the same subset of datapoints. We also present a heuristic argument that modular networks are likely to be more degenerate, and we develop a metric for identifying modules in a network that is based on this argument. We propose that if we can represent a neural network in a way that is invariant to reparameterizations that exploit the degeneracies, then this representation is likely to be more interpretable, and we provide some evidence that such a representation is likely to have sparser interactions. We introduce the Interaction Basis, a tractable technique to obtain a representation that is invariant to degeneracies from linear dependence of activations or Jacobians.
- [220] arXiv:2405.10928 [pdf, ps, html, other]
-
Title: The Local Interaction Basis: Identifying Computationally-Relevant and Sparsely Interacting Features in Neural NetworksLucius Bushnaq, Stefan Heimersheim Nicholas Goldowsky-Dill, Dan Braun, Jake Mendel, Kaarel Hänni, Avery Griffin, Jörn Stöhler, Magdalena Wache, Marius HobbhahnSubjects: Machine Learning (cs.LG)
Mechanistic interpretability aims to understand the behavior of neural networks by reverse-engineering their internal computations. However, current methods struggle to find clear interpretations of neural network activations because a decomposition of activations into computational features is missing. Individual neurons or model components do not cleanly correspond to distinct features or functions. We present a novel interpretability method that aims to overcome this limitation by transforming the activations of the network into a new basis - the Local Interaction Basis (LIB). LIB aims to identify computational features by removing irrelevant activations and interactions. Our method drops irrelevant activation directions and aligns the basis with the singular vectors of the Jacobian matrix between adjacent layers. It also scales features based on their importance for downstream computation, producing an interaction graph that shows all computationally-relevant features and interactions in a model. We evaluate the effectiveness of LIB on modular addition and CIFAR-10 models, finding that it identifies more computationally-relevant features that interact more sparsely, compared to principal component analysis. However, LIB does not yield substantial improvements in interpretability or interaction sparsity when applied to language models. We conclude that LIB is a promising theory-driven approach for analyzing neural networks, but in its current form is not applicable to large language models.
- [221] arXiv:2405.10931 [pdf, ps, html, other]
-
Title: FitNets: An Adaptive Framework to Learn Accurate Traffic DistributionsSubjects: Networking and Internet Architecture (cs.NI)
Learning precise distributions of traffic features (e.g., burst sizes, packet inter-arrival time) is still a largely unsolved problem despite being critical for management tasks such as capacity planning or anomaly detection. A key limitation nowadays is the lack of feedback between the control plane and the data plane. Programmable data planes offer the opportunity to create systems that let data- and control plane to work together, compensating their respective shortcomings.
We present FitNets, an adaptive network monitoring system leveraging feedback between the data- and the control plane to learn accurate traffic distributions. In the control plane, FitNets relies on Kernel Density Estimators which allow to provably learn distributions of any shape. In the data plane, FitNets tests the accuracy of the learned distributions while dynamically adapting data collection to the observed distribution fitness, prioritizing under-fitted features.
We have implemented FitNets in Python and P4 (including on commercially available programmable switches) and tested it on real and synthetic traffic traces. FitNets is practical: it is able to estimate hundreds of distributions from up to 60 millions samples per second, while providing accurate error estimates and adapting to complex traffic patterns. - [222] arXiv:2405.10934 [pdf, ps, html, other]
-
Title: Reconstruction of Manipulated Garment with Guided Deformation PriorSubjects: Computer Vision and Pattern Recognition (cs.CV)
Modeling the shape of garments has received much attention, but most existing approaches assume the garments to be worn by someone, which constrains the range of shapes they can assume. In this work, we address shape recovery when garments are being manipulated instead of worn, which gives rise to an even larger range of possible shapes. To this end, we leverage the implicit sewing patterns (ISP) model for garment modeling and extend it by adding a diffusion-based deformation prior to represent these shapes. To recover 3D garment shapes from incomplete 3D point clouds acquired when the garment is folded, we map the points to UV space, in which our priors are learned, to produce partial UV maps, and then fit the priors to recover complete UV maps and 2D to 3D mappings. Experimental results demonstrate the superior reconstruction accuracy of our method compared to previous ones, especially when dealing with large non-rigid deformations arising from the manipulations.
- [223] arXiv:2405.10936 [pdf, ps, html, other]
-
Title: A Survey on Large Language Models with Multilingualism: Recent Advances and New FrontiersKaiyu Huang, Fengran Mo, Hongliang Li, You Li, Yuanchi Zhang, Weijian Yi, Yulong Mao, Jinchen Liu, Yuzhuang Xu, Jinan Xu, Jian-Yun Nie, Yang LiuComments: 54 pages, Work in ProgressSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
The rapid development of Large Language Models (LLMs) demonstrates remarkable multilingual capabilities in natural language processing, attracting global attention in both academia and industry. To mitigate potential discrimination and enhance the overall usability and accessibility for diverse language user groups, it is important for the development of language-fair technology. Despite the breakthroughs of LLMs, the investigation into the multilingual scenario remains insufficient, where a comprehensive survey to summarize recent approaches, developments, limitations, and potential solutions is desirable. To this end, we provide a survey with multiple perspectives on the utilization of LLMs in the multilingual scenario. We first rethink the transitions between previous and current research on pre-trained language models. Then we introduce several perspectives on the multilingualism of LLMs, including training and inference methods, model security, multi-domain with language culture, and usage of datasets. We also discuss the major challenges that arise in these aspects, along with possible solutions. Besides, we highlight future research directions that aim at further enhancing LLMs with multilingualism. The survey aims to help the research community address multilingual problems and provide a comprehensive understanding of the core concepts, key techniques, and latest developments in multilingual natural language processing based on LLMs.
- [224] arXiv:2405.10938 [pdf, ps, html, other]
-
Title: Observational Scaling Laws and the Predictability of Language Model PerformanceSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (stat.ML)
Understanding how language model performance varies with scale is critical to benchmark and algorithm development. Scaling laws are one approach to building this understanding, but the requirement of training models across many different scales has limited their use. We propose an alternative, observational approach that bypasses model training and instead builds scaling laws from ~80 publically available models. Building a single scaling law from multiple model families is challenging due to large variations in their training compute efficiencies and capabilities. However, we show that these variations are consistent with a simple, generalized scaling law where language model performance is a function of a low-dimensional capability space, and model families only vary in their efficiency in converting training compute to capabilities. Using this approach, we show the surprising predictability of complex scaling phenomena: we show that several emergent phenomena follow a smooth, sigmoidal behavior and are predictable from small models; we show that the agent performance of models such as GPT-4 can be precisely predicted from simpler non-agentic benchmarks; and we show how to predict the impact of post-training interventions like Chain-of-Thought and Self-Consistency as language model capabilities continue to improve.
- [225] arXiv:2405.10939 [pdf, ps, html, other]
-
Title: DINO as a von Mises-Fisher mixture modelComments: Accepted to ICLR 2023Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV)
Self-distillation methods using Siamese networks are popular for self-supervised pre-training. DINO is one such method based on a cross-entropy loss between $K$-dimensional probability vectors, obtained by applying a softmax function to the dot product between representations and learnt prototypes. Given the fact that the learned representations are $L^2$-normalized, we show that DINO and its derivatives, such as iBOT, can be interpreted as a mixture model of von Mises-Fisher components. With this interpretation, DINO assumes equal precision for all components when the prototypes are also $L^2$-normalized. Using this insight we propose DINO-vMF, that adds appropriate normalization constants when computing the cluster assignment probabilities. Unlike DINO, DINO-vMF is stable also for the larger ViT-Base model with unnormalized prototypes. We show that the added flexibility of the mixture model is beneficial in terms of better image representations. The DINO-vMF pre-trained model consistently performs better than DINO on a range of downstream tasks. We obtain similar improvements for iBOT-vMF vs iBOT and thereby show the relevance of our proposed modification also for other methods derived from DINO.
New submissions for Monday, 20 May 2024 (showing 225 of 225 entries )
- [226] arXiv:2405.09843 (cross-list from econ.TH) [pdf, ps, other]
-
Title: Organizational Selection of InnovationComments: 40 pages, 13 figures, 2 tablesSubjects: Theoretical Economics (econ.TH); Multiagent Systems (cs.MA); Physics and Society (physics.soc-ph); Applications (stat.AP)
Budgetary constraints force organizations to pursue only a subset of possible innovation projects. Identifying which subset is most promising is an error-prone exercise, and involving multiple decision makers may be prudent. This raises the question of how to most effectively aggregate their collective nous. Our model of organizational portfolio selection provides some first answers. We show that portfolio performance can vary widely. Delegating evaluation makes sense when organizations employ the relevant experts and can assign projects to them. In most other settings, aggregating the impressions of multiple agents leads to better performance than delegation. In particular, letting agents rank projects often outperforms alternative aggregation rules -- including averaging agents' project scores as well as counting their approval votes -- especially when organizations have tight budgets and can select only a few project alternatives out of many.
- [227] arXiv:2405.09993 (cross-list from hep-th) [pdf, ps, html, other]
-
Title: Learning BPS Spectra and the Gap ConjectureComments: 11 pages, 4 figures, 3 tablesSubjects: High Energy Physics - Theory (hep-th); Machine Learning (cs.LG); Mathematical Physics (math-ph); Geometric Topology (math.GT)
We explore statistical properties of BPS q-series for 3d N=2 strongly coupled supersymmetric theories that correspond to a particular family of 3-manifolds Y. We discover that gaps between exponents in the q-series are statistically more significant at the beginning of the q-series compared to gaps that appear in higher powers of q. Our observations are obtained by calculating saliencies of q-series features used as input data for principal component analysis, which is a standard example of an explainable machine learning technique that allows for a direct calculation and a better analysis of feature saliencies.
- [228] arXiv:2405.10322 (cross-list from physics.soc-ph) [pdf, ps, other]
-
Title: Exploring the Independent Cascade Model and Its Evolution in Social Network Information DiffusionSubjects: Physics and Society (physics.soc-ph); Artificial Intelligence (cs.AI)
This paper delves into the paramount significance of information dissemination within the dynamic realm of social networks. It underscores the pivotal role of information communication models in unraveling the intricacies of data propagation in the digital age. By shedding light on the profound influence of these models, it not only lays the groundwork for exploring various hierarchies and their manifestations but also serves as a catalyst for further research in this formidable field.
- [229] arXiv:2405.10327 (cross-list from physics.geo-ph) [pdf, ps, html, other]
-
Title: WISER: multimodal variational inference for full-waveform inversion without dimensionality reductionSubjects: Geophysics (physics.geo-ph); Computational Engineering, Finance, and Science (cs.CE)
We present a semi-amortized variational inference framework designed for computationally feasible uncertainty quantification in 2D full-waveform inversion to explore the multimodal posterior distribution without dimensionality reduction. The framework is called WISER, short for full-Waveform variational Inference via Subsurface Extensions with Refinements. WISER leverages the power of generative artificial intelligence to perform approximate amortized inference that is low-cost albeit showing an amortization gap. This gap is closed through non-amortized refinements that make frugal use of acoustic wave physics. Case studies illustrate that WISER is capable of full-resolution, computationally feasible, and reliable uncertainty estimates of velocity models and imaged reflectivities.
- [230] arXiv:2405.10329 (cross-list from stat.AP) [pdf, ps, other]
-
Title: Causal inference approach to appraise long-term effects of maintenance policy on functional performance of asphalt pavementsSubjects: Applications (stat.AP); Artificial Intelligence (cs.AI)
Asphalt pavements as the most prevalent transportation infrastructure, are prone to serious traffic safety problems due to functional or structural damage caused by stresses or strains imposed through repeated traffic loads and continuous climatic cycles. The good quality or high serviceability of infrastructure networks is vital to the urbanization and industrial development of nations. In order to maintain good functional pavement performance and extend the service life of asphalt pavements, the long-term performance of pavements under maintenance policies needs to be evaluated and favorable options selected based on the condition of the pavement. A major challenge in evaluating maintenance policies is to produce valid treatments for the outcome assessment under the control of uncertainty of vehicle loads and the disturbance of freeze-thaw cycles in the climatic environment. In this study, a novel causal inference approach combining a classical causal structural model and a potential outcome model framework is proposed to appraise the long-term effects of four preventive maintenance treatments for longitudinal cracking over a 5-year period of upkeep. Three fundamental issues were brought to our attention: 1) detection of causal relationships prior to variables under environmental loading (identification of causal structure); 2) obtaining direct causal effects of treatment on outcomes excluding covariates (identification of causal effects); and 3) sensitivity analysis of causal relationships. The results show that the method can accurately evaluate the effect of preventive maintenance treatments and assess the maintenance time to cater well for the functional performance of different preventive maintenance approaches. This framework could help policymakers to develop appropriate maintenance strategies for pavements.
- [231] arXiv:2405.10343 (cross-list from q-bio.BM) [pdf, ps, other]
-
Title: UniCorn: A Unified Contrastive Learning Approach for Multi-view Molecular Representation LearningSubjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI)
Recently, a noticeable trend has emerged in developing pre-trained foundation models in the domains of CV and NLP. However, for molecular pre-training, there lacks a universal model capable of effectively applying to various categories of molecular tasks, since existing prevalent pre-training methods exhibit effectiveness for specific types of downstream tasks. Furthermore, the lack of profound understanding of existing pre-training methods, including 2D graph masking, 2D-3D contrastive learning, and 3D denoising, hampers the advancement of molecular foundation models. In this work, we provide a unified comprehension of existing pre-training methods through the lens of contrastive learning. Thus their distinctions lie in clustering different views of molecules, which is shown beneficial to specific downstream tasks. To achieve a complete and general-purpose molecular representation, we propose a novel pre-training framework, named UniCorn, that inherits the merits of the three methods, depicting molecular views in three different levels. SOTA performance across quantum, physicochemical, and biological tasks, along with comprehensive ablation study, validate the universality and effectiveness of UniCorn.
- [232] arXiv:2405.10345 (cross-list from q-bio.QM) [pdf, ps, html, other]
-
Title: Machine Learning Driven Biomarker Selection for Medical DiagnosisDivyagna Bavikadi, Ayushi Agarwal, Shashank Ganta, Yunro Chung, Lusheng Song, Ji Qiu, Paulo ShakarianSubjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Recent advances in experimental methods have enabled researchers to collect data on thousands of analytes simultaneously. This has led to correlational studies that associated molecular measurements with diseases such as Alzheimer's, Liver, and Gastric Cancer. However, the use of thousands of biomarkers selected from the analytes is not practical for real-world medical diagnosis and is likely undesirable due to potentially formed spurious correlations. In this study, we evaluate 4 different methods for biomarker selection and 4 different machine learning (ML) classifiers for identifying correlations, evaluating 16 approaches in all. We found that contemporary methods outperform previously reported logistic regression in cases where 3 and 10 biomarkers are permitted. When specificity is fixed at 0.9, ML approaches produced a sensitivity of 0.240 (3 biomarkers) and 0.520 (10 biomarkers), while standard logistic regression provided a sensitivity of 0.000 (3 biomarkers) and 0.040 (10 biomarkers). We also noted that causal-based methods for biomarker selection proved to be the most performant when fewer biomarkers were permitted, while univariate feature selection was the most performant when a greater number of biomarkers were permitted.
- [233] arXiv:2405.10348 (cross-list from q-bio.QM) [pdf, ps, html, other]
-
Title: Learning to Predict Mutation Effects of Protein-Protein Interactions by Microenvironment-aware Hierarchical Prompt LearningSubjects: Quantitative Methods (q-bio.QM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Protein-protein bindings play a key role in a variety of fundamental biological processes, and thus predicting the effects of amino acid mutations on protein-protein binding is crucial. To tackle the scarcity of annotated mutation data, pre-training with massive unlabeled data has emerged as a promising solution. However, this process faces a series of challenges: (1) complex higher-order dependencies among multiple (more than paired) structural scales have not yet been fully captured; (2) it is rarely explored how mutations alter the local conformation of the surrounding microenvironment; (3) pre-training is costly, both in data size and computational burden. In this paper, we first construct a hierarchical prompt codebook to record common microenvironmental patterns at different structural scales independently. Then, we develop a novel codebook pre-training task, namely masked microenvironment modeling, to model the joint distribution of each mutation with their residue types, angular statistics, and local conformational changes in the microenvironment. With the constructed prompt codebook, we encode the microenvironment around each mutation into multiple hierarchical prompts and combine them to flexibly provide information to wild-type and mutated protein complexes about their microenvironmental differences. Such a hierarchical prompt learning framework has demonstrated superior performance and training efficiency over state-of-the-art pre-training-based methods in mutation effect prediction and a case study of optimizing human antibodies against SARS-CoV-2.
- [234] arXiv:2405.10355 (cross-list from physics.soc-ph) [pdf, ps, html, other]
-
Title: Assessing the Impact of Case Correction Methods on the Fairness of COVID-19 Predictive ModelsSubjects: Physics and Society (physics.soc-ph); Computers and Society (cs.CY)
One of the central difficulties of addressing the COVID-19 pandemic has been accurately measuring and predicting the spread of infections. In particular, official COVID-19 case counts in the United States are under counts of actual caseloads due to the absence of universal testing policies. Researchers have proposed a variety of methods for recovering true caseloads, often through the estimation of statistical models on more reliable measures, such as death and hospitalization counts, positivity rates, and demographics. However, given the disproportionate impact of COVID-19 on marginalized racial, ethnic, and socioeconomic groups, it is important to consider potential unintended effects of case correction methods on these groups. Thus, we investigate two of these correction methods for their impact on a downstream COVID-19 case prediction task. For that purpose, we tailor an auditing approach and evaluation protocol to analyze the fairness of the COVID-19 prediction task by measuring the difference in model performance between majority-White counties and majority-minority counties. We find that one of the correction methods improves fairness, decreasing differences in performance between majority-White and majority-minority counties, while the other method increases differences, introducing bias. While these results are mixed, it is evident that correction methods have the potential to exacerbate existing biases in COVID-19 case data and in downstream prediction tasks. Researchers planning to develop or use case correction methods must be careful to consider negative effects on marginalized groups.
- [235] arXiv:2405.10360 (cross-list from quant-ph) [pdf, ps, html, other]
-
Title: Adversarial Robustness Guarantees for Quantum ClassifiersNeil Dowling, Maxwell T. West, Angus Southwell, Azar C. Nakhl, Martin Sevior, Muhammad Usman, Kavan ModiComments: 9+12 pages, 3 figures. Comments welcomeSubjects: Quantum Physics (quant-ph); Statistical Mechanics (cond-mat.stat-mech); Machine Learning (cs.LG); Chaotic Dynamics (nlin.CD)
Despite their ever more widespread deployment throughout society, machine learning algorithms remain critically vulnerable to being spoofed by subtle adversarial tampering with their input data. The prospect of near-term quantum computers being capable of running {quantum machine learning} (QML) algorithms has therefore generated intense interest in their adversarial vulnerability. Here we show that quantum properties of QML algorithms can confer fundamental protections against such attacks, in certain scenarios guaranteeing robustness against classically-armed adversaries. We leverage tools from many-body physics to identify the quantum sources of this protection. Our results offer a theoretical underpinning of recent evidence which suggest quantum advantages in the search for adversarial robustness. In particular, we prove that quantum classifiers are: (i) protected against weak perturbations of data drawn from the trained distribution, (ii) protected against local attacks if they are insufficiently scrambling, and (iii) protected against universal adversarial attacks if they are sufficiently quantum chaotic. Our analytic results are supported by numerical evidence demonstrating the applicability of our theorems and the resulting robustness of a quantum classifier in practice. This line of inquiry constitutes a concrete pathway to advantage in QML, orthogonal to the usually sought improvements in model speed or accuracy.
- [236] arXiv:2405.10369 (cross-list from astro-ph.IM) [pdf, ps, html, other]
-
Title: Reinforcement learningComments: To appear, Astronomy & ComputingSubjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Observing celestial objects and advancing our scientific knowledge about them involves tedious planning, scheduling, data collection and data post-processing. Many of these operational aspects of astronomy are guided and executed by expert astronomers. Reinforcement learning is a mechanism where we (as humans and astronomers) can teach agents of artificial intelligence to perform some of these tedious tasks. In this paper, we will present a state of the art overview of reinforcement learning and how it can benefit astronomy.
- [237] arXiv:2405.10399 (cross-list from stat.ML) [pdf, ps, html, other]
-
Title: A note on continuous-time online learningSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Numerical Analysis (math.NA); Optimization and Control (math.OC)
In online learning, the data is provided in a sequential order, and the goal of the learner is to make online decisions to minimize overall regrets. This note is concerned with continuous-time models and algorithms for several online learning problems: online linear optimization, adversarial bandit, and adversarial linear bandit. For each problem, we extend the discrete-time algorithm to the continuous-time setting and provide a concise proof of the optimal regret bound.
- [238] arXiv:2405.10414 (cross-list from math.OC) [pdf, ps, html, other]
-
Title: A Reliability Theory of Compromise Decisions for Large-Scale Stochastic ProgramsSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
Stochastic programming models can lead to very large-scale optimization problems for which it may be impossible to enumerate all possible scenarios. In such cases, one adopts a sampling-based solution methodology in which case the reliability of the resulting decisions may be suspect. For such instances, it is advisable to adopt methodologies that promote variance reduction. One such approach goes under a framework known as "compromise decision", which requires multiple replications of the solution procedure. This paper studies the reliability of stochastic programming solutions resulting from the "compromise decision" process. This process is characterized by minimizing an aggregation of objective function approximations across replications, presumably conducted in parallel. We refer to the post-parallel-processing problem as the problem of "compromise decision". We quantify the reliability of compromise decisions by estimating the expectation and variance of the "pessimistic distance" of sampled instances from the set of true optimal decisions. Such pessimistic distance is defined as an estimate of the largest possible distance of the solution of the sampled instance from the "true" optimal solution set. The Rademacher average of instances is used to bound the sample complexity of the compromise decision.
- [239] arXiv:2405.10438 (cross-list from math.OC) [pdf, ps, html, other]
-
Title: Optimization-Aided Construction of Multivariate Chebyshev PolynomialsSubjects: Optimization and Control (math.OC); Numerical Analysis (math.NA)
This article is concerned with an extension of univariate Chebyshev polynomials of the first kind to the multivariate setting, where one chases best approximants to specific monomials by polynomials of lower degree relative to the uniform norm. Exploiting the Moment-SOS hierarchy, we devise a versatile semidefinite-programming-based procedure to compute such best approximants, as well as associated signatures. Applying this procedure in three variables leads to the values of best approximation errors for all mononials up to degree six on the euclidean ball, the simplex, and the cross-polytope. Furthermore, inspired by numerical experiments, we obtain explicit expressions for Chebyshev polynomials in two cases unresolved before, namely for the monomial $x_1^2 x_2^2 x_3$ on the euclidean ball and for the monomial $x_1^2 x_2 x_3$ on the simplex.
- [240] arXiv:2405.10442 (cross-list from physics.flu-dyn) [pdf, ps, html, other]
-
Title: Data-driven low-dimensional model of a sedimenting flexible fiberSubjects: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG)
The dynamics of flexible filaments entrained in flow, important for understanding many biological and industrial processes, are computationally expensive to model with full-physics simulations. This work describes a data-driven technique to create high-fidelity low-dimensional models of flexible fiber dynamics using machine learning; the technique is applied to sedimentation in a quiescent, viscous Newtonian fluid, using results from detailed simulations as the data set. The approach combines an autoencoder neural network architecture to learn a low-dimensional latent representation of the filament shape, with a neural ODE that learns the evolution of the particle in the latent state. The model was designed to model filaments of varying flexibility, characterized by an elasto-gravitational number $\mathcal{B}$, and was trained on a data set containing the evolution of fibers beginning at set angles of inclination. For the range of $\mathcal{B}$ considered here (100-10000), the filament shape dynamics can be represented with high accuracy with only four degrees of freedom, in contrast to the 93 present in the original bead-spring model used to generate the dynamic trajectories. We predict the evolution of fibers set at arbitrary angles and demonstrate that our data-driven model can accurately forecast the evolution of a fiber at both trained and untrained elasto-gravitational numbers.
- [241] arXiv:2405.10448 (cross-list from cond-mat.mtrl-sci) [pdf, ps, html, other]
-
Title: Dynamic In-context Learning with Conversational Models for Data Extraction and Materials Property PredictionSubjects: Materials Science (cond-mat.mtrl-sci); Artificial Intelligence (cs.AI); Computational Physics (physics.comp-ph)
The advent of natural language processing and large language models (LLMs) has revolutionized the extraction of data from unstructured scholarly papers. However, ensuring data trustworthiness remains a significant challenge. In this paper, we introduce PropertyExtractor, an open-source tool that leverages advanced conversational LLMs like Google Gemini-Pro and OpenAI GPT-4, blends zero-shot with few-shot in-context learning, and employs engineered prompts for the dynamic refinement of structured information hierarchies, enabling autonomous, efficient, scalable, and accurate identification, extraction, and verification of material property data. Our tests on material data demonstrate precision and recall exceeding 93% with an error rate of approximately 10%, highlighting the effectiveness and versatility of the toolkit. We apply PropertyExtractor to generate a database of 2D material thicknesses, a critical parameter for device integration. The rapid evolution of the field has outpaced both experimental measurements and computational methods, creating a significant data gap. Our work addresses this gap and showcases the potential of PropertyExtractor as a reliable and efficient tool for the autonomous generation of diverse material property databases, advancing the field.
- [242] arXiv:2405.10449 (cross-list from econ.EM) [pdf, ps, html, other]
-
Title: Optimal Text-Based Time-Series IndicesSubjects: Econometrics (econ.EM); Artificial Intelligence (cs.AI); Computational Finance (q-fin.CP)
We propose an approach to construct text-based time-series indices in an optimal way--typically, indices that maximize the contemporaneous relation or the predictive performance with respect to a target variable, such as inflation. We illustrate our methodology with a corpus of news articles from the Wall Street Journal by optimizing text-based indices focusing on tracking the VIX index and inflation expectations. Our results highlight the superior performance of our approach compared to existing indices.
- [243] arXiv:2405.10490 (cross-list from stat.ME) [pdf, ps, other]
-
Title: Neural Optimization with Adaptive Heuristics for Intelligent Marketing SystemChangshuai Wei, Benjamin Zelditch, Joyce Chen, Andre Assuncao Silva T Ribeiro, Jingyi Kenneth Tay, Borja Ocejo Elizondo, Keerthi Selvaraj, Aman Gupta, Licurgo Benemann De AlmeidaComments: KDD 2024Subjects: Methodology (stat.ME); Artificial Intelligence (cs.AI); Information Retrieval (cs.IR); Machine Learning (cs.LG); Optimization and Control (math.OC)
Computational marketing has become increasingly important in today's digital world, facing challenges such as massive heterogeneous data, multi-channel customer journeys, and limited marketing budgets. In this paper, we propose a general framework for marketing AI systems, the Neural Optimization with Adaptive Heuristics (NOAH) framework. NOAH is the first general framework for marketing optimization that considers both to-business (2B) and to-consumer (2C) products, as well as both owned and paid channels. We describe key modules of the NOAH framework, including prediction, optimization, and adaptive heuristics, providing examples for bidding and content optimization. We then detail the successful application of NOAH to LinkedIn's email marketing system, showcasing significant wins over the legacy ranking system. Additionally, we share details and insights that are broadly useful, particularly on: (i) addressing delayed feedback with lifetime value, (ii) performing large-scale linear programming with randomization, (iii) improving retrieval with audience expansion, (iv) reducing signal dilution in targeting tests, and (v) handling zero-inflated heavy-tail metrics in statistical testing.
- [244] arXiv:2405.10550 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: LighTDiff: Surgical Endoscopic Image Low-Light Enhancement with T-DiffusionTong Chen, Qingcheng Lyu, Long Bai, Erjian Guo, Huxin Gao, Xiaoxiao Yang, Hongliang Ren, Luping ZhouSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Advances in endoscopy use in surgeries face challenges like inadequate lighting. Deep learning, notably the Denoising Diffusion Probabilistic Model (DDPM), holds promise for low-light image enhancement in the medical field. However, DDPMs are computationally demanding and slow, limiting their practical medical applications. To bridge this gap, we propose a lightweight DDPM, dubbed LighTDiff. It adopts a T-shape model architecture to capture global structural information using low-resolution images and gradually recover the details in subsequent denoising steps. We further prone the model to significantly reduce the model size while retaining performance. While discarding certain downsampling operations to save parameters leads to instability and low efficiency in convergence during the training, we introduce a Temporal Light Unit (TLU), a plug-and-play module, for more stable training and better performance. TLU associates time steps with denoised image features, establishing temporal dependencies of the denoising steps and improving denoising outcomes. Moreover, while recovering images using the diffusion model, potential spectral shifts were noted. We further introduce a Chroma Balancer (CB) to mitigate this issue. Our LighTDiff outperforms many competitive LLIE methods with exceptional computational efficiency.
- [245] arXiv:2405.10552 (cross-list from stat.ML) [pdf, ps, html, other]
-
Title: Data Science Principles for Interpretable and Explainable AISubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Society's capacity for algorithmic problem-solving has never been greater. Artificial Intelligence is now applied across more domains than ever, a consequence of powerful abstractions, abundant data, and accessible software. As capabilities have expanded, so have risks, with models often deployed without fully understanding their potential impacts. Interpretable and interactive machine learning aims to make complex models more transparent and controllable, enhancing user agency. This review synthesizes key principles from the growing literature in this field.
We first introduce precise vocabulary for discussing interpretability, like the distinction between glass box and explainable algorithms. We then explore connections to classical statistical and design principles, like parsimony and the gulfs of interaction. Basic explainability techniques -- including learned embeddings, integrated gradients, and concept bottlenecks -- are illustrated with a simple case study. We also review criteria for objectively evaluating interpretability approaches. Throughout, we underscore the importance of considering audience goals when designing interactive algorithmic systems. Finally, we outline open challenges and discuss the potential role of data science in addressing them. Code to reproduce all examples can be found at this https URL. - [246] arXiv:2405.10561 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: Infrared Image Super-Resolution via Lightweight Information Split NetworkSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Single image super-resolution (SR) is an established pixel-level vision task aimed at reconstructing a high-resolution image from its degraded low-resolution counterpart. Despite the notable advancements achieved by leveraging deep neural networks for SR, most existing deep learning architectures feature an extensive number of layers, leading to high computational complexity and substantial memory demands. These issues become particularly pronounced in the context of infrared image SR, where infrared devices often have stringent storage and computational constraints. To mitigate these challenges, we introduce a novel, efficient, and precise single infrared image SR model, termed the Lightweight Information Split Network (LISN). The LISN comprises four main components: shallow feature extraction, deep feature extraction, dense feature fusion, and high-resolution infrared image reconstruction. A key innovation within this model is the introduction of the Lightweight Information Split Block (LISB) for deep feature extraction. The LISB employs a sequential process to extract hierarchical features, which are then aggregated based on the relevance of the features under consideration. By integrating channel splitting and shift operations, the LISB successfully strikes an optimal balance between enhanced SR performance and a lightweight framework. Comprehensive experimental evaluations reveal that the proposed LISN achieves superior performance over contemporary state-of-the-art methods in terms of both SR quality and model complexity, affirming its efficacy for practical deployment in resource-constrained infrared imaging applications.
- [247] arXiv:2405.10570 (cross-list from eess.IV) [pdf, ps, other]
-
Title: Simultaneous Deep Learning of Myocardium Segmentation and T2 Quantification for Acute Myocardial Infarction MRIYirong Zhou, Chengyan Wang, Mengtian Lu, Kunyuan Guo, Zi Wang, Dan Ruan, Rui Guo, Peijun Zhao, Jianhua Wang, Naiming Wu, Jianzhong Lin, Yinyin Chen, Hang Jin, Lianxin Xie, Lilan Wu, Liuhong Zhu, Jianjun Zhou, Congbo Cai, He Wang, Xiaobo QuComments: 10 pages, 8 figures, 6 tablesSubjects: Image and Video Processing (eess.IV); Artificial Intelligence (cs.AI)
In cardiac Magnetic Resonance Imaging (MRI) analysis, simultaneous myocardial segmentation and T2 quantification are crucial for assessing myocardial pathologies. Existing methods often address these tasks separately, limiting their synergistic potential. To address this, we propose SQNet, a dual-task network integrating Transformer and Convolutional Neural Network (CNN) components. SQNet features a T2-refine fusion decoder for quantitative analysis, leveraging global features from the Transformer, and a segmentation decoder with multiple local region supervision for enhanced accuracy. A tight coupling module aligns and fuses CNN and Transformer branch features, enabling SQNet to focus on myocardium regions. Evaluation on healthy controls (HC) and acute myocardial infarction patients (AMI) demonstrates superior segmentation dice scores (89.3/89.2) compared to state-of-the-art methods (87.7/87.9). T2 quantification yields strong linear correlations (Pearson coefficients: 0.84/0.93) with label values for HC/AMI, indicating accurate mapping. Radiologist evaluations confirm SQNet's superior image quality scores (4.60/4.58 for segmentation, 4.32/4.42 for T2 quantification) over state-of-the-art methods (4.50/4.44 for segmentation, 3.59/4.37 for T2 quantification). SQNet thus offers accurate simultaneous segmentation and quantification, enhancing cardiac disease diagnosis, such as AMI.
- [248] arXiv:2405.10649 (cross-list from eess.SP) [pdf, ps, html, other]
-
Title: Recovery of Sparse Graph SignalsComments: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessibleSubjects: Signal Processing (eess.SP); Systems and Control (eess.SY); Optimization and Control (math.OC)
This paper investigates the recovery of a node-domain sparse graph signal from the output of a graph filter. This problem, often referred to as the identification of the source of a diffused sparse graph signal, is seminal in the field of graph signal processing (GSP). Sparse graph signals can be used in the modeling of a variety of real-world applications in networks, such as social, biological, and power systems, and enable various GSP tasks, such as graph signal reconstruction, blind deconvolution, and sampling. In this paper, we assume double sparsity of both the graph signal and the graph topology, as well as a low-order graph filter. We propose three algorithms to reconstruct the support set of the input sparse graph signal from the graph filter output samples, leveraging these assumptions and the generalized information criterion (GIC). First, we describe the graph multiple GIC (GM-GIC) method, which is based on partitioning the dictionary elements (graph filter matrix columns) that capture information on the signal into smaller subsets. Then, the local GICs are computed for each subset and aggregated to make a global decision. Second, inspired by the well-known branch and bound (BNB) approach, we develop the graph-based branch and bound GIC (graph-BNB-GIC), and incorporate a new tractable heuristic bound tailored to the graph and graph filter characteristics. Finally, we propose the graph-based first order correction (GFOC) method, which improves existing sparse recovery methods by iteratively examining potential improvements to the GIC cost function through replacing elements from the estimated support set with elements from their one-hop neighborhood. We conduct simulations that demonstrate that the proposed sparse recovery methods outperform existing methods in terms of support set recovery accuracy, and without a significant computational overhead.
- [249] arXiv:2405.10691 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: LoCI-DiffCom: Longitudinal Consistency-Informed Diffusion Model for 3D Infant Brain Image CompletionZihao Zhu, Tianli Tao, Yitian Tao, Haowen Deng, Xinyi Cai, Gaofeng Wu, Kaidong Wang, Haifeng Tang, Lixuan Zhu, Zhuoyang Gu, Jiawei Huang, Dinggang Shen, Han ZhangSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
The infant brain undergoes rapid development in the first few years after birth.Compared to cross-sectional studies, longitudinal studies can depict the trajectories of infants brain development with higher accuracy, statistical power and flexibility.However, the collection of infant longitudinal magnetic resonance (MR) data suffers a notorious dropout problem, resulting in incomplete datasets with missing time points. This limitation significantly impedes subsequent neuroscience and clinical modeling. Yet, existing deep generative models are facing difficulties in missing brain image completion, due to sparse data and the nonlinear, dramatic contrast/geometric variations in the developing brain. We propose LoCI-DiffCom, a novel Longitudinal Consistency-Informed Diffusion model for infant brain image Completion,which integrates the images from preceding and subsequent time points to guide a diffusion model for generating high-fidelity missing data. Our designed LoCI module can work on highly sparse sequences, relying solely on data from two temporal points. Despite wide separation and diversity between age time points, our approach can extract individualized developmental features while ensuring context-aware consistency. Our experiments on a large infant brain MR dataset demonstrate its effectiveness with consistent performance on missing infant brain MR completion even in big gap scenarios, aiding in better delineation of early developmental trajectories.
- [250] arXiv:2405.10705 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: 3D Vessel Reconstruction from Sparse-View Dynamic DSA Images via Vessel Probability Guided Attenuation LearningZhentao Liu, Huangxuan Zhao, Wenhui Qin, Zhenghong Zhou, Xinggang Wang, Wenping Wang, Xiaochun Lai, Chuansheng Zheng, Dinggang Shen, Zhiming CuiComments: 12 pages, 13 figures, 5 tablesSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Digital Subtraction Angiography (DSA) is one of the gold standards in vascular disease diagnosing. With the help of contrast agent, time-resolved 2D DSA images deliver comprehensive insights into blood flow information and can be utilized to reconstruct 3D vessel structures. Current commercial DSA systems typically demand hundreds of scanning views to perform reconstruction, resulting in substantial radiation exposure. However, sparse-view DSA reconstruction, aimed at reducing radiation dosage, is still underexplored in the research community. The dynamic blood flow and insufficient input of sparse-view DSA images present significant challenges to the 3D vessel reconstruction task. In this study, we propose to use a time-agnostic vessel probability field to solve this problem effectively. Our approach, termed as vessel probability guided attenuation learning, represents the DSA imaging as a complementary weighted combination of static and dynamic attenuation fields, with the weights derived from the vessel probability field. Functioning as a dynamic mask, vessel probability provides proper gradients for both static and dynamic fields adaptive to different scene types. This mechanism facilitates a self-supervised decomposition between static backgrounds and dynamic contrast agent flow, and significantly improves the reconstruction quality. Our model is trained by minimizing the disparity between synthesized projections and real captured DSA images. We further employ two training strategies to improve our reconstruction quality: (1) coarse-to-fine progressive training to achieve better geometry and (2) temporal perturbed rendering loss to enforce temporal consistency. Experimental results have demonstrated superior quality on both 3D vessel reconstruction and 2D view synthesis.
- [251] arXiv:2405.10723 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: Eddeep: Fast eddy-current distortion correction for diffusion MRI with deep learningComments: submitted to MICCAI 2024Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Modern diffusion MRI sequences commonly acquire a large number of volumes with diffusion sensitization gradients of differing strengths or directions. Such sequences rely on echo-planar imaging (EPI) to achieve reasonable scan duration. However, EPI is vulnerable to off-resonance effects, leading to tissue susceptibility and eddy-current induced distortions. The latter is particularly problematic because it causes misalignment between volumes, disrupting downstream modelling and analysis. The essential correction of eddy distortions is typically done post-acquisition, with image registration. However, this is non-trivial because correspondence between volumes can be severely disrupted due to volume-specific signal attenuations induced by varying directions and strengths of the applied gradients. This challenge has been successfully addressed by the popular FSL~Eddy tool but at considerable computational cost. We propose an alternative approach, leveraging recent advances in image processing enabled by deep learning (DL). It consists of two convolutional neural networks: 1) An image translator to restore correspondence between images; 2) A registration model to align the translated images. Results demonstrate comparable distortion estimates to FSL~Eddy, while requiring only modest training sample sizes. This work, to the best of our knowledge, is the first to tackle this problem with deep learning. Together with recently developed DL-based susceptibility correction techniques, they pave the way for real-time preprocessing of diffusion MRI, facilitating its wider uptake in the clinic.
- [252] arXiv:2405.10754 (cross-list from math.OC) [pdf, ps, html, other]
-
Title: Stable Phase Retrieval with Mirror DescentSubjects: Optimization and Control (math.OC); Computer Vision and Pattern Recognition (cs.CV); Information Theory (cs.IT)
In this paper, we aim to reconstruct an n-dimensional real vector from m phaseless measurements corrupted by an additive noise. We extend the noiseless framework developed in [15], based on mirror descent (or Bregman gradient descent), to deal with noisy measurements and prove that the procedure is stable to (small enough) additive noise. In the deterministic case, we show that mirror descent converges to a critical point of the phase retrieval problem, and if the algorithm is well initialized and the noise is small enough, the critical point is near the true vector up to a global sign change. When the measurements are i.i.d Gaussian and the signal-to-noise ratio is large enough, we provide global convergence guarantees that ensure that with high probability, mirror descent converges to a global minimizer near the true vector (up to a global sign change), as soon as the number of measurements m is large enough. The sample complexity bound can be improved if a spectral method is used to provide a good initial guess. We complement our theoretical study with several numerical results showing that mirror descent is both a computationally and statistically efficient scheme to solve the phase retrieval problem.
- [253] arXiv:2405.10762 (cross-list from q-fin.RM) [pdf, ps, other]
-
Title: Research on Credit Risk Early Warning Model of Commercial Banks Based on Neural Network AlgorithmSubjects: Risk Management (q-fin.RM); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
In the realm of globalized financial markets, commercial banks are confronted with an escalating magnitude of credit risk, thereby imposing heightened requisites upon the security of bank assets and financial stability. This study harnesses advanced neural network techniques, notably the Backpropagation (BP) neural network, to pioneer a novel model for preempting credit risk in commercial banks. The discourse initially scrutinizes conventional financial risk preemptive models, such as ARMA, ARCH, and Logistic regression models, critically analyzing their real-world applications. Subsequently, the exposition elaborates on the construction process of the BP neural network model, encompassing network architecture design, activation function selection, parameter initialization, and objective function construction. Through comparative analysis, the superiority of neural network models in preempting credit risk in commercial banks is elucidated. The experimental segment selects specific bank data, validating the model's predictive accuracy and practicality. Research findings evince that this model efficaciously enhances the foresight and precision of credit risk management.
- [254] arXiv:2405.10763 (cross-list from cond-mat.dis-nn) [pdf, ps, html, other]
-
Title: Integer Traffic Assignment Problem: Algorithms and Insights on Random GraphsComments: 37 pages, 15 figuresSubjects: Disordered Systems and Neural Networks (cond-mat.dis-nn); Discrete Mathematics (cs.DM); Optimization and Control (math.OC); Computation (stat.CO)
Path optimization is a fundamental concern across various real-world scenarios, ranging from traffic congestion issues to efficient data routing over the internet. The Traffic Assignment Problem (TAP) is a classic continuous optimization problem in this field. This study considers the Integer Traffic Assignment Problem (ITAP), a discrete variant of TAP. ITAP involves determining optimal routes for commuters in a city represented by a graph, aiming to minimize congestion while adhering to integer flow constraints on paths. This restriction makes ITAP an NP-hard problem. While conventional TAP prioritizes repulsive interactions to minimize congestion, this work also explores the case of attractive interactions, related to minimizing the number of occupied edges. We present and evaluate multiple algorithms to address ITAP, including a message passing algorithm, a greedy approach, simulated annealing, and relaxation of ITAP to TAP. Inspired by studies of random ensembles in the large-size limit in statistical physics, comparisons between these algorithms are conducted on large sparse random regular graphs with a random set of origin-destination pairs. Our results indicate that while the simplest greedy algorithm performs competitively in the repulsive scenario, in the attractive case the message-passing-based algorithm and simulated annealing demonstrate superiority. We then investigate the relationship between TAP and ITAP in the repulsive case. We find that, as the number of paths increases, the solution of TAP converges toward that of ITAP, and we investigate the speed of this convergence. Depending on the number of paths, our analysis leads us to identify two scaling regimes: in one the average flow per edge is of order one, and in another the number of paths scales quadratically with the size of the graph, in which case the continuous relaxation solves the integer problem closely.
- [255] arXiv:2405.10780 (cross-list from eess.SP) [pdf, ps, other]
-
Title: Intelligent Neural Interfaces: An Emerging Era in NeurotechnologySubjects: Signal Processing (eess.SP); Hardware Architecture (cs.AR); Human-Computer Interaction (cs.HC); Machine Learning (cs.LG); Neurons and Cognition (q-bio.NC)
Integrating smart algorithms on neural devices presents significant opportunities for various brain disorders. In this paper, we review the latest advancements in the development of three categories of intelligent neural prostheses featuring embedded signal processing on the implantable or wearable device. These include: 1) Neural interfaces for closed-loop symptom tracking and responsive stimulation; 2) Neural interfaces for emerging network-related conditions, such as psychiatric disorders; and 3) Intelligent BMI SoCs for movement recovery following paralysis.
- [256] arXiv:2405.10782 (cross-list from physics.app-ph) [pdf, ps, other]
-
Title: Using oxides to compute with heatGuillaume F. Nataf (1), Sebastian Volz (2), Jose Ordonez-Miranda (2), Jorge Íñiguez-González (3), Riccardo Rurali (4), Brahim Dkhil (5) ((1) GREMAN UMR 7347, CNRS, University of Tours, INSA Centre Val de Loire, Tours, France, (2) LIMMS, CNRS-IIS IRL 2820, The University of Tokyo, Tokyo, Japan, (3) Materials Research and Technology Department, LIST, Esch-sur-Alzette, Luxembourg, (4) Institut de Ciència de Materials de Barcelona, ICMAB-CSIC, Campus UAB, Bellaterra, Spain, (5) Université Paris-Saclay, CentraleSupélec, CNRS-UMR8580, Laboratoire SPMS, Gif-sur-Yvette, France)Journal-ref: Nature Reviews Materials (2024)Subjects: Applied Physics (physics.app-ph); Emerging Technologies (cs.ET)
One of the most innovative possibilities offered by oxides is the use of heat currents for computational purposes. Towards this goal, phase-change oxides, including ferroelectrics, ferromagnets and related materials, could reproduce sources, logic units and memories used in current and future computing schemes.
- [257] arXiv:2405.10789 (cross-list from math.CO) [pdf, ps, html, other]
-
Title: On Minimal Transversals of Maximal Cliques in GraphsSubjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
A hypergraph is conformal if it is the family of maximal cliques of a graph. In this paper we are interested in the problem of determining when is the family of minimal transversal of maximal cliques of a graph conformal. Such graphs are called clique dually conformal (CDC for short). As our main results, we completely characterize CDC graphs within the families of triangle-free graphs and split graphs. Both characterizations lead to polynomial-time recognition algorithms. We also show that the class of CDC graphs is closed under substitution, in the strong sense that substituting a graph $H$ for a vertex of a graph $G$ results in a CDC graph if and only if both $G$ and $H$ are CDC.
- [258] arXiv:2405.10803 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: A Large-scale Multi Domain Leukemia Dataset for the White Blood Cells Detection with Morphological Attributes for ExplainabilityComments: Early AcceptSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Earlier diagnosis of Leukemia can save thousands of lives annually. The prognosis of leukemia is challenging without the morphological information of White Blood Cells (WBC) and relies on the accessibility of expensive microscopes and the availability of hematologists to analyze Peripheral Blood Samples (PBS). Deep Learning based methods can be employed to assist hematologists. However, these algorithms require a large amount of labeled data, which is not readily available. To overcome this limitation, we have acquired a realistic, generalized, and large dataset. To collect this comprehensive dataset for real-world applications, two microscopes from two different cost spectrums (high-cost HCM and low-cost LCM) are used for dataset capturing at three magnifications (100x, 40x, 10x) through different sensors (high-end camera for HCM, middle-level camera for LCM and mobile-phone camera for both). The high-sensor camera is 47 times more expensive than the middle-level camera and HCM is 17 times more expensive than LCM. In this collection, using HCM at high resolution (100x), experienced hematologists annotated 10.3k WBC types (14) and artifacts, having 55k morphological labels (Cell Size, Nuclear Chromatin, Nuclear Shape, etc.) from 2.4k images of several PBS leukemia patients. Later on, these annotations are transferred to other 2 magnifications of HCM, and 3 magnifications of LCM, and on each camera captured images. Along with the LeukemiaAttri dataset, we provide baselines over multiple object detectors and Unsupervised Domain Adaptation (UDA) strategies, along with morphological information-based attribute prediction. The dataset will be publicly available after publication to facilitate the research in this direction.
- [259] arXiv:2405.10812 (cross-list from q-bio.GN) [pdf, ps, html, other]
-
Title: VQDNA: Unleashing the Power of Vector Quantization for Multi-Species Genomic Sequence ModelingComments: ICML 2024. Preprint V1 with 16 pages and 5 figuresSubjects: Genomics (q-bio.GN); Artificial Intelligence (cs.AI)
Similar to natural language models, pre-trained genome language models are proposed to capture the underlying intricacies within genomes with unsupervised sequence modeling. They have become essential tools for researchers and practitioners in biology. However, the \textit{hand-crafted} tokenization policies used in these models may not encode the most discriminative patterns from the limited vocabulary of genomic data. In this paper, we introduce VQDNA, a general-purpose framework that renovates genome tokenization from the perspective of genome vocabulary learning. By leveraging vector-quantized codebook as \textit{learnable} vocabulary, VQDNA can adaptively tokenize genomes into \textit{pattern-aware} embeddings in an end-to-end manner. To further push its limits, we propose Hierarchical Residual Quantization (HRQ), where varying scales of codebooks are designed in a hierarchy to enrich the genome vocabulary in a coarse-to-fine manner. Extensive experiments on 32 genome datasets demonstrate VQDNA's superiority and favorable parameter efficiency compared to existing genome language models. Notably, empirical analysis of SARS-CoV-2 mutations reveals the fine-grained pattern awareness and biological significance of learned HRQ vocabulary, highlighting its untapped potential for broader applications in genomics.
- [260] arXiv:2405.10815 (cross-list from math.OC) [pdf, ps, html, other]
-
Title: A Functional Model Method for Nonconvex Nonsmooth Conditional Stochastic OptimizationSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Machine Learning (stat.ML)
We consider stochastic optimization problems involving an expected value of a nonlinear function of a base random vector and a conditional expectation of another function depending on the base random vector, a dependent random vector, and the decision variables. We call such problems conditional stochastic optimization problems. They arise in many applications, such as uplift modeling, reinforcement learning, and contextual optimization. We propose a specialized single time-scale stochastic method for nonconvex constrained conditional stochastic optimization problems with a Lipschitz smooth outer function and a generalized differentiable inner function. In the method, we approximate the inner conditional expectation with a rich parametric model whose mean squared error satisfies a stochastic version of a Łojasiewicz condition. The model is used by an inner learning algorithm. The main feature of our approach is that unbiased stochastic estimates of the directions used by the method can be generated with one observation from the joint distribution per iteration, which makes it applicable to real-time learning. The directions, however, are not gradients or subgradients of any overall objective function. We prove the convergence of the method with probability one, using the method of differential inclusions and a specially designed Lyapunov function, involving a stochastic generalization of the Bregman distance. Finally, a numerical illustration demonstrates the viability of our approach.
- [261] arXiv:2405.10817 (cross-list from stat.ML) [pdf, ps, html, other]
-
Title: Restless Linear BanditsSubjects: Machine Learning (stat.ML); Information Theory (cs.IT); Machine Learning (cs.LG)
A more general formulation of the linear bandit problem is considered to allow for dependencies over time. Specifically, it is assumed that there exists an unknown $\mathbb{R}^d$-valued stationary $\varphi$-mixing sequence of parameters $(\theta_t,~t \in \mathbb{N})$ which gives rise to pay-offs. This instance of the problem can be viewed as a generalization of both the classical linear bandits with iid noise, and the finite-armed restless bandits. In light of the well-known computational hardness of optimal policies for restless bandits, an approximation is proposed whose error is shown to be controlled by the $\varphi$-dependence between consecutive $\theta_t$. An optimistic algorithm, called LinMix-UCB, is proposed for the case where $\theta_t$ has an exponential mixing rate. The proposed algorithm is shown to incur a sub-linear regret of $\mathcal{O}\left(\sqrt{d n\mathrm{polylog}(n) }\right)$ with respect to an oracle that always plays a multiple of $\mathbb{E}\theta_t$. The main challenge in this setting is to ensure that the exploration-exploitation strategy is robust against long-range dependencies. The proposed method relies on Berbee's coupling lemma to carefully select near-independent samples and construct confidence ellipsoids around empirical estimates of $\mathbb{E}\theta_t$.
- [262] arXiv:2405.10828 (cross-list from eess.SP) [pdf, ps, html, other]
-
Title: Analysis of Impulsive Interference in Digital Audio Broadcasting Systems in Electric VehiclesComments: 44th Symposium on Information Theory and Signal Processing in the Benelux (SITB 2024), Delft, the NetherlandsSubjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
Recently, new types of interference in electric vehicles (EVs), such as converters switching and/or battery chargers, have been found to degrade the performance of wireless digital transmission systems. Measurements show that such an interference is characterized by impulsive behavior and is widely varying in time. This paper uses recorded data from our EV testbed to analyze the impulsive interference in the digital audio broadcasting band. Moreover, we use our analysis to obtain a corresponding interference model. In particular, we studied the temporal characteristics of the interference and confirmed that its amplitude indeed exhibits an impulsive behavior. Our results show that impulsive events span successive received signal samples and thus indicate a bursty nature. To this end, we performed a data-driven modification of a well-established model for bursty impulsive interference, the Markov-Middleton model, to produce synthetic noise realization. We investigate the optimal symbol detector design based on the proposed model and show significant performance gains compared to the conventional detector based on the additive white Gaussian noise assumption.
- [263] arXiv:2405.10833 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: Automatic segmentation of Organs at Risk in Head and Neck cancer patients from CT and MRI scansSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Background and purpose: Deep Learning (DL) has been widely explored for Organs at Risk (OARs) segmentation; however, most studies have focused on a single modality, either CT or MRI, not both simultaneously. This study presents a high-performing DL pipeline for segmentation of 30 OARs from MRI and CT scans of Head and Neck (H&N) cancer patients.
Materials and methods: Paired CT and MRI-T1 images from 42 H&N cancer patients alongside annotation for 30 OARs from the H&N OAR CT & MR segmentation challenge dataset were used to develop a segmentation pipeline. After cropping irrelevant regions, rigid followed by non-rigid registration of CT and MRI volumes was performed. Two versions of the CT volume, representing soft tissues and bone anatomy, were stacked with the MRI volume and used as input to an nnU-Net pipeline. Modality Dropout was used during the training to force the model to learn from the different modalities. Segmentation masks were predicted with the trained model for an independent set of 14 new patients. The mean Dice Score (DS) and Hausdorff Distance (HD) were calculated for each OAR across these patients to evaluate the pipeline.
Results: This resulted in an overall mean DS and HD of 0.777 +- 0.118 and 3.455 +- 1.679, respectively, establishing the state-of-the-art (SOTA) for this challenge at the time of submission.
Conclusion: The proposed pipeline achieved the best DS and HD among all participants of the H&N OAR CT and MR segmentation challenge and sets a new SOTA for automated segmentation of H&N OARs. - [264] arXiv:2405.10870 (cross-list from eess.IV) [pdf, ps, html, other]
-
Title: Multicenter Privacy-Preserving Model Training for Deep Learning Brain Metastases AutosegmentationYixing Huang, Zahra Khodabakhshi, Ahmed Gomaa, Manuel Schmidt, Rainer Fietkau, Matthias Guckenberger, Nicolaus Andratschke, Christoph Bert, Stephanie Tanadini-Lang, Florian PutzComments: Submission to the Green Journal (Major Revision)Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Objectives: This work aims to explore the impact of multicenter data heterogeneity on deep learning brain metastases (BM) autosegmentation performance, and assess the efficacy of an incremental transfer learning technique, namely learning without forgetting (LWF), to improve model generalizability without sharing raw data.
Materials and methods: A total of six BM datasets from University Hospital Erlangen (UKER), University Hospital Zurich (USZ), Stanford, UCSF, NYU and BraTS Challenge 2023 on BM segmentation were used for this evaluation. First, the multicenter performance of a convolutional neural network (DeepMedic) for BM autosegmentation was established for exclusive single-center training and for training on pooled data, respectively. Subsequently bilateral collaboration was evaluated, where a UKER pretrained model is shared to another center for further training using transfer learning (TL) either with or without LWF.
Results: For single-center training, average F1 scores of BM detection range from 0.625 (NYU) to 0.876 (UKER) on respective single-center test data. Mixed multicenter training notably improves F1 scores at Stanford and NYU, with negligible improvement at other centers. When the UKER pretrained model is applied to USZ, LWF achieves a higher average F1 score (0.839) than naive TL (0.570) and single-center training (0.688) on combined UKER and USZ test data. Naive TL improves sensitivity and contouring accuracy, but compromises precision. Conversely, LWF demonstrates commendable sensitivity, precision and contouring accuracy. When applied to Stanford, similar performance was observed.
Conclusion: Data heterogeneity results in varying performance in BM autosegmentation, posing challenges to model generalizability. LWF is a promising approach to peer-to-peer privacy-preserving model training. - [265] arXiv:2405.10890 (cross-list from astro-ph.IM) [pdf, ps, html, other]
-
Title: A Versatile Framework for Analyzing Galaxy Image Data by Implanting Human-in-the-loop on a Large Vision ModelMingxiang Fu, Yu Song, Jiameng Lv, Liang Cao, Peng Jia, Nan Li, Xiangru Li, Jifeng Liu, A-Li Luo, Bo Qiu, Shiyin Shen, Liangping Tu, Lili Wang, Shoulin Wei, Haifeng Yang, Zhenping Yi, Zhiqiang ZouComments: 26 pages, 10 figures, to be published on Chinese Physics CSubjects: Instrumentation and Methods for Astrophysics (astro-ph.IM); Astrophysics of Galaxies (astro-ph.GA); Artificial Intelligence (cs.AI)
The exponential growth of astronomical datasets provides an unprecedented opportunity for humans to gain insight into the Universe. However, effectively analyzing this vast amount of data poses a significant challenge. Astronomers are turning to deep learning techniques to address this, but the methods are limited by their specific training sets, leading to considerable duplicate workloads too. Hence, as an example to present how to overcome the issue, we built a framework for general analysis of galaxy images, based on a large vision model (LVM) plus downstream tasks (DST), including galaxy morphological classification, image restoration, object detection, parameter extraction, and more. Considering the low signal-to-noise ratio of galaxy images and the imbalanced distribution of galaxy categories, we have incorporated a Human-in-the-loop (HITL) module into our large vision model, which leverages human knowledge to enhance the reliability and interpretability of processing galaxy images interactively. The proposed framework exhibits notable few-shot learning capabilities and versatile adaptability to all the abovementioned tasks on galaxy images in the DESI legacy imaging surveys. Expressly, for object detection, trained by 1000 data points, our DST upon the LVM achieves an accuracy of 96.7%, while ResNet50 plus Mask R-CNN gives an accuracy of 93.1%; for morphology classification, to obtain AUC ~0.9, LVM plus DST and HITL only requests 1/50 training sets compared to ResNet18. Expectedly, multimodal data can be integrated similarly, which opens up possibilities for conducting joint analyses with datasets spanning diverse domains in the era of multi-message astronomy.
- [266] arXiv:2405.10897 (cross-list from math.OC) [pdf, ps, html, other]
-
Title: Efficient Line Search Method Based on Regression and Uncertainty QuantificationComments: To be featured in LION18 2024Subjects: Optimization and Control (math.OC); Machine Learning (cs.LG)
Unconstrained optimization problems are typically solved using iterative methods, which often depend on line search techniques to determine optimal step lengths in each iteration. This paper introduces a novel line search approach. Traditional line search methods, aimed at determining optimal step lengths, often discard valuable data from the search process and focus on refining step length intervals. This paper proposes a more efficient method using Bayesian optimization, which utilizes all available data points, i.e., function values and gradients, to guide the search towards a potential global minimum. This new approach more effectively explores the search space, leading to better solution quality. It is also easy to implement and integrate into existing frameworks. Tested on the challenging CUTEst test set, it demonstrates superior performance compared to existing state-of-the-art methods, solving more problems to optimality with equivalent resource usage.
- [267] arXiv:2405.10916 (cross-list from math.AP) [pdf, ps, html, other]
-
Title: Nearly self-similar blowup of generalized axisymmetric Navier-Stokes and Boussinesq equationsComments: 34 pages, 28 figures. arXiv admin note: text overlap with arXiv:2107.06509Subjects: Analysis of PDEs (math.AP); Numerical Analysis (math.NA)
We perform numerical investigation of nearly self-similar blowup of generalized axisymmetric Navier-Stokes equations and Boussinesq system with a time-dependent fractional dimension. The dynamic change of the space dimension is proportional to the ratio R(t)/Z(t), where (R(t),Z(t)) is the position at which the maximum vorticity achieves its global maximum. This choice of space dimension is to ensure that the advection along the r-direction has the same scaling as that along the z-direction, thus preventing formation of two-scale solution structure. For the generalized axisymmetric Navier-Stokes equations with solution dependent viscosity, we show that the solution develops a self-similar blowup with dimension equal to 3.188 and the self-similar profile satisfies the axisymmetric Navier-Stokes equations with constant viscosity. We also study the nearly self-similar blowup of the axisymmetric Boussinesq system with constant viscosity. The generalized axisymmetric Boussinesq system preserves almost all the known properties of the 3D Navier-Stokes equations except for the conservation of angular momentum. We present convincing numerical evidence that the generalized axisymmetric Boussinesq system develops a stable nearly self-similar blowup solution with maximum vorticity increased by O(10^{30}).
- [268] arXiv:2405.10925 (cross-list from stat.ME) [pdf, ps, other]
-
Title: High-dimensional multiple imputation (HDMI) for partially observed confounders including natural language processing-derived auxiliary covariatesJanick Weberpals, Pamela A. Shaw, Kueiyu Joshua Lin, Richard Wyss, Joseph M Plasek, Li Zhou, Kerry Ngan, Thomas DeRamus, Sudha R. Raman, Bradley G. Hammill, Hana Lee, Sengwee Toh, John G. Connolly, Kimberly J. Dandreo, Fang Tian, Wei Liu, Jie Li, José J. Hernández-Muñoz, Sebastian Schneeweiss, Rishi J. DesaiSubjects: Methodology (stat.ME); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Multiple imputation (MI) models can be improved by including auxiliary covariates (AC), but their performance in high-dimensional data is not well understood. We aimed to develop and compare high-dimensional MI (HDMI) approaches using structured and natural language processing (NLP)-derived AC in studies with partially observed confounders. We conducted a plasmode simulation study using data from opioid vs. non-steroidal anti-inflammatory drug (NSAID) initiators (X) with observed serum creatinine labs (Z2) and time-to-acute kidney injury as outcome. We simulated 100 cohorts with a null treatment effect, including X, Z2, atrial fibrillation (U), and 13 other investigator-derived confounders (Z1) in the outcome generation. We then imposed missingness (MZ2) on 50% of Z2 measurements as a function of Z2 and U and created different HDMI candidate AC using structured and NLP-derived features. We mimicked scenarios where U was unobserved by omitting it from all AC candidate sets. Using LASSO, we data-adaptively selected HDMI covariates associated with Z2 and MZ2 for MI, and with U to include in propensity score models. The treatment effect was estimated following propensity score matching in MI datasets and we benchmarked HDMI approaches against a baseline imputation and complete case analysis with Z1 only. HDMI using claims data showed the lowest bias (0.072). Combining claims and sentence embeddings led to an improvement in the efficiency displaying the lowest root-mean-squared-error (0.173) and coverage (94%). NLP-derived AC alone did not perform better than baseline MI. HDMI approaches may decrease bias in studies with partially observed confounders where missingness depends on unobserved factors.
- [269] arXiv:2405.10930 (cross-list from stat.ML) [pdf, ps, other]
-
Title: Submodular Information Selection for Hypothesis Testing with Misclassification PenaltiesComments: 23 pages, 4 figuresSubjects: Machine Learning (stat.ML); Computational Complexity (cs.CC); Information Theory (cs.IT); Machine Learning (cs.LG); Optimization and Control (math.OC)
We consider the problem of selecting an optimal subset of information sources for a hypothesis testing/classification task where the goal is to identify the true state of the world from a finite set of hypotheses, based on finite observation samples from the sources. In order to characterize the learning performance, we propose a misclassification penalty framework, which enables non-uniform treatment of different misclassification errors. In a centralized Bayesian learning setting, we study two variants of the subset selection problem: (i) selecting a minimum cost information set to ensure that the maximum penalty of misclassifying the true hypothesis remains bounded and (ii) selecting an optimal information set under a limited budget to minimize the maximum penalty of misclassifying the true hypothesis. Under mild assumptions, we prove that the objective (or constraints) of these combinatorial optimization problems are weak (or approximate) submodular, and establish high-probability performance guarantees for greedy algorithms. Further, we propose an alternate metric for information set selection which is based on the total penalty of misclassification. We prove that this metric is submodular and establish near-optimal guarantees for the greedy algorithms for both the information set selection problems. Finally, we present numerical simulations to validate our theoretical results over several randomly generated instances.
- [270] arXiv:2405.10933 (cross-list from quant-ph) [pdf, ps, html, other]
-
Title: Learning low-degree quantum objectsComments: 26+4 pagesSubjects: Quantum Physics (quant-ph); Computational Complexity (cs.CC); Data Structures and Algorithms (cs.DS); Machine Learning (cs.LG); Functional Analysis (math.FA)
We consider the problem of learning low-degree quantum objects up to $\varepsilon$-error in $\ell_2$-distance. We show the following results: $(i)$ unknown $n$-qubit degree-$d$ (in the Pauli basis) quantum channels and unitaries can be learned using $O(1/\varepsilon^d)$ queries (independent of $n$), $(ii)$ polynomials $p:\{-1,1\}^n\rightarrow [-1,1]$ arising from $d$-query quantum algorithms can be classically learned from $O((1/\varepsilon)^d\cdot \log n)$ many random examples $(x,p(x))$ (which implies learnability even for $d=O(\log n)$), and $(iii)$ degree-$d$ polynomials $p:\{-1,1\}^n\to [-1,1]$ can be learned through $O(1/\varepsilon^d)$ queries to a quantum unitary $U_p$ that block-encodes $p$. Our main technical contributions are new Bohnenblust-Hille inequalities for quantum channels and completely bounded~polynomials.
- [271] arXiv:2405.10941 (cross-list from quant-ph) [pdf, ps, html, other]
-
Title: Variational Quantum Algorithm Landscape Reconstruction by Low-Rank Tensor CompletionSubjects: Quantum Physics (quant-ph); Hardware Architecture (cs.AR); Emerging Technologies (cs.ET)
Variational quantum algorithms (VQAs) are a broad class of algorithms with many applications in science and industry. Applying a VQA to a problem involves optimizing a parameterized quantum circuit by maximizing or minimizing a cost function. A particular challenge associated with VQAs is understanding the properties of associated cost functions. Having the landscapes of VQA cost functions can greatly assist in developing and testing new variational quantum algorithms, but they are extremely expensive to compute. Reconstructing the landscape of a VQA using existing techniques requires a large number of cost function evaluations, especially when the dimension or the resolution of the landscape is high. To address this challenge, we propose a low-rank tensor-completion-based approach for local landscape reconstruction. By leveraging compact low-rank representations of tensors, our technique can overcome the curse of dimensionality and handle high-resolution landscapes. We demonstrate the power of landscapes in VQA development by showcasing practical applications of analyzing penalty terms for constrained optimization problems and examining the probability landscapes of certain basis states.
- [272] arXiv:2405.10944 (cross-list from physics.chem-ph) [pdf, ps, html, other]
-
Title: Probabilistic transfer learning methodology to expedite high fidelity simulation of reactive flowsSubjects: Chemical Physics (physics.chem-ph); Machine Learning (cs.LG)
Reduced order models based on the transport of a lower dimensional manifold representation of the thermochemical state, such as Principal Component (PC) transport and Machine Learning (ML) techniques, have been developed to reduce the computational cost associated with the Direct Numerical Simulations (DNS) of reactive flows. Both PC transport and ML normally require an abundance of data to exhibit sufficient predictive accuracy, which might not be available due to the prohibitive cost of DNS or experimental data acquisition. To alleviate such difficulties, similar data from an existing dataset or domain (source domain) can be used to train ML models, potentially resulting in adequate predictions in the domain of interest (target domain). This study presents a novel probabilistic transfer learning (TL) framework to enhance the trust in ML models in correctly predicting the thermochemical state in a lower dimensional manifold and a sparse data setting. The framework uses Bayesian neural networks, and autoencoders, to reduce the dimensionality of the state space and diffuse the knowledge from the source to the target domain. The new framework is applied to one-dimensional freely-propagating flame solutions under different data sparsity scenarios. The results reveal that there is an optimal amount of knowledge to be transferred, which depends on the amount of data available in the target domain and the similarity between the domains. TL can reduce the reconstruction error by one order of magnitude for cases with large sparsity. The new framework required 10 times less data for the target domain to reproduce the same error as in the abundant data scenario. Furthermore, comparisons with a state-of-the-art deterministic TL strategy show that the probabilistic method can require four times less data to achieve the same reconstruction error.
Cross submissions for Monday, 20 May 2024 (showing 47 of 47 entries )
- [273] arXiv:2012.02030 (replaced) [pdf, ps, html, other]
-
Title: Data-Informed Global Sparseness in Attention Mechanisms for Deep Neural NetworksComments: Presented at LREC-COLING 2024: 12 pages, 4 figures, 11 tablesSubjects: Computation and Language (cs.CL)
Attention mechanisms play a crucial role in the neural revolution of Natural Language Processing (NLP). With the growth of attention-based models, several pruning techniques have been developed to identify and exploit sparseness, making these models more efficient. Most efforts focus on hard-coding attention patterns or pruning attention weights based on training data. We propose Attention Pruning (AP), a framework that observes attention patterns in a fixed dataset and generates a global sparseness mask. AP saves 90% of attention computation for language modeling and about 50% for machine translation and GLUE tasks, maintaining result quality. Our method reveals important distinctions between self- and cross-attention patterns, guiding future NLP research. Our framework can reduce both latency and memory requirements for any attention-based model, aiding in the development of improved models for existing or new NLP applications. We have demonstrated this with encoder and autoregressive transformer models using Triton GPU kernels and make our code publicly available at this https URL.
- [274] arXiv:2112.03242 (replaced) [pdf, ps, html, other]
-
Title: Aspect Ratio Universal Rectangular LayoutsComments: 25 pages, 12 figures, full version of a 12-page extended abstract to appear in WALCOM 2022Subjects: Computational Geometry (cs.CG); Combinatorics (math.CO)
A \emph{generic rectangular layout} (for short, \emph{layout}) is a subdivision of an axis-aligned rectangle into axis-aligned rectangles, no four of which have a point in common. Such layouts are used in data visualization and in cartography. The contacts between the rectangles represent semantic or geographic relations. A layout is weakly (strongly) \emph{aspect ratio universal} if any assignment of aspect ratios to rectangles can be realized by a weakly (strongly) equivalent layout. We give combinatorial characterizations for weakly and strongly aspect ratio universal layouts. Furthermore, we describe a quadratic-time algorithm that decides whether a given graph is the dual graph of a strongly aspect ratio universal layout, and finds such a layout if one exists.
- [275] arXiv:2202.12361 (replaced) [pdf, ps, html, other]
-
Title: RescueNet: A High Resolution UAV Semantic Segmentation Benchmark Dataset for Natural Disaster Damage AssessmentJournal-ref: Sci Data 10, 913 (2023)Subjects: Computer Vision and Pattern Recognition (cs.CV)
Recent advancements in computer vision and deep learning techniques have facilitated notable progress in scene understanding, thereby assisting rescue teams in achieving precise damage assessment. In this paper, we present RescueNet, a meticulously curated high-resolution post-disaster dataset that includes detailed classification and semantic segmentation annotations. This dataset aims to facilitate comprehensive scene understanding in the aftermath of natural disasters. RescueNet comprises post-disaster images collected after Hurricane Michael, obtained using Unmanned Aerial Vehicles (UAVs) from multiple impacted regions. The uniqueness of RescueNet lies in its provision of high-resolution post-disaster imagery, accompanied by comprehensive annotations for each image. Unlike existing datasets that offer annotations limited to specific scene elements such as buildings, RescueNet provides pixel-level annotations for all classes, including buildings, roads, pools, trees, and more. Furthermore, we evaluate the utility of the dataset by implementing state-of-the-art segmentation models on RescueNet, demonstrating its value in enhancing existing methodologies for natural disaster damage assessment.
- [276] arXiv:2204.13666 (replaced) [pdf, ps, html, other]
-
Title: Schr\"odinger's FP: Dynamic Adaptation of Floating-Point Containers for Deep Learning TrainingMiloš Nikolić, Enrique Torres Sanchez, Jiahui Wang, Ali Hadi Zadeh, Mostafa Mahmoud, Ameer Abdelhadi, Kareem Ibrahim, Andreas MoshovosSubjects: Machine Learning (cs.LG); Hardware Architecture (cs.AR)
The transfer of tensors from/to memory during neural network training dominates time and energy. To improve energy efficiency and performance, research has been exploring ways to use narrower data representations. So far, these attempts relied on user-directed trial-and-error to achieve convergence. We present methods that relieve users from this responsibility. Our methods dynamically adjust the size and format of the floating-point containers used for activations and weights during training, achieving adaptivity across three dimensions: i) which datatype to use, ii) on which tensor, and iii) how it changes over time. The different meanings and distributions of exponent and mantissas lead us to tailored approaches for each. We present two lossy pairs of methods to eliminate as many mantissa and exponent bits as possible without affecting accuracy. Quantum Mantissa and Quantum Exponent are machine learning compression methods that tap into the gradient descent algorithm to learn the minimal mantissa and exponent bitlengths on a per-layer granularity. They automatically learn that many tensors can use just 1 or 2 mantissa bits and 3 or 4 exponent bits. Overall, the two machine learning methods reduce the footprint by $4.74\times$. Alternatively, BitWave observes changes in the loss function during training to adjust mantissa and exponent bitlengths network-wide, yielding a $3.19\times$ reduction in footprint. Finally, we present an optional method, Gecko, to exploit the naturally emerging, lop-sided exponent distribution to losslessly compress resulting exponents from Quantum Exponent or BitWave and, on average, improve compression rates to $5.64\times$ and $4.56\times$.
- [277] arXiv:2206.05850 (replaced) [pdf, ps, html, other]
-
Title: Achieving Zero Constraint Violation for Constrained Reinforcement Learning via Conservative Natural Policy Gradient Primal-Dual AlgorithmComments: The latest version fixed the error in the proof of Lemma 4 in AAAI2023Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Systems and Control (eess.SY)
We consider the problem of constrained Markov decision process (CMDP) in continuous state-actions spaces where the goal is to maximize the expected cumulative reward subject to some constraints. We propose a novel Conservative Natural Policy Gradient Primal-Dual Algorithm (C-NPG-PD) to achieve zero constraint violation while achieving state of the art convergence results for the objective value function. For general policy parametrization, we prove convergence of value function to global optimal upto an approximation error due to restricted policy class. We even improve the sample complexity of existing constrained NPG-PD algorithm \cite{Ding2020} from $\mathcal{O}(1/\epsilon^6)$ to $\mathcal{O}(1/\epsilon^4)$. To the best of our knowledge, this is the first work to establish zero constraint violation with Natural policy gradient style algorithms for infinite horizon discounted CMDPs. We demonstrate the merits of proposed algorithm via experimental evaluations.
- [278] arXiv:2207.12855 (replaced) [pdf, ps, html, other]
-
Title: Efficient Learning of Accurate Surrogates for Simulations of Complex SystemsComments: 13 pages, 6 figures, submitted to Nature Machine IntelligenceSubjects: Machine Learning (cs.LG); Nuclear Theory (nucl-th); Computational Physics (physics.comp-ph); Data Analysis, Statistics and Probability (physics.data-an); Plasma Physics (physics.plasm-ph)
Machine learning methods are increasingly used to build computationally inexpensive surrogates for complex physical models. The predictive capability of these surrogates suffers when data are noisy, sparse, or time-dependent. As we are interested in finding a surrogate that provides valid predictions of any potential future model evaluations, we introduce an online learning method empowered by optimizer-driven sampling. The method has two advantages over current approaches. First, it ensures that all turning points on the model response surface are included in the training data. Second, after any new model evaluations, surrogates are tested and "retrained" (updated) if the "score" drops below a validity threshold. Tests on benchmark functions reveal that optimizer-directed sampling generally outperforms traditional sampling methods in terms of accuracy around local extrema, even when the scoring metric favors overall accuracy. We apply our method to simulations of nuclear matter to demonstrate that highly accurate surrogates for the nuclear equation of state can be reliably auto-generated from expensive calculations using a few model evaluations.
- [279] arXiv:2209.00840 (replaced) [pdf, ps, html, other]
-
Title: FOLIO: Natural Language Reasoning with First-Order LogicSimeng Han, Hailey Schoelkopf, Yilun Zhao, Zhenting Qi, Martin Riddell, Wenfei Zhou, James Coady, David Peng, Yujie Qiao, Luke Benson, Lucy Sun, Alex Wardle-Solano, Hannah Szabo, Ekaterina Zubova, Matthew Burtell, Jonathan Fan, Yixin Liu, Brian Wong, Malcolm Sailor, Ansong Ni, Linyong Nan, Jungo Kasai, Tao Yu, Rui Zhang, Alexander R. Fabbri, Wojciech Kryscinski, Semih Yavuz, Ye Liu, Xi Victoria Lin, Shafiq Joty, Yingbo Zhou, Caiming Xiong, Rex Ying, Arman Cohan, Dragomir RadevSubjects: Computation and Language (cs.CL)
Large language models (LLMs) have achieved remarkable performance on a variety of natural language understanding tasks. However, existing benchmarks are inadequate in measuring the complex logical reasoning capabilities of a model. We present FOLIO, a human-annotated, logically complex and diverse dataset for reasoning in natural language (NL), equipped with first-order logic (FOL) annotations. FOLIO consists of 1,430 examples (unique conclusions), each paired with one of 487 sets of premises used to deductively reason for the validity of each conclusion. The logical correctness of the premises and conclusions is ensured by their FOL annotations, which are automatically verified by an FOL inference engine. In addition to the main NL reasoning task, NL-FOL pairs in FOLIO constitute a new NL-FOL translation dataset. Our experiments on FOLIO systematically evaluate the FOL reasoning ability of supervised fine-tuning on medium-sized language models. For both NL reasoning and NL-FOL translation, we benchmark multiple state-of-the-art language models. Our results show that a subset of FOLIO presents a challenge for one of the most capable {Large Language Model (LLM)} publicly available, GPT-4.
- [280] arXiv:2210.07376 (replaced) [pdf, ps, other]
-
Title: ScionFL: Efficient and Robust Secure Quantized AggregationYaniv Ben-Itzhak, Helen Möllering, Benny Pinkas, Thomas Schneider, Ajith Suresh, Oleksandr Tkachenko, Shay Vargaftik, Christian Weinert, Hossein Yalame, Avishay YanaiComments: Published in 2024 IEEE Conference on Secure and Trustworthy Machine Learning (SaTML)Subjects: Cryptography and Security (cs.CR); Information Theory (cs.IT); Machine Learning (cs.LG)
Secure aggregation is commonly used in federated learning (FL) to alleviate privacy concerns related to the central aggregator seeing all parameter updates in the clear. Unfortunately, most existing secure aggregation schemes ignore two critical orthogonal research directions that aim to (i) significantly reduce client-server communication and (ii) mitigate the impact of malicious clients. However, both of these additional properties are essential to facilitate cross-device FL with thousands or even millions of (mobile) participants.
In this paper, we unite both research directions by introducing ScionFL, the first secure aggregation framework for FL that operates efficiently on quantized inputs and simultaneously provides robustness against malicious clients. Our framework leverages (novel) multi-party computation (MPC) techniques and supports multiple linear (1-bit) quantization schemes, including ones that utilize the randomized Hadamard transform and Kashin's representation.
Our theoretical results are supported by extensive evaluations. We show that with no overhead for clients and moderate overhead for the server compared to transferring and processing quantized updates in plaintext, we obtain comparable accuracy for standard FL benchmarks. Moreover, we demonstrate the robustness of our framework against state-of-the-art poisoning attacks. - [281] arXiv:2211.14309 (replaced) [pdf, ps, html, other]
-
Title: FutureHuman3D: Forecasting Complex Long-Term 3D Human Behavior from Video ObservationsSubjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
We present a generative approach to forecast long-term future human behavior in 3D, requiring only weak supervision from readily available 2D human action data. This is a fundamental task enabling many downstream applications. The required ground-truth data is hard to capture in 3D (mocap suits, expensive setups) but easy to acquire in 2D (simple RGB cameras). Thus, we design our method to only require 2D RGB data at inference time while being able to generate 3D human motion sequences. We use a differentiable 2D projection scheme in an autoregressive manner for weak supervision, and an adversarial loss for 3D regularization. Our method predicts long and complex human behavior sequences (e.g., cooking, assembly) consisting of multiple sub-actions. We tackle this in a semantically hierarchical manner, jointly predicting high-level coarse action labels together with their low-level fine-grained realizations as characteristic 3D human poses. We observe that these two action representations are coupled in nature, and joint prediction benefits both action and pose forecasting. Our experiments demonstrate the complementary nature of joint action and 3D pose prediction: our joint approach outperforms each task treated individually, enables robust longer-term sequence prediction, and improves over alternative approaches to forecast actions and characteristic 3D poses.
- [282] arXiv:2301.00802 (replaced) [pdf, ps, html, other]
-
Title: Deep Clustering of Tabular Data by Weighted Gaussian Distribution LearningSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Deep learning methods are primarily proposed for supervised learning of images or text with limited applications to clustering problems. In contrast, tabular data with heterogeneous features pose unique challenges in representation learning, where deep learning has yet to replace traditional machine learning. This paper addresses these challenges in developing one of the first deep clustering methods for tabular data: Gaussian Cluster Embedding in Autoencoder Latent Space (G-CEALS). G-CEALS is an unsupervised deep clustering framework for learning the parameters of multivariate Gaussian cluster distributions by iteratively updating individual cluster weights. The G-CEALS method presents average rank orderings of 2.9(1.7) and 2.8(1.7) based on clustering accuracy and adjusted Rand index (ARI) scores on sixteen tabular data sets, respectively, and outperforms nine state-of-the-art clustering methods. G-CEALS substantially improves clustering performance compared to traditional K-means and GMM, which are still de facto methods for clustering tabular data. Similar computationally efficient and high-performing deep clustering frameworks are imperative to reap the myriad benefits of deep learning on tabular data over traditional machine learning.
- [283] arXiv:2301.05084 (replaced) [pdf, ps, html, other]
-
Title: Local consistency as a reduction between constraint satisfaction problemsSubjects: Logic in Computer Science (cs.LO); Computational Complexity (cs.CC)
We study the use of local consistency methods as reductions between constraint satisfaction problems (CSPs), and promise version thereof, with the aim to classify these reductions in a similar way as the algebraic approach classifies gadget reductions between CSPs. This research is motivated by the requirement of more expressive reductions in the scope of promise CSPs. While gadget reductions are enough to provide all necessary hardness in the scope of (finite domain) non-promise CSP, in promise CSPs a wider class of reductions needs to be used.
We provide a general framework of reductions, which we call consistency reductions, that covers most (if not all) reductions recently used for proving NP-hardness of promise CSPs. We prove some basic properties of these reductions, and provide the first steps towards understanding the power of consistency reductions by characterizing a fragment associated to arc-consistency in terms of polymorphisms of the template. In addition to showing hardness, consistency reductions can also be used to provide feasible algorithms by reducing to a fixed tractable (promise) CSP, for example, to solving systems of affine equations. In this direction, among other results, we describe the well-known Sherali-Adams hierarchy for CSP in terms of a consistency reduction to linear programming. - [284] arXiv:2302.05402 (replaced) [pdf, ps, html, other]
-
Title: A Survey on Cyber-Resilience Approaches for Cyber-Physical SystemsComments: ACM Computing Surveys, 56(8):1--37, 36 pages, 2 figures, 1 tableSubjects: Cryptography and Security (cs.CR)
Concerns for the resilience of Cyber-Physical Systems (CPS)s in critical infrastructure are growing. CPS integrate sensing, computation, control, and networking into physical objects and mission-critical services, connecting traditional infrastructure to internet technologies. While this integration increases service efficiency, it has to face the possibility of new threats posed by the new functionalities. This leads to cyber-threats, such as denial-of-service, modification of data, information leakage, spreading of malware, and many others. Cyber-resilience refers to the ability of a CPS to prepare, absorb, recover, and adapt to the adverse effects associated with cyber-threats, e.g., physical degradation of the CPS performance resulting from a cyber-attack. Cyber-resilience aims at ensuring CPS survival by keeping the core functionalities of the CPS in case of extreme events. The literature on cyber-resilience is rapidly increasing, leading to a broad variety of research works addressing this new topic. In this article, we create a systematization of knowledge about existing scientific efforts of making CPSs cyber-resilient. We systematically survey recent literature addressing cyber-resilience with a focus on techniques that may be used on CPSs. We first provide preliminaries and background on CPSs and threats, and subsequently survey state-of-the-art approaches that have been proposed by recent research work applicable to CPSs. In particular, we aim at differentiating research work from traditional risk management approaches based on the general acceptance that it is unfeasible to prevent and mitigate all possible risks threatening a CPS. We also discuss questions and research challenges, with a focus on the practical aspects of cyber-resilience, such as the use of metrics and evaluation methods as well as testing and validation environments.
- [285] arXiv:2303.10833 (replaced) [pdf, ps, other]
-
Title: Linear Codes Constructed From Two Weakly Regular Plateaued Functions with Index (p-1)/2Comments: 35 pagesSubjects: Information Theory (cs.IT)
Linear codes are the most important family of codes in cryptography and coding theory. Some codes have only a few weights and are widely used in many areas, such as authentication codes, secret sharing schemes and strongly regular graphs. By setting $ p\equiv 1 \pmod 4 $, we construct an infinite family of linear codes using two distinct weakly regular unbalanced (and balanced) plateaued functions with index $ (p-1)/2 $. Their weight distributions are completely determined by applying exponential sums and Walsh transform. As a result, most of our constructed codes have a few nonzero weights and are minimal.
- [286] arXiv:2303.12307 (replaced) [pdf, ps, html, other]
-
Title: Predicting and Enhancing the Fairness of DNNs with the Curvature of Perceptual ManifoldsYanbiao Ma, Licheng Jiao, Fang Liu, Maoji Wen, Lingling Li, Wenping Ma, Shuyuan Yang, Xu Liu, Puhua ChenComments: 17pages, Accepted by CVPR 2023, Submitted to TPAMISubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
To address the challenges of long-tailed classification, researchers have proposed several approaches to reduce model bias, most of which assume that classes with few samples are weak classes. However, recent studies have shown that tail classes are not always hard to learn, and model bias has been observed on sample-balanced datasets, suggesting the existence of other factors that affect model bias. In this work, we first establish a geometric perspective for analyzing model fairness and then systematically propose a series of geometric measurements for perceptual manifolds in deep neural networks. Subsequently, we comprehensively explore the effect of the geometric characteristics of perceptual manifolds on classification difficulty and how learning shapes the geometric characteristics of perceptual manifolds. An unanticipated finding is that the correlation between the class accuracy and the separation degree of perceptual manifolds gradually decreases during training, while the negative correlation with the curvature gradually increases, implying that curvature imbalance leads to model bias.Building upon these observations, we propose curvature regularization to facilitate the model to learn curvature-balanced and flatter perceptual manifolds. Evaluations on multiple long-tailed and non-long-tailed datasets show the excellent performance and exciting generality of our approach, especially in achieving significant performance improvements based on current state-of-the-art techniques. Our work opens up a geometric analysis perspective on model bias and reminds researchers to pay attention to model bias on non-long-tailed and even sample-balanced datasets.
- [287] arXiv:2303.12557 (replaced) [pdf, ps, html, other]
-
Title: Q-HyViT: Post-Training Quantization of Hybrid Vision Transformers with Bridge Block Reconstruction for IoT SystemsComments: 14 pages, 9 figures, accepted in IEEE Internet of Things JournalSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Recently, vision transformers (ViTs) have superseded convolutional neural networks in numerous applications, including classification, detection, and segmentation. However, the high computational requirements of ViTs hinder their widespread implementation. To address this issue, researchers have proposed efficient hybrid transformer architectures that combine convolutional and transformer layers with optimized attention computation of linear complexity. Additionally, post-training quantization has been proposed as a means of mitigating computational demands. For mobile devices, achieving optimal acceleration for ViTs necessitates the strategic integration of quantization techniques and efficient hybrid transformer structures. However, no prior investigation has applied quantization to efficient hybrid transformers. In this paper, we discover that applying existing post-training quantization (PTQ) methods for ViTs to efficient hybrid transformers leads to a drastic accuracy drop, attributed to the four following challenges: (i) highly dynamic ranges, (ii) zero-point overflow, (iii) diverse normalization, and (iv) limited model parameters ($<$5M). To overcome these challenges, we propose a new post-training quantization method, which is the first to quantize efficient hybrid ViTs (MobileViTv1, MobileViTv2, Mobile-Former, EfficientFormerV1, EfficientFormerV2). We achieve a significant improvement of 17.73% for 8-bit and 29.75% for 6-bit on average, respectively, compared with existing PTQ methods (EasyQuant, FQ-ViT, PTQ4ViT, and RepQ-ViT)}. We plan to release our code at this https URL.
- [288] arXiv:2304.12751 (replaced) [pdf, ps, other]
-
Title: Node Feature Augmentation Vitaminizes Network AlignmentComments: 18 pages, 12 figures, 5 tables; its conference version was presented at the ACM International Conference on Information and Knowledge Management (CIKM 2022)Subjects: Social and Information Networks (cs.SI); Information Theory (cs.IT); Machine Learning (cs.LG); Neural and Evolutionary Computing (cs.NE); Networking and Internet Architecture (cs.NI)
Network alignment (NA) is the task of discovering node correspondences across multiple networks. Although NA methods have achieved remarkable success in a myriad of scenarios, their effectiveness is not without additional information such as prior anchor links and/or node features, which may not always be available due to privacy concerns or access restrictions. To tackle this challenge, we propose Grad-Align+, a novel NA method built upon a recent state-of-the-art NA method, the so-called Grad-Align, that gradually discovers a part of node pairs until all node pairs are found. In designing Grad-Align+, we account for how to augment node features in the sense of performing the NA task and how to design our NA method by maximally exploiting the augmented node features. To achieve this goal, Grad-Align+ consists of three key components: 1) centrality-based node feature augmentation (CNFA), 2) graph neural network (GNN)-aided embedding similarity calculation alongside the augmented node features, and 3) gradual NA with similarity calculation using aligned cross-network neighbor-pairs (ACNs). Through comprehensive experiments, we demonstrate that Grad-Align+ exhibits (a) the superiority over benchmark NA methods, (b) empirical validations as well as our theoretical findings to see the effectiveness of CNFA, (c) the influence of each component, (d) the robustness to network noises, and (e) the computational efficiency.
- [289] arXiv:2305.05994 (replaced) [pdf, ps, html, other]
-
Title: ANALOGYKB: Unlocking Analogical Reasoning of Language Models with A Million-scale Knowledge BaseComments: Accepted to ACL 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Analogical reasoning is a fundamental cognitive ability of humans. However, current language models (LMs) still struggle to achieve human-like performance in analogical reasoning tasks due to a lack of resources for model training. In this work, we address this gap by proposing ANALOGYKB, a million-scale analogy knowledge base (KB) derived from existing knowledge graphs (KGs). ANALOGYKB identifies two types of analogies from the KGs: 1) analogies of the same relations, which can be directly extracted from the KGs, and 2) analogies of analogous relations, which are identified with a selection and filtering pipeline enabled by large language models (LLMs), followed by minor human efforts for data quality control. Evaluations on a series of datasets of two analogical reasoning tasks (analogy recognition and generation) demonstrate that ANALOGYKB successfully enables both smaller LMs and LLMs to gain better analogical reasoning capabilities.
- [290] arXiv:2305.09441 (replaced) [pdf, ps, html, other]
-
Title: STLCCP: An Efficient Convex Optimization-based Framework for Signal Temporal Logic SpecificationsComments: 17 pagesSubjects: Systems and Control (eess.SY); Formal Languages and Automata Theory (cs.FL); Robotics (cs.RO)
Signal Temporal Logic (STL) is capable of expressing a broad range of temporal properties that controlled dynamical systems must satisfy. In the literature, both mixed-integer programming (MIP) and nonlinear programming (NLP) methods have been applied to solve optimal control problems with STL specifications. However, neither approach has succeeded in solving problems with complex long-horizon STL specifications within a realistic timeframe. This study proposes a new optimization framework, called \textit{STLCCP}, which explicitly incorporates several structures of STL to mitigate this issue. The core of our framework is a structure-aware decomposition of STL formulas, which converts the original program into a difference of convex (DC) programs. This program is then solved as a convex quadratic program sequentially, based on the convex-concave procedure (CCP). Our numerical experiments on several commonly used benchmarks demonstrate that this framework can effectively handle complex scenarios over long horizons, which have been challenging to address even using state-of-the-art optimization methods.
- [291] arXiv:2305.10611 (replaced) [pdf, ps, html, other]
-
Title: ACRoBat: Optimizing Auto-batching of Dynamic Deep Learning at Compile TimeSubjects: Machine Learning (cs.LG)
Dynamic control flow is an important technique often used to design expressive and efficient deep learning computations for applications such as text parsing, machine translation, exiting early out of deep models and so on. The control flow divergence resulting from dynamic control flow makes batching, an important optimization enabling high throughput and hardware utilization, difficult to perform manually. In this paper, we present ACRoBat, a framework that enables efficient automatic batching for dynamic deep learning computations by performing hybrid static+dynamic compiler optimizations and end-to-end tensor code generation. ACRoBat performs up to 8.5X better than DyNet, a state-of-the-art framework for automatic batching, on an Nvidia GeForce GPU.
- [292] arXiv:2306.02040 (replaced) [pdf, ps, html, other]
-
Title: Getting More by Knowing Less: Bayesian Incentive Compatible Mechanisms for Fair DivisionComments: IJCAI 2024, 27 pagesSubjects: Computer Science and Game Theory (cs.GT)
We study fair resource allocation with strategic agents. It is well-known that, across multiple fundamental problems in this domain, truthfulness and fairness are incompatible. For example, when allocating indivisible goods, no truthful and deterministic mechanism can guarantee envy-freeness up to one item (EF1), even for two agents with additive valuations. Or, in cake-cutting, no truthful and deterministic mechanism always outputs a proportional allocation, even for two agents with piecewise constant valuations. Our work stems from the observation that, in the context of fair division, truthfulness is used as a synonym for Dominant Strategy Incentive Compatibility (DSIC), requiring that an agent prefers reporting the truth, no matter what other agents report.
In this paper, we instead focus on Bayesian Incentive Compatible (BIC) mechanisms, requiring that agents are better off reporting the truth in expectation over other agents' reports. We prove that, when agents know a bit less about each other, a lot more is possible: BIC mechanisms can guarantee fairness notions that are unattainable by DSIC mechanisms in both the fundamental problems of allocation of indivisible goods and cake-cutting. We prove that this is the case even for an arbitrary number of agents, as long as the agents' priors about each others' types satisfy a neutrality condition. Notably, for the case of indivisible goods, we significantly strengthen the state-of-the-art negative result for efficient DSIC mechanisms, while also highlighting the limitations of BIC mechanisms, by showing that a very general class of welfare objectives is incompatible with Bayesian Incentive Compatibility. Combined these results give a near-complete picture of the power and limitations of BIC and DSIC mechanisms for the problem of allocating indivisible goods. - [293] arXiv:2306.05889 (replaced) [pdf, ps, html, other]
-
Title: C(NN)FD -- a deep learning framework for turbomachinery CFD analysisJournal-ref: IEEE Transactions on Industrial Informatics (2024)Subjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Fluid Dynamics (physics.flu-dyn)
Deep Learning methods have seen a wide range of successful applications across different industries. Up until now, applications to physical simulations such as CFD (Computational Fluid Dynamics), have been limited to simple test-cases of minor industrial relevance. This paper demonstrates the development of a novel deep learning framework for real-time predictions of the impact of manufacturing and build variations on the overall performance of axial compressors in gas turbines, with a focus on tip clearance variations. The associated scatter in efficiency can significantly increase the CO2 emissions, thus being of great industrial and environmental relevance. The proposed C(NN)FD architecture achieves in real-time accuracy comparable to the CFD benchmark. Predicting the flow field and using it to calculate the corresponding overall performance renders the methodology generalisable, while filtering only relevant parts of the CFD solution makes the methodology scalable to industrial applications.
- [294] arXiv:2306.06276 (replaced) [pdf, ps, other]
-
Title: Contrastive Learning for Predicting Cancer Prognosis Using Gene Expression ValuesSubjects: Machine Learning (cs.LG)
Recent advancements in image classification have demonstrated that contrastive learning (CL) can aid in further learning tasks by acquiring good feature representation from a limited number of data samples. In this paper, we applied CL to tumor transcriptomes and clinical data to learn feature representations in a low-dimensional space. We then utilized these learned features to train a classifier to categorize tumors into a high- or low-risk group of recurrence. Using data from The Cancer Genome Atlas (TCGA), we demonstrated that CL can significantly improve classification accuracy. Specifically, our CL-based classifiers achieved an area under the receiver operating characteristic curve (AUC) greater than 0.8 for 14 types of cancer, and an AUC greater than 0.9 for 2 types of cancer. We also developed CL-based Cox (CLCox) models for predicting cancer prognosis. Our CLCox models trained with the TCGA data outperformed existing methods significantly in predicting the prognosis of 19 types of cancer under consideration. The performance of CLCox models and CL-based classifiers trained with TCGA lung and prostate cancer data were validated using the data from two independent cohorts. We also show that the CLCox model trained with the whole transcriptome significantly outperforms the Cox model trained with the 21 genes of Oncotype DX that is in clinical use for breast cancer patients. CL-based classifiers and CLCox models for 19 types of cancer are publicly available and can be used to predict cancer prognosis using the RNA-seq transcriptome of an individual tumor. Python codes for model training and testing are also publicly accessible, and can be applied to train new CL-based models using gene expression data of tumors.
- [295] arXiv:2306.08722 (replaced) [pdf, ps, html, other]
-
Title: Learning to Stabilize High-dimensional Unknown Systems Using Lyapunov-guided ExplorationComments: 32 pages, 7 figures; Accepted by the 6th Annual Conference on Learning for Dynamics and Control (L4DC 2024)Subjects: Systems and Control (eess.SY)
Designing stabilizing controllers is a fundamental challenge in autonomous systems, particularly for high-dimensional, nonlinear systems that can hardly be accurately modeled with differential equations. The Lyapunov theory offers a solution for stabilizing control systems, still, current methods relying on Lyapunov functions require access to complete dynamics or samples of system executions throughout the entire state space. Consequently, they are impractical for high-dimensional systems. This paper introduces a novel framework, LYapunov-Guided Exploration (LYGE), for learning stabilizing controllers tailored to high-dimensional, unknown systems. LYGE employs Lyapunov theory to iteratively guide the search for samples during exploration while simultaneously learning the local system dynamics, control policy, and Lyapunov functions. We demonstrate its scalability on highly complex systems, including a high-fidelity F-16 jet model featuring a 16D state space and a 4D input space. Experiments indicate that, compared to prior works in reinforcement learning, imitation learning, and neural certificates, LYGE reduces the distance to the goal by 50% while requiring only 5% to 32% of the samples. Furthermore, we demonstrate that our algorithm can be extended to learn controllers guided by other certificate functions for unknown systems.
- [296] arXiv:2307.04577 (replaced) [pdf, ps, html, other]
-
Title: AnyTeleop: A General Vision-Based Dexterous Robot Arm-Hand Teleoperation SystemComments: this http URL Robotics: Science and Systems 2023Subjects: Robotics (cs.RO); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Vision-based teleoperation offers the possibility to endow robots with human-level intelligence to physically interact with the environment, while only requiring low-cost camera sensors. However, current vision-based teleoperation systems are designed and engineered towards a particular robot model and deploy environment, which scales poorly as the pool of the robot models expands and the variety of the operating environment increases. In this paper, we propose AnyTeleop, a unified and general teleoperation system to support multiple different arms, hands, realities, and camera configurations within a single system. Although being designed to provide great flexibility to the choice of simulators and real hardware, our system can still achieve great performance. For real-world experiments, AnyTeleop can outperform a previous system that was designed for a specific robot hardware with a higher success rate, using the same robot. For teleoperation in simulation, AnyTeleop leads to better imitation learning performance, compared with a previous system that is particularly designed for that simulator. Project page: this https URL.
- [297] arXiv:2308.05042 (replaced) [pdf, ps, other]
-
Title: An Electrical Grid with Discrete Energy LevelsComments: 15 pages, 7 figuresSubjects: Systems and Control (eess.SY)
Minimizing both power fluctuations and energy waste in an electrical grid is a central challenge to energy policy. Any discrepancy between power production and loads may lead to inefficiencies and instability in the system. Right now, the electrical grid is an analog system that only retroactively reacts to power demands. The balancing act becomes even harder with the penetration of sustainable resources (e.g., wind turbines). Here, we consider the effect of random perturbations to the grid's steady states operation. A model is constructed and analyzed within the framework of a randomly perturbed Markovian chain. Instead of balancing continuous values for supply and demand, the model assumes that both the generators and the users adhere to discrete pattern energy levels which is supplemented by local, short-term energy storage units (STESU). Under reasonable assumptions, we show that this grid maintains stability (meaning, a constant difference between supply and demand) over long periods of time while subjected to randomly fluctuating energy conditions.
- [298] arXiv:2308.07476 (replaced) [pdf, ps, html, other]
-
Title: Dependent rounding with strong negative-correlation, and scheduling on unrelated machines to minimize completion timeJournal-ref: SODA 2024Subjects: Data Structures and Algorithms (cs.DS)
We describe a new dependent-rounding algorithmic framework for bipartite graphs. Given a fractional assignment $\vec x$ of values to edges of graph $G = (U \cup V, E)$, the algorithms return an integral solution $\vec X$ such that each right-node $v \in V$ has at most one neighboring edge $f$ with $X_f = 1$, and where the variables $X_e$ also satisfy broad nonpositive-correlation properties. In particular, for any edges $e_1, e_2$ sharing a left-node $u \in U$, the variables $X_{e_1}, X_{e_2}$ have strong negative-correlation properties, i.e. the expectation of $X_{e_1} X_{e_2}$ is significantly below $x_{e_1} x_{e_2}$.
This algorithm is based on generating negatively-correlated Exponential random variables and using them in a contention-resolution scheme inspired by an algorithm Im & Shadloo (2020). Our algorithm gives stronger and much more flexible negative correlation properties.
Dependent rounding schemes with negative correlation properties have been used for approximation algorithms for job-scheduling on unrelated machines to minimize weighted completion times (Bansal, Srinivasan, & Svensson (2021), Im & Shadloo (2020), Im & Li (2023)). Using our new dependent-rounding algorithm, among other improvements, we obtain a $1.398$-approximation for this problem. This significantly improves over the prior $1.45$-approximation ratio of Im & Li (2023). - [299] arXiv:2308.11056 (replaced) [pdf, ps, html, other]
-
Title: Closeness and Residual Closeness of Harary GraphsComments: 23 pages preprintSubjects: Discrete Mathematics (cs.DM)
Analysis of a network in terms of vulnerability is one of the most significant problems. Graph theory serves as a valuable tool for solving complex network problems, and there exist numerous graph-theoretic parameters to analyze the system's stability. Among these parameters, the closeness parameter stands out as one of the most commonly used vulnerability metrics. Its definition has evolved to enhance the ease of formulation and applicability to disconnected structures. Furthermore, based on the closeness parameter, vertex residual closeness, which is a newer and more sensitive parameter compared to other existing parameters, has been introduced as a new graph vulnerability index by Dangalchev. In this study, the outcomes of the closeness and vertex residual closeness parameters in Harary Graphs have been examined. Harary Graphs are well-known constructs that are distinguished by having $n$ vertices that are $k$-connected with the least possible number of edges.
- [300] arXiv:2308.11786 (replaced) [pdf, ps, html, other]
-
Title: Building Better Human-Agent Teams: Balancing Human Resemblance and Contribution in Voice AssistantsComments: 12 pages, 4 figuresSubjects: Human-Computer Interaction (cs.HC)
Voice assistants are increasingly prevalent, from personal devices to team environments. This study explores how voice type and contribution quality influence human-agent team performance and perceptions of anthropomorphism, animacy, intelligence, and trustworthiness. By manipulating both, we reveal mechanisms of perception and clarify ambiguity in previous work. Our results show that the human resemblance of a voice assistant's voice negatively interacts with the helpfulness of an agent's contribution to flip its effect on perceived anthropomorphism and perceived animacy. This means human teammates interpret the agent's contributions differently depending on its voice. Our study found no significant effect of voice on perceived intelligence, trustworthiness, or team performance. We find differences in these measures are caused by manipulating the helpfulness of an agent. These findings suggest that function matters more than form when designing agents for high-performing human-agent teams, but controlling perceptions of anthropomorphism and animacy can be unpredictable even with high human resemblance.
- [301] arXiv:2308.12577 (replaced) [pdf, ps, html, other]
-
Title: REB: Reducing Biases in Representation for Industrial Anomaly DetectionComments: 14 pages, 7 figures, 7 tablesSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Existing representation-based methods usually conduct industrial anomaly detection in two stages: obtain feature representations with a pre-trained model and perform distance measures for anomaly detection. Among them, K-nearest neighbor (KNN) retrieval-based anomaly detection methods show promising results. However, the features are not fully exploited as these methods ignore domain bias of pre-trained models and the difference of local density in feature space, which limits the detection performance. In this paper, we propose Reducing Biases (REB) in representation by considering the domain bias and building a self-supervised learning task for better domain adaption with a defect generation strategy (DefectMaker) that ensures a strong diversity in the synthetic defects. Additionally, we propose a local-density KNN (LDKNN) to reduce the local density bias in the feature space and obtain effective anomaly detection. The proposed REB method achieves a promising result of 99.5\% Im.AUROC on the widely used MVTec AD, with smaller backbone networks such as Vgg11 and Resnet18. The method also achieves an impressive 88.8\% Im.AUROC on the MVTec LOCO AD dataset and a remarkable 96.0\% on the BTAD dataset, outperforming other representation-based approaches. These results indicate the effectiveness and efficiency of REB for practical industrial applications. Code:this https URL.
- [302] arXiv:2309.02967 (replaced) [pdf, ps, html, other]
-
Title: Reviving Static Charts into Live ChartsComments: Accepted by IEEE TVCGSubjects: Human-Computer Interaction (cs.HC)
Data charts are prevalent across various fields due to their efficacy in conveying complex data relationships. However, static charts may sometimes struggle to engage readers and efficiently present intricate information, potentially resulting in limited understanding. We introduce "Live Charts," a new format of presentation that decomposes complex information within a chart and explains the information pieces sequentially through rich animations and accompanying audio narration. We propose an automated approach to revive static charts into Live Charts. Our method integrates GNN-based techniques to analyze the chart components and extract data from charts. Then we adopt large natural language models to generate appropriate animated visuals along with a voice-over to produce Live Charts from static ones. We conducted a thorough evaluation of our approach, which involved the model performance, use cases, a crowd-sourced user study, and expert interviews. The results demonstrate Live Charts offer a multi-sensory experience where readers can follow the information and understand the data insights better. We analyze the benefits and drawbacks of Live Charts over static charts as a new information consumption experience.
- [303] arXiv:2309.05075 (replaced) [pdf, ps, html, other]
-
Title: Secure Set-Based State Estimation for Linear Systems under Adversarial Attacks on SensorsSubjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Set-based state estimation plays a vital role in the safety verification of dynamical systems, which becomes significantly challenging when the system's sensors are susceptible to cyber-attacks. Existing methods often impose limitations on the attacker's capabilities, restricting the number of attacked sensors to be strictly less than half of the total number of sensors. This paper proposes a Secure Set-Based State Estimation (S3E) algorithm that addresses this limitation. The S3E algorithm guarantees that the true system state is contained within the estimated set, provided the initialization set encompasses the true initial state and the system is redundantly observable from the set of uncompromised sensors. The algorithm gives the estimated set as a collection of constrained zonotopes, which can be employed as robust certificates for verifying whether the system adheres to safety constraints. Furthermore, we demonstrate that the estimated set remains unaffected by attack signals of sufficiently large and also establish sufficient conditions for attack detection, identification, and filtering. This compels the attacker to inject only stealthy signals of small magnitude to evade detection, thus preserving the accuracy of the estimated set. When a few number of sensors (less than half) can be compromised, we prove that the estimated set remains bounded by a contracting set that converges to a ball whose radius is solely determined by the noise magnitude and is independent of the attack signals. To address the computational complexity of the algorithm, we offer several strategies for complexity-performance trade-offs. The efficacy of the proposed algorithm is illustrated through its application to a three-story building model.
- [304] arXiv:2309.05516 (replaced) [pdf, ps, html, other]
-
Title: Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Large Language Models (LLMs) have demonstrated exceptional proficiency in language-related tasks. However, their deployment presents significant challenges due to their substantial memory and storage requirements. To address this challenge, weight-only quantization has emerged as a promising solution. Previous research has indicated that fine-tuning through up and down rounding can enhance performance. In this study, we introduce SignRound, a method that utilizes signed gradient descent (SignSGD) to optimize rounding values and weight clipping within just 200 steps, combining the strengths of both Quantization-Aware Training (QAT) and Post-Training Quantization (PTQ). SignRound achieves outstanding results compared to recent methods across 2 to 4 bits, while maintaining low tuning costs and without introducing any additional inference overhead. For instance, SignRound led to absolute average accuracy improvements ranging from 6.91\% to 33.22\% at 2 bits. Furthermore, it demonstrates robust generalization to various recent models and achieves near-lossless quantization in most scenarios at 4 bits. The source code is publicly available at \url{this https URL}.
- [305] arXiv:2309.09231 (replaced) [pdf, ps, other]
-
Title: ATM: a Logic for Quantitative Security Properties on Attack TreesSubjects: Cryptography and Security (cs.CR); Logic in Computer Science (cs.LO)
Critical infrastructure systems - for which high reliability and availability are paramount - must operate securely. Attack trees (ATs) are hierarchical diagrams that offer a flexible modelling language used to assess how systems can be attacked. ATs are widely employed both in industry and academia but - in spite of their popularity - little work has been done to give practitioners instruments to formulate queries on ATs in an understandable yet powerful way. In this paper we fill this gap by presenting ATM, a logic to express quantitative security properties on ATs. ATM allows for the specification of properties involved with security metrics that include "cost", "probability" and "skill" and permits the formulation of insightful what-if scenarios. To showcase its potential, we apply ATM to the case study of a CubeSAT, presenting three different ways in which an attacker can compromise its availability. We showcase property specification on the corresponding attack tree and we present theory and algorithms - based on binary decision diagrams - to check properties and compute metrics of ATM-formulae.
- [306] arXiv:2309.10621 (replaced) [pdf, ps, html, other]
-
Title: Large language models can accurately predict searcher preferencesSubjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Relevance labels, which indicate whether a search result is valuable to a searcher, are key to evaluating and optimising search systems. The best way to capture the true preferences of users is to ask them for their careful feedback on which results would be useful, but this approach does not scale to produce a large number of labels. Getting relevance labels at scale is usually done with third-party labellers, who judge on behalf of the user, but there is a risk of low-quality data if the labeller doesn't understand user needs. To improve quality, one standard approach is to study real users through interviews, user studies and direct feedback, find areas where labels are systematically disagreeing with users, then educate labellers about user needs through judging guidelines, training and monitoring. This paper introduces an alternate approach for improving label quality. It takes careful feedback from real users, which by definition is the highest-quality first-party gold data that can be derived, and develops an large language model prompt that agrees with that data.
We present ideas and observations from deploying language models for large-scale relevance labelling at Bing, and illustrate with data from TREC. We have found large language models can be effective, with accuracy as good as human labellers and similar capability to pick the hardest queries, best runs, and best groups. Systematic changes to the prompts make a difference in accuracy, but so too do simple paraphrases. To measure agreement with real searchers needs high-quality "gold" labels, but with these we find that models produce better labels than third-party workers, for a fraction of the cost, and these labels let us train notably better rankers. - [307] arXiv:2309.10771 (replaced) [pdf, ps, html, other]
-
Title: Redefining Qualitative Analysis in the AI Era: Utilizing ChatGPT for Efficient Thematic AnalysisSubjects: Human-Computer Interaction (cs.HC)
AI tools, particularly large-scale language model (LLM) based applications such as ChatGPT, have the potential to simplify qualitative research. Through semi-structured interviews with seventeen participants, we identified challenges and concerns in integrating ChatGPT into the qualitative analysis process. Collaborating with thirteen qualitative researchers, we developed a framework for designing prompts to enhance the effectiveness of ChatGPT in thematic analysis. Our findings indicate that improving transparency, providing guidance on prompts, and strengthening users' understanding of LLMs' capabilities significantly enhance the users' ability to interact with ChatGPT. We also discovered and revealed the reasons behind researchers' shift in attitude towards ChatGPT from negative to positive. This research not only highlights the importance of well-designed prompts in LLM applications but also offers reflections for qualitative researchers on the perception of AI's role. Finally, we emphasize the potential ethical risks and the impact of constructing AI ethical expectations by researchers, particularly those who are novices, on future research and AI development.
- [308] arXiv:2309.15817 (replaced) [pdf, ps, other]
-
Title: Identifying the Risks of LM Agents with an LM-Emulated SandboxYangjun Ruan, Honghua Dong, Andrew Wang, Silviu Pitis, Yongchao Zhou, Jimmy Ba, Yann Dubois, Chris J. Maddison, Tatsunori HashimotoSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Recent advances in Language Model (LM) agents and tool use, exemplified by applications like ChatGPT Plugins, enable a rich set of capabilities but also amplify potential risks - such as leaking private data or causing financial losses. Identifying these risks is labor-intensive, necessitating implementing the tools, setting up the environment for each test scenario manually, and finding risky cases. As tools and agents become more complex, the high cost of testing these agents will make it increasingly difficult to find high-stakes, long-tailed risks. To address these challenges, we introduce ToolEmu: a framework that uses an LM to emulate tool execution and enables the testing of LM agents against a diverse range of tools and scenarios, without manual instantiation. Alongside the emulator, we develop an LM-based automatic safety evaluator that examines agent failures and quantifies associated risks. We test both the tool emulator and evaluator through human evaluation and find that 68.8% of failures identified with ToolEmu would be valid real-world agent failures. Using our curated initial benchmark consisting of 36 high-stakes tools and 144 test cases, we provide a quantitative risk analysis of current LM agents and identify numerous failures with potentially severe outcomes. Notably, even the safest LM agent exhibits such failures 23.9% of the time according to our evaluator, underscoring the need to develop safer LM agents for real-world deployment.
- [309] arXiv:2310.01217 (replaced) [pdf, ps, html, other]
-
Title: ScaLearn: Simple and Highly Parameter-Efficient Task Transfer by Learning to ScaleComments: Accepted to Findings of the ACL: ACL 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Multi-task learning (MTL) has shown considerable practical benefits, particularly when using language models (LMs). While this is commonly achieved by learning $n$ tasks under a joint optimization procedure, some methods, such as AdapterFusion, divide the problem into two stages: (i) task learning, where knowledge specific to a task is encapsulated within sets of parameters (e.g., adapters), and (ii) transfer, where this already learned knowledge is leveraged for a target task. This separation of concerns provides numerous benefits (e.g., promoting reusability). However, current two-stage MTL introduces a substantial number of additional parameters. We address this issue by leveraging the usefulness of linearly scaling the output representations of source adapters for transfer learning. We introduce ScaLearn, a simple and highly parameter-efficient two-stage MTL method that capitalizes on the knowledge of the source tasks by learning a minimal set of scaling parameters that enable effective transfer to a target task. Our experiments on three benchmarks (GLUE, SuperGLUE, and HumSet) and two encoder LMs show that ScaLearn consistently outperforms strong baselines with a small number of transfer parameters (~ $0.35$% of those of AdapterFusion). Remarkably, we observe that ScaLearn maintains its strong abilities even when further reducing parameters, achieving competitive results with only $8$ transfer parameters per target task. Our proposed approach thus demonstrates the power of simple scaling as a promise for more efficient task transfer.
- [310] arXiv:2310.05872 (replaced) [pdf, ps, html, other]
-
Title: ViCor: Bridging Visual Understanding and Commonsense Reasoning with Large Language ModelsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
In our work, we explore the synergistic capabilities of pre-trained vision-and-language models (VLMs) and large language models (LLMs) on visual commonsense reasoning (VCR) problems. We find that VLMs and LLMs-based decision pipelines are good at different kinds of VCR problems. Pre-trained VLMs exhibit strong performance for problems involving understanding the literal visual content, which we noted as visual commonsense understanding (VCU). For problems where the goal is to infer conclusions beyond image content, which we noted as visual commonsense inference (VCI), VLMs face difficulties, while LLMs, given sufficient visual evidence, can use commonsense to infer the answer well. We empirically validate this by letting LLMs classify VCR problems into these two categories and show the significant difference between VLM and LLM with image caption decision pipelines on two subproblems. Moreover, we identify a challenge with VLMs' passive perception, which may miss crucial context information, leading to incorrect reasoning by LLMs. Based on these, we suggest a collaborative approach, named ViCor, where pre-trained LLMs serve as problem classifiers to analyze the problem category, then either use VLMs to answer the question directly or actively instruct VLMs to concentrate on and gather relevant visual elements to support potential commonsense inferences. We evaluate our framework on two VCR benchmark datasets and outperform all other methods that do not require in-domain fine-tuning.
- [311] arXiv:2310.08779 (replaced) [pdf, ps, other]
-
Title: A Completeness Theorem for Probabilistic Regular ExpressionsComments: Accepted for publication at LICS. Full version of the paper containing omitted proofsSubjects: Logic in Computer Science (cs.LO); Formal Languages and Automata Theory (cs.FL)
We introduce Probabilistic Regular Expressions (PRE), a probabilistic analogue of regular expressions denoting probabilistic languages in which every word is assigned a probability of being generated. We present and prove the completeness of an inference system for reasoning about probabilistic language equivalence of PRE based on Salomaa's axiomatisation of Kleene Algebra.
- [312] arXiv:2310.08949 (replaced) [pdf, ps, html, other]
-
Title: EasyGen: Easing Multimodal Generation with BiDiffuser and LLMsComments: Accepted by ACL 2024, main conferenceSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV)
We present EasyGen, an efficient model designed to enhance multimodal understanding and generation by harnessing the capabilities of diffusion models and large language models (LLMs), Unlike existing multimodal models that predominately depend on encoders like CLIP or ImageBind and need ample amounts of training data to bridge modalities,EasyGen leverages BiDiffuser,a bidirectional conditional diffusion model, to foster more efficient modality interactions. Easygen achieves text generation by training a projection layer linking BiDiffuser and an LLM, and facilities image generation by training an adapter to align the LLM's text space with the BiDiffuser's image space, Comprehensive quantitative and qualitative experiments show that EasyGen excels in data-efficient training, high-quality image generation, and extendibility, effectively addressing the challenges in multimodal generation. The source code is available at this https URL.
- [313] arXiv:2310.09319 (replaced) [pdf, ps, html, other]
-
Title: Topological Data Analysis in smart manufacturingComments: Preprint still under reviewSubjects: Machine Learning (cs.LG); Algebraic Topology (math.AT); Applications (stat.AP)
Topological Data Analysis (TDA) is a discipline that applies algebraic topology techniques to analyze complex, multi-dimensional data. Although it is a relatively new field, TDA has been widely and successfully applied across various domains, such as medicine, materials science, and biology. This survey provides an overview of the state of the art of TDA within a dynamic and promising application area: industrial manufacturing and production, particularly within the Industry 4.0 context. We have conducted a rigorous and reproducible literature search focusing on TDA applications in industrial production and manufacturing settings. The identified works are categorized based on their application areas within the manufacturing process and the types of input data. We highlight the principal advantages of TDA tools in this context, address the challenges encountered and the future potential of the field. Furthermore, we identify TDA methods that are currently underexploited in specific industrial areas and discuss how their application could be beneficial, with the aim of stimulating further research in this field. This work seeks to bridge the theoretical advancements in TDA with the practical needs of industrial production. Our goal is to serve as a guide for practitioners and researchers applying TDA in industrial production and manufacturing systems. We advocate for the untapped potential of TDA in this domain and encourage continued exploration and research.
- [314] arXiv:2310.09542 (replaced) [pdf, ps, html, other]
-
Title: The Treewidth Boundedness Problem for an Inductive Separation Logic of RelationsSubjects: Logic in Computer Science (cs.LO); Formal Languages and Automata Theory (cs.FL)
The treewidth boundedness problem for a logic asks for the existence of an upper bound on the treewidth of the models of a given formula in that logic. This problem is found to be undecidable for first order logic. We consider a generalization of Separation Logic over relational signatures, interpreted over standard relational structures, and describe an algorithm for the treewidth boundedness problem in the context of this logic.
- [315] arXiv:2310.10155 (replaced) [pdf, ps, html, other]
-
Title: Analysis and implementation of nanotargeting on LinkedIn based on publicly available non-PIIComments: 19 pages, 10 figuresJournal-ref: Proceedings of the CHI Conference on Human Factors in Computing Systems May 2024 Article No 1019 Pages 1 to 22Subjects: Social and Information Networks (cs.SI)
The literature has shown that combining a few non-Personal Identifiable Information (non-PII) is enough to make a user unique in a dataset including millions of users. This work demonstrates that a combination of a few non-PII items can be activated to nanotarget users. We demonstrate that the combination of the location and {5} rare ({13} random) skills in a LinkedIn profile is enough to become unique in a user base of {$\sim$970M} users with a probability of 75\%. The novelty is that these attributes are publicly accessible to anyone registered on LinkedIn and can be activated through advertising campaigns. We ran an experiment configuring ad campaigns using the location and skills of three of the paper's authors, demonstrating how all the ads using $\geq13$ skills were delivered exclusively to the targeted user. We reported this vulnerability to LinkedIn, which initially ignored the problem, but fixed it as of November 2023.%This nanotargeting may expose LinkedIn users to privacy and security risks such as malvertising or manipulation.
- [316] arXiv:2310.10962 (replaced) [pdf, ps, html, other]
-
Title: Large Language Models can Contrastively Refine their Generation for Better Sentence Representation LearningComments: NAACL 2024Subjects: Computation and Language (cs.CL)
Recently, large language models (LLMs) have emerged as a groundbreaking technology and their unparalleled text generation capabilities have sparked interest in their application to the fundamental sentence representation learning task. Existing methods have explored utilizing LLMs as data annotators to generate synthesized data for training contrastive learning based sentence embedding models such as SimCSE. However, since contrastive learning models are sensitive to the quality of sentence pairs, the effectiveness of these methods is largely influenced by the content generated from LLMs, highlighting the need for more refined generation in the context of sentence representation learning. Building upon this premise, we propose MultiCSR, a multi-level contrastive sentence representation learning framework that decomposes the process of prompting LLMs to generate a corpus for training base sentence embedding models into three stages (i.e., sentence generation, sentence pair construction, in-batch training) and refines the generated content at these three distinct stages, ensuring only high-quality sentence pairs are utilized to train a base contrastive learning model. Our extensive experiments reveal that MultiCSR enables a less advanced LLM to surpass the performance of ChatGPT, while applying it to ChatGPT achieves better state-of-the-art results. Comprehensive analyses further underscore the potential of our framework in various application scenarios and achieving better sentence representation learning with LLMs.
- [317] arXiv:2310.15887 (replaced) [pdf, ps, html, other]
-
Title: AdaptiX -- A Transitional XR Framework for Development and Evaluation of Shared Control Applications in Assistive RoboticsComments: Proceedings of the ACM on Human-Computer Interaction (PACM HCI) as submission of the 16th ACM SIGCHI Symposium on Engineering Interactive Computing Systems (EICS'24)Subjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI); Robotics (cs.RO); Software Engineering (cs.SE)
With the ongoing efforts to empower people with mobility impairments and the increase in technological acceptance by the general public, assistive technologies, such as collaborative robotic arms, are gaining popularity. Yet, their widespread success is limited by usability issues, specifically the disparity between user input and software control along the autonomy continuum. To address this, shared control concepts provide opportunities to combine the targeted increase of user autonomy with a certain level of computer assistance. This paper presents the free and open-source AdaptiX XR framework for developing and evaluating shared control applications in a high-resolution simulation environment. The initial framework consists of a simulated robotic arm with an example scenario in Virtual Reality (VR), multiple standard control interfaces, and a specialized recording/replay system. AdaptiX can easily be extended for specific research needs, allowing Human-Robot Interaction (HRI) researchers to rapidly design and test novel interaction methods, intervention strategies, and multi-modal feedback techniques, without requiring an actual physical robotic arm during the early phases of ideation, prototyping, and evaluation. Also, a Robot Operating System (ROS) integration enables the controlling of a real robotic arm in a PhysicalTwin approach without any simulation-reality gap. Here, we review the capabilities and limitations of AdaptiX in detail and present three bodies of research based on the framework. AdaptiX can be accessed at this https URL.
- [318] arXiv:2311.00801 (replaced) [pdf, ps, html, other]
-
Title: GIST: Generated Inputs Sets Transferability in Deep LearningComments: accepted for publication in the "ACM Transactions on Software Engineering and Methodology" journalSubjects: Machine Learning (cs.LG); Software Engineering (cs.SE)
To foster the verifiability and testability of Deep Neural Networks (DNN), an increasing number of methods for test case generation techniques are being developed.
When confronted with testing DNN models, the user can apply any existing test generation technique. However, it needs to do so for each technique and each DNN model under test, which can be expensive. Therefore, a paradigm shift could benefit this testing process: rather than regenerating the test set independently for each DNN model under test, we could transfer from existing DNN models.
This paper introduces GIST (Generated Inputs Sets Transferability), a novel approach for the efficient transfer of test sets. Given a property selected by a user (e.g., neurons covered, faults), GIST enables the selection of good test sets from the point of view of this property among available test sets. This allows the user to recover similar properties on the transferred test sets as he would have obtained by generating the test set from scratch with a test cases generation technique. Experimental results show that GIST can select effective test sets for the given property to transfer. Moreover, GIST scales better than reapplying test case generation techniques from scratch on DNN models under test. - [319] arXiv:2311.03682 (replaced) [pdf, ps, html, other]
-
Title: Incentive Design for Eco-driving in Urban Transportation NetworksSubjects: Systems and Control (eess.SY); Social and Information Networks (cs.SI); Optimization and Control (math.OC)
Eco-driving emerges as a cost-effective and efficient strategy to mitigate greenhouse gas emissions in urban transportation networks. Acknowledging the persuasive influence of incentives in shaping driver behavior, this paper presents the `eco-planner,' a digital platform devised to promote eco-driving practices in urban transportation. At the outset of their trips, users provide the platform with their trip details and travel time preferences, enabling the eco-planner to formulate personalized eco-driving recommendations and corresponding incentives, while adhering to its budgetary constraints. Upon trip completion, incentives are transferred to users who comply with the recommendations and effectively reduce their emissions. By comparing our proposed incentive mechanism with a baseline scheme that offers uniform incentives to all users, we demonstrate that our approach achieves superior emission reductions and increased user compliance with a smaller budget.
- [320] arXiv:2311.04797 (replaced) [pdf, ps, other]
-
Title: CloverLeaf on Intel Multi-Core CPUs: A Case Study in Write-Allocate EvasionComments: 19 pages including artifact appendix; 11 figures, 1 table; numerous corrections, esp. in Table 1Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
In this paper we analyze the MPI-only version of the CloverLeaf code from the SPEChpc 2021 benchmark suite on recent Intel Xeon "Ice Lake" and "Sapphire Rapids" server CPUs. We observe peculiar breakdowns in performance when the number of processes is prime. Investigating this effect, we create first-principles data traffic models for each of the stencil-like hotspot loops. With application measurements and microbenchmarks to study memory data traffic behavior, we can connect the breakdowns to SpecI2M, a new write-allocate evasion feature in current Intel CPUs. For serial and full-node cases we are able to predict the memory data volume analytically with an error of a few percent. We find that if the number of processes is prime, SpecI2M fails to work properly, which we can attribute to short inner loops emerging from the one-dimensional domain decomposition in this case. We can also rule out other possible causes of the prime number effect, such as breaking layer conditions, MPI communication overhead, and load imbalance.
- [321] arXiv:2311.04818 (replaced) [pdf, ps, html, other]
-
Title: Cross-Silo Federated Learning Across Divergent Domains with Iterative Parameter AlignmentComments: Published at IEEE Big Data 2023Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV); Distributed, Parallel, and Cluster Computing (cs.DC)
Learning from the collective knowledge of data dispersed across private sources can provide neural networks with enhanced generalization capabilities. Federated learning, a method for collaboratively training a machine learning model across remote clients, achieves this by combining client models via the orchestration of a central server. However, current approaches face two critical limitations: i) they struggle to converge when client domains are sufficiently different, and ii) current aggregation techniques produce an identical global model for each client. In this work, we address these issues by reformulating the typical federated learning setup: rather than learning a single global model, we learn N models each optimized for a common objective. To achieve this, we apply a weighted distance minimization to model parameters shared in a peer-to-peer topology. The resulting framework, Iterative Parameter Alignment, applies naturally to the cross-silo setting, and has the following properties: (i) a unique solution for each participant, with the option to globally converge each model in the federation, and (ii) an optional early-stopping mechanism to elicit fairness among peers in collaborative learning settings. These characteristics jointly provide a flexible new framework for iteratively learning from peer models trained on disparate datasets. We find that the technique achieves competitive results on a variety of data partitions compared to state-of-the-art approaches. Further, we show that the method is robust to divergent domains (i.e. disjoint classes across peers) where existing approaches struggle.
- [322] arXiv:2311.07389 (replaced) [pdf, ps, html, other]
-
Title: Transpose Attack: Stealing Datasets with Bidirectional TrainingComments: NDSS24 paper, Transpose Attack, Transposed Model. NDSS version: this https URLSubjects: Machine Learning (cs.LG); Cryptography and Security (cs.CR)
Deep neural networks are normally executed in the forward direction. However, in this work, we identify a vulnerability that enables models to be trained in both directions and on different tasks. Adversaries can exploit this capability to hide rogue models within seemingly legitimate models. In addition, in this work we show that neural networks can be taught to systematically memorize and retrieve specific samples from datasets. Together, these findings expose a novel method in which adversaries can exfiltrate datasets from protected learning environments under the guise of legitimate models. We focus on the data exfiltration attack and show that modern architectures can be used to secretly exfiltrate tens of thousands of samples with high fidelity, high enough to compromise data privacy and even train new models. Moreover, to mitigate this threat we propose a novel approach for detecting infected models.
- [323] arXiv:2311.09149 (replaced) [pdf, ps, html, other]
-
Title: Temporal Knowledge Question Answering via Abstract Reasoning InductionComments: Accepted by ACL 2024. 17 pages, 10 figuresSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
In this study, we address the challenge of enhancing temporal knowledge reasoning in Large Language Models (LLMs). LLMs often struggle with this task, leading to the generation of inaccurate or misleading responses. This issue mainly arises from their limited ability to handle evolving factual knowledge and complex temporal logic. To overcome these limitations, we propose Abstract Reasoning Induction (ARI) framework, which divides temporal reasoning into two distinct phases: Knowledge-agnostic and Knowledge-based. This framework offers factual knowledge support to LLMs while minimizing the incorporation of extraneous noisy data. Concurrently, informed by the principles of constructivism, ARI provides LLMs the capability to engage in proactive, self-directed learning from both correct and incorrect historical reasoning samples. By teaching LLMs to actively construct knowledge and methods, it can significantly boosting their temporal reasoning abilities. Our approach achieves remarkable improvements, with relative gains of 29.7% and 9.27% on two temporal QA datasets, underscoring its efficacy in advancing temporal reasoning in LLMs. The code can be found at this https URL
- [324] arXiv:2311.09548 (replaced) [pdf, ps, html, other]
-
Title: Universally Optimal Information Dissemination and Shortest Paths in the HYBRID Distributed ModelComments: This work combines the three preprints arXiv:2304.06317, arXiv:2304.07107, and arXiv:2306.05977 with some improved resultsSubjects: Distributed, Parallel, and Cluster Computing (cs.DC); Data Structures and Algorithms (cs.DS)
In this work we consider the HYBRID model of distributed computing, introduced recently by Augustine, Hinnenthal, Kuhn, Scheideler, and Schneider (SODA 2020), where nodes have access to two different communication modes: high-bandwidth local communication along the edges of the graph and low-bandwidth all-to-all communication, capturing the non-uniform nature of modern communication networks.
Prior work in HYBRID has focused on showing existentially optimal algorithms, meaning there exists a pathological family of instances on which no algorithm can do better. This neglects the fact that such worst-case instances often do not appear or can be actively avoided in practice. In this work, we focus on the notion of universal optimality, first raised by Garay, Kutten, and Peleg (FOCS 1993). Roughly speaking, a universally optimal algorithm is one that, given any input graph, runs as fast as the best algorithm designed specifically for that graph.
We show the first universally optimal algorithms in HYBRID. We present universally optimal solutions for fundamental information dissemination tasks, such as broadcasting and unicasting multiple messages in HYBRID. Furthermore, we apply these tools to obtain universally optimal solutions for various shortest paths problems in HYBRID.
A main conceptual contribution of this work is the conception of a new graph parameter called neighborhood quality that captures the inherent complexity of many fundamental graph problems in HYBRID.
We also show new existentially optimal shortest paths algorithms in HYBRID, which are utilized as key subroutines in our universally optimal algorithms and are of independent interest. Our new algorithms for $k$-source shortest paths match the existing $\tilde{\Omega}(\sqrt{k})$ lower bound for all $k$. Previously, the lower bound was only known to be tight when $k \in \tilde{\Omega}(n^{2/3})$. - [325] arXiv:2311.10437 (replaced) [pdf, ps, html, other]
-
Title: DSD-DA: Distillation-based Source Debiasing for Domain Adaptive Object DetectionComments: Accepted by ICML2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Though feature-alignment based Domain Adaptive Object Detection (DAOD) methods have achieved remarkable progress, they ignore the source bias issue, i.e., the detector tends to acquire more source-specific knowledge, impeding its generalization capabilities in the target domain. Furthermore, these methods face a more formidable challenge in achieving consistent classification and localization in the target domain compared to the source domain. To overcome these challenges, we propose a novel Distillation-based Source Debiasing (DSD) framework for DAOD, which can distill domain-agnostic knowledge from a pre-trained teacher model, improving the detector's performance on both domains. In addition, we design a Target-Relevant Object Localization Network (TROLN), which can mine target-related localization information from source and target-style mixed data. Accordingly, we present a Domain-aware Consistency Enhancing (DCE) strategy, in which these information are formulated into a new localization representation to further refine classification scores in the testing stage, achieving a harmonization between classification and localization. Extensive experiments have been conducted to manifest the effectiveness of this method, which consistently improves the strong baseline by large margins, outperforming existing alignment-based works.
- [326] arXiv:2311.13889 (replaced) [pdf, ps, html, other]
-
Title: SIMBa: System Identification Methods leveraging BackpropagationComments: First two authors contributed equally. Submitted to IEEE TCSTSubjects: Systems and Control (eess.SY)
This manuscript details and extends the SIMBa toolbox (System Identification Methods leveraging Backpropagation) presented in previous work, which uses well-established Machine Learning tools for discrete-time linear multi-step-ahead state-space System Identification (SI). SIMBa leverages linear-matrix-inequality-based free parametrizations of Schur matrices to guarantee the stability of the identified model by design. In this paper, backed up by novel free parametrizations of Schur matrices, we extend the toolbox to show how SIMBa can incorporate known sparsity patterns or true values of the state-space matrices to identify without jeopardizing stability.
We extensively investigate SIMBa's behavior when identifying diverse systems with various properties from both simulated and real-world data. Overall, we find it consistently outperforms traditional stable subspace identification methods, and sometimes significantly, especially when enforcing desired model properties. These results hint at the potential of SIMBa to pave the way for generic structured nonlinear SI. The toolbox is open-sourced on this https URL. - [327] arXiv:2311.14363 (replaced) [pdf, ps, html, other]
-
Title: High order unfitted finite element discretizations for explicit boundary representationsSubjects: Computational Engineering, Finance, and Science (cs.CE); Numerical Analysis (math.NA)
When modeling scientific and industrial problems, geometries are typically modeled by explicit boundary representations obtained from computer-aided design software. Unfitted (also known as embedded or immersed) finite element methods offer a significant advantage in dealing with complex geometries, eliminating the need for generating unstructured body-fitted meshes. However, current unfitted finite elements on nonlinear geometries are restricted to implicit (possibly high-order) level set geometries. In this work, we introduce a novel automatic computational pipeline to approximate solutions of partial differential equations on domains defined by explicit nonlinear boundary representations. For the geometrical discretization, we propose a novel algorithm to generate quadratures for the bulk and surface integration on nonlinear polytopes required to compute all the terms in unfitted finite element methods. The algorithm relies on a nonlinear triangulation of the boundary, a kd-tree refinement of the surface cells that simplify the nonlinear intersections of surface and background cells to simple cases that are diffeomorphically equivalent to linear intersections, robust polynomial root-finding algorithms and surface parameterization techniques. We prove the correctness of the proposed algorithm. We have successfully applied this algorithm to simulate partial differential equations with unfitted finite elements on nonlinear domains described by computer-aided design models, demonstrating the robustness of the geometric algorithm and showing high-order accuracy of the overall method.
- [328] arXiv:2311.14495 (replaced) [pdf, ps, html, other]
-
Title: StableSSM: Alleviating the Curse of Memory in State-space Models through Stable ReparameterizationComments: 27 pages, 7 figures, ICML 2024Subjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Dynamical Systems (math.DS)
In this paper, we investigate the long-term memory learning capabilities of state-space models (SSMs) from the perspective of parameterization. We prove that state-space models without any reparameterization exhibit a memory limitation similar to that of traditional RNNs: the target relationships that can be stably approximated by state-space models must have an exponential decaying memory. Our analysis identifies this "curse of memory" as a result of the recurrent weights converging to a stability boundary, suggesting that a reparameterization technique can be effective. To this end, we introduce a class of reparameterization techniques for SSMs that effectively lift its memory limitations. Besides improving approximation capabilities, we further illustrate that a principled choice of reparameterization scheme can also enhance optimization stability. We validate our findings using synthetic datasets, language models and image classifications.
- [329] arXiv:2311.16097 (replaced) [pdf, ps, html, other]
-
Title: CG-HOI: Contact-Guided 3D Human-Object Interaction GenerationSubjects: Computer Vision and Pattern Recognition (cs.CV)
We propose CG-HOI, the first method to address the task of generating dynamic 3D human-object interactions (HOIs) from text. We model the motion of both human and object in an interdependent fashion, as semantically rich human motion rarely happens in isolation without any interactions. Our key insight is that explicitly modeling contact between the human body surface and object geometry can be used as strong proxy guidance, both during training and inference. Using this guidance to bridge human and object motion enables generating more realistic and physically plausible interaction sequences, where the human body and corresponding object move in a coherent manner. Our method first learns to model human motion, object motion, and contact in a joint diffusion process, inter-correlated through cross-attention. We then leverage this learned contact for guidance during inference to synthesize realistic and coherent HOIs. Extensive evaluation shows that our joint contact-based human-object interaction approach generates realistic and physically plausible sequences, and we show two applications highlighting the capabilities of our method. Conditioned on a given object trajectory, we can generate the corresponding human motion without re-training, demonstrating strong human-object interdependency learning. Our approach is also flexible, and can be applied to static real-world 3D scene scans.
- [330] arXiv:2312.03579 (replaced) [pdf, ps, html, other]
-
Title: The Implication Problem for Functional Dependencies and Variants of Marginal Distribution EquivalencesComments: 23 pages, improved version after reviewer comments, results unchangedSubjects: Logic in Computer Science (cs.LO)
We study functional dependencies together with two different probabilistic dependency notions: unary marginal identity and unary marginal distribution equivalence. A unary marginal identity states that two variables x and y are identically distributed. A unary marginal distribution equivalence states that the multiset consisting of the marginal probabilities of all the values for variable x is the same as the corresponding multiset for y. We present a sound and complete axiomatization for the class of these dependencies and show that it has Armstrong relations. The axiomatization is infinite, but we show that there can be no finite axiomatization. The implication problem for the subclass that contains only functional dependencies and unary marginal identities can be simulated with functional dependencies and unary inclusion atoms, and therefore the problem is in polynomial-time. This complexity bound also holds in the case of the full class, which we show by constructing a polynomial-time algorithm.
- [331] arXiv:2312.06424 (replaced) [pdf, ps, html, other]
-
Title: Cross Domain LifeLong Sequential Modeling for Online Click-Through Rate PredictionComments: Accepted by KDD 2024Subjects: Information Retrieval (cs.IR)
Deep neural networks (DNNs) that incorporated lifelong sequential modeling (LSM) have brought great success to recommendation systems in various social media platforms. While continuous improvements have been made in domain-specific LSM, limited work has been done in cross-domain LSM, which considers modeling of lifelong sequences of both target domain and source domain. In this paper, we propose Lifelong Cross Network (LCN) to incorporate cross-domain LSM to improve the click-through rate (CTR) prediction in the target domain. The proposed LCN contains a LifeLong Attention Pyramid (LAP) module that comprises of three levels of cascaded attentions to effectively extract interest representations with respect to the candidate item from lifelong sequences. We also propose Cross Representation Production (CRP) module to enforce additional supervision on the learning and alignment of cross-domain representations so that they can be better reused on learning of the CTR prediction in the target domain. We conducted extensive experiments on WeChat Channels industrial dataset as well as on benchmark dataset. Results have revealed that the proposed LCN outperforms existing work in terms of both prediction accuracy and online performance.
- [332] arXiv:2312.06941 (replaced) [pdf, ps, other]
-
Title: Humans vs Large Language Models: Judgmental Forecasting in an Era of Advanced AISubjects: Machine Learning (cs.LG); Computers and Society (cs.CY)
This study investigates the forecasting accuracy of human experts versus Large Language Models (LLMs) in the retail sector, particularly during standard and promotional sales periods. Utilizing a controlled experimental setup with 123 human forecasters and five LLMs, including ChatGPT4, ChatGPT3.5, Bard, Bing, and Llama2, we evaluated forecasting precision through Mean Absolute Percentage Error. Our analysis centered on the effect of the following factors on forecasters performance: the supporting statistical model (baseline and advanced), whether the product was on promotion, and the nature of external impact. The findings indicate that LLMs do not consistently outperform humans in forecasting accuracy and that advanced statistical forecasting models do not uniformly enhance the performance of either human forecasters or LLMs. Both human and LLM forecasters exhibited increased forecasting errors, particularly during promotional periods and under the influence of positive external impacts. Our findings call for careful consideration when integrating LLMs into practical forecasting processes.
- [333] arXiv:2312.07533 (replaced) [pdf, ps, html, other]
-
Title: VILA: On Pre-training for Visual Language ModelsJi Lin, Hongxu Yin, Wei Ping, Yao Lu, Pavlo Molchanov, Andrew Tao, Huizi Mao, Jan Kautz, Mohammad Shoeybi, Song HanComments: CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Visual language models (VLMs) rapidly progressed with the recent success of large language models. There have been growing efforts on visual instruction tuning to extend the LLM with visual inputs, but lacks an in-depth study of the visual language pre-training process, where the model learns to perform joint modeling on both modalities. In this work, we examine the design options for VLM pre-training by augmenting LLM towards VLM through step-by-step controllable comparisons. We introduce three main findings: (1) freezing LLMs during pre-training can achieve decent zero-shot performance, but lack in-context learning capability, which requires unfreezing the LLM; (2) interleaved pre-training data is beneficial whereas image-text pairs alone are not optimal; (3) re-blending text-only instruction data to image-text data during instruction fine-tuning not only remedies the degradation of text-only tasks, but also boosts VLM task accuracy. With an enhanced pre-training recipe we build VILA, a Visual Language model family that consistently outperforms the state-of-the-art models, e.g., LLaVA-1.5, across main benchmarks without bells and whistles. Multi-modal pre-training also helps unveil appealing properties of VILA, including multi-image reasoning, enhanced in-context learning, and better world knowledge.
- [334] arXiv:2312.08140 (replaced) [pdf, ps, html, other]
-
Title: GVE-LPA: Fast Label Propagation Algorithm (LPA) for Community Detection in Shared Memory SettingComments: 8 pages, 7 figures, 1 table. arXiv admin note: text overlap with arXiv:2312.04876Subjects: Distributed, Parallel, and Cluster Computing (cs.DC); Performance (cs.PF)
Community detection is the problem of identifying natural divisions in networks. Efficient parallel algorithms for this purpose are crucial in various applications, particularly as datasets grow to substantial scales. This technical report presents an optimized parallel implementation of the Label Propagation Algorithm (LPA), a high speed community detection method, for shared memory multicore systems. On a server equipped with dual 16-core Intel Xeon Gold 6226R processors, our LPA, which we term as GVE-LPA, outperforms FLPA, igraph LPA, and NetworKit LPA by 139x, 97,000x, and 40x respectively - achieving a processing rate of 1.4B edges/s on a 3.8B edge graph. In addition, GVE-LPA scales at a rate of 1.7x every doubling of threads.
- [335] arXiv:2312.12321 (replaced) [pdf, ps, html, other]
-
Title: Bypassing the Safety Training of Open-Source LLMs with Priming AttacksComments: ICLR Tiny Paper camera ready versionSubjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
With the recent surge in popularity of LLMs has come an ever-increasing need for LLM safety training. In this paper, we investigate the fragility of SOTA open-source LLMs under simple, optimization-free attacks we refer to as $\textit{priming attacks}$, which are easy to execute and effectively bypass alignment from safety training. Our proposed attack improves the Attack Success Rate on Harmful Behaviors, as measured by Llama Guard, by up to $3.3\times$ compared to baselines. Source code and data are available at this https URL.
- [336] arXiv:2312.15682 (replaced) [pdf, ps, html, other]
-
Title: A Grating Based High-Frequency Motion Stimulus Paradigm for Steady-State Motion Visual Evoked PotentialsSubjects: Human-Computer Interaction (cs.HC)
Objective: This paper proposes a novel type of stimulus in the shape of sinusoidal gratings displayed with an imperceptibly high-frequency motion. The stimulus has been designed for use in BCI (Brain Computer Interface) applications that employ visually evoked potentials (VEPs) in an effort to mitigate discomfort associated with VEPs. The stimuli set included traditional VEP stimuli, already established in the literature, allowing comparative analyses. We conducted analyses of signal distinction measures by calculating the signal-to-noise ratio and the classification performance of its evoked potentials. Methods: Fourteen participants were seated in a dimly lit room facing a display. Participants' fixation on the central stimulus was controlled by means of a desktop eye tracker. Participants attended a flicker-based steady-state VEP (SSVEP) task, a motion-based steady-state-motion VEP (SSMVEP) task, and the novel stimulus task (the imperceptible grating SSMVEP). Participants were asked to complete behavioral fatigue scale tasks. Results: A significant effect of stimulus type was observed, accompanied by insignificant differences in prediction accuracy. Partially significant task effects were obtained in fatigue scale tasks. Conclusion: The study revealed that the imperceptible grating SSMVEP stimulus successfully evoked SSMVEP responses within acceptable margins in the related cortical regions. This novel stimulus contributes to BCI research by providing an imperceptible interface, improving already established stimuli design in the SSVEP and the SSMVEP literature. Significance: The present paper provides a novel SSMVEP stimulus type that may inform the future design of effective VEP-based BCI paradigms that allow seamless interaction with computer interfaces.
- [337] arXiv:2401.00396 (replaced) [pdf, ps, html, other]
-
Title: RAGTruth: A Hallucination Corpus for Developing Trustworthy Retrieval-Augmented Language ModelsSubjects: Computation and Language (cs.CL)
Retrieval-augmented generation (RAG) has become a main technique for alleviating hallucinations in large language models (LLMs). Despite the integration of RAG, LLMs may still present unsupported or contradictory claims to the retrieved contents. In order to develop effective hallucination prevention strategies under RAG, it is important to create benchmark datasets that can measure the extent of hallucination. This paper presents RAGTruth, a corpus tailored for analyzing word-level hallucinations in various domains and tasks within the standard RAG frameworks for LLM applications. RAGTruth comprises nearly 18,000 naturally generated responses from diverse LLMs using RAG. These responses have undergone meticulous manual annotations at both the individual cases and word levels, incorporating evaluations of hallucination intensity. We not only benchmark hallucination frequencies across different LLMs, but also critically assess the effectiveness of several existing hallucination detection methodologies. Furthermore, we show that using a high-quality dataset such as RAGTruth, it is possible to finetune a relatively small LLM and achieve a competitive level of performance in hallucination detection when compared to the existing prompt-based approaches using state-of-the-art large language models such as GPT-4.
- [338] arXiv:2401.04971 (replaced) [pdf, ps, html, other]
-
Title: A Survey on Cross-Domain Sequential RecommendationComments: Accepted to the IJCAI 2024 Survey TrackSubjects: Information Retrieval (cs.IR)
Cross-domain sequential recommendation (CDSR) shifts the modeling of user preferences from flat to stereoscopic by integrating and learning interaction information from multiple domains at different granularities (ranging from inter-sequence to intra-sequence and from single-domain to cross-domain). In this survey, we first define the CDSR problem using a four-dimensional tensor and then analyze its multi-type input representations under multidirectional dimensionality reductions. Following that, we provide a systematic overview from both macro and micro views. From a macro view, we abstract the multi-level fusion structures of various models across domains and discuss their bridges for fusion. From a micro view, focusing on the existing models, we first discuss the basic technologies and then explain the auxiliary learning technologies. Finally, we exhibit the available public datasets and the representative experimental results as well as provide some insights into future directions for research in CDSR.
- [339] arXiv:2401.07494 (replaced) [pdf, ps, html, other]
-
Title: Input Convex Lipschitz RNN: A Fast and Robust Approach for Engineering TasksSubjects: Machine Learning (cs.LG); Computational Engineering, Finance, and Science (cs.CE); Systems and Control (eess.SY)
Computational efficiency and non-adversarial robustness are critical factors in process modeling and optimization for real-world engineering applications. Yet, conventional neural networks often fall short in addressing both simultaneously, or even separately. Drawing insights from natural physical systems and existing literature, it is known theoretically that an input convex architecture will enhance computational efficiency, while a Lipschitz-constrained architecture will bolster non-adversarial robustness. However, integrating both properties into one model is a nontrivial task, as enforcing one property may compromise the other one. Therefore, in this work, we develop a novel network architecture, termed Input Convex Lipschitz Recurrent Neural Networks, that inherits the strengths of both convexity and Lipschitz continuity. This model is explicitly designed for fast and robust optimization-based tasks, which outperforms existing recurrent units in terms of computational efficiency and non-adversarial robustness. Additionally, we have successfully implemented this model in various practical engineering applications, such as optimization of chemical processes and real-world solar irradiance prediction for Solar PV system planning at LHT Holdings in Singapore. Source code is available at this https URL.
- [340] arXiv:2401.07780 (replaced) [pdf, ps, other]
-
Title: Learning Soft Constrained MPC Value Functions: Efficient MPC Design and Implementation providing Stability and Safety GuaranteesSubjects: Systems and Control (eess.SY); Optimization and Control (math.OC)
Model Predictive Control (MPC) can be applied to safety-critical control problems, providing closed-loop safety and performance guarantees. Implementation of MPC controllers requires solving an optimization problem at every sampling instant, which is challenging to execute on embedded hardware. To address this challenge, we propose a framework that combines a tightened soft constrained MPC formulation with supervised learning to approximate the MPC value function. This combination enables us to obtain a corresponding optimal control law, which can be implemented efficiently on embedded platforms. The framework ensures stability and constraint satisfaction for various nonlinear systems. While the design effort is similar to that of nominal MPC, the proposed formulation provides input-to-state stability (ISS) with respect to the approximation error of the value function. Furthermore, we prove that the value function corresponding to the soft constrained MPC problem is Lipschitz continuous for Lipschitz continuous systems, even if the optimal control law may be discontinuous. This serves two purposes: First, it allows to relate approximation errors to a sufficiently large constraint tightening to obtain constraint satisfaction guarantees. Second, it paves the way for an efficient supervised learning procedure to obtain a continuous value function approximation. We demonstrate the effectiveness of the method using a nonlinear numerical example.
- [341] arXiv:2401.07927 (replaced) [pdf, ps, other]
-
Title: Are self-explanations from Large Language Models faithful?Comments: The 62nd Annual Meeting of the Association for Computational LinguisticsSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Instruction-tuned Large Language Models (LLMs) excel at many tasks and will even explain their reasoning, so-called self-explanations. However, convincing and wrong self-explanations can lead to unsupported confidence in LLMs, thus increasing risk. Therefore, it's important to measure if self-explanations truly reflect the model's behavior. Such a measure is called interpretability-faithfulness and is challenging to perform since the ground truth is inaccessible, and many LLMs only have an inference API. To address this, we propose employing self-consistency checks to measure faithfulness. For example, if an LLM says a set of words is important for making a prediction, then it should not be able to make its prediction without these words. While self-consistency checks are a common approach to faithfulness, they have not previously been successfully applied to LLM self-explanations for counterfactual, feature attribution, and redaction explanations. Our results demonstrate that faithfulness is explanation, model, and task-dependent, showing self-explanations should not be trusted in general. For example, with sentiment classification, counterfactuals are more faithful for Llama2, feature attribution for Mistral, and redaction for Falcon 40B.
- [342] arXiv:2401.08298 (replaced) [pdf, ps, html, other]
-
Title: Online Elasticity Estimation and Material Sorting Using Standard Robot GrippersComments: 22 pages, 17 figuresJournal-ref: The International Journal of Advanced Manufacturing Technology (2024)Subjects: Robotics (cs.RO)
We experimentally evaluated the accuracy with which material properties can be estimated through object compression by two standard parallel jaw grippers and a force/torque sensor mounted at the robot wrist, with a professional biaxial compression device used as reference. Gripper effort versus position curves were obtained and transformed into stress/strain curves. The modulus of elasticity was estimated at different strain points and the effect of multiple compression cycles (precycling), compression speed, and the gripper surface area on estimation was studied. Viscoelasticity was estimated using the energy absorbed in a compression/decompression cycle, the Kelvin-Voigt, and Hunt-Crossley models. We found that: (1) slower compression speeds improved elasticity estimation, while precycling or surface area did not; (2) the robot grippers, even after calibration, were found to have a limited capability of delivering accurate estimates of absolute values of Young's modulus and viscoelasticity; (3) relative ordering of material characteristics was largely consistent across different grippers; (4) despite the nonlinear characteristics of deformable objects, fitting linear stress/strain approximations led to more stable results than local estimates of Young's modulus; (5) the Hunt-Crossley model worked best to estimate viscoelasticity, from a single object compression. A two-dimensional space formed by elasticity and viscoelasticity estimates obtained from a single grasp is advantageous for the discrimination of the object material properties. We demonstrated the applicability of our findings in a mock single stream recycling scenario, where plastic, paper, and metal objects were correctly separated from a single grasp, even when compressed at different locations on the object. The data and code are publicly available.
- [343] arXiv:2401.09245 (replaced) [pdf, ps, html, other]
-
Title: Uncertainty estimates for semantic segmentation: providing enhanced reliability for automated motor claims handlingJan Küchler (1), Daniel Kröll (1), Sebastian Schoenen (1), Andreas Witte (1) ((1) ControlExpert GmbH, Langenfeld, Germany)Comments: 11 pages, 10 figures, 3 tablesJournal-ref: Machine Vision and Applications 35, 66 (2024)Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep neural network models for image segmentation can be a powerful tool for the automation of motor claims handling processes in the insurance industry. A crucial aspect is the reliability of the model outputs when facing adverse conditions, such as low quality photos taken by claimants to document damages. We explore the use of a meta-classification model to empirically assess the precision of segments predicted by a model trained for the semantic segmentation of car body parts. Different sets of features correlated with the quality of a segment are compared, and an AUROC score of 0.915 is achieved for distinguishing between high- and low-quality segments. By removing low-quality segments, the average mIoU of the segmentation output is improved by 16 percentage points and the number of wrongly predicted segments is reduced by 77%.
- [344] arXiv:2401.09750 (replaced) [pdf, ps, html, other]
-
Title: Exploration and Anti-Exploration with Distributional Random Network DistillationComments: ICML 2024 acceptedSubjects: Machine Learning (cs.LG)
Exploration remains a critical issue in deep reinforcement learning for an agent to attain high returns in unknown environments. Although the prevailing exploration Random Network Distillation (RND) algorithm has been demonstrated to be effective in numerous environments, it often needs more discriminative power in bonus allocation. This paper highlights the "bonus inconsistency" issue within RND, pinpointing its primary limitation. To address this issue, we introduce the Distributional RND (DRND), a derivative of the RND. DRND enhances the exploration process by distilling a distribution of random networks and implicitly incorporating pseudo counts to improve the precision of bonus allocation. This refinement encourages agents to engage in more extensive exploration. Our method effectively mitigates the inconsistency issue without introducing significant computational overhead. Both theoretical analysis and experimental results demonstrate the superiority of our approach over the original RND algorithm. Our method excels in challenging online exploration scenarios and effectively serves as an anti-exploration mechanism in D4RL offline tasks. Our code is publicly available at this https URL.
- [345] arXiv:2401.14192 (replaced) [pdf, ps, html, other]
-
Title: How Can Large Language Models Understand Spatial-Temporal Data?Subjects: Machine Learning (cs.LG); Computation and Language (cs.CL)
While Large Language Models (LLMs) dominate tasks like natural language processing and computer vision, harnessing their power for spatial-temporal forecasting remains challenging. The disparity between sequential text and complex spatial-temporal data hinders this application. To address this issue, this paper introduces STG-LLM, an innovative approach empowering LLMs for spatial-temporal forecasting. We tackle the data mismatch by proposing: 1) STG-Tokenizer: This spatial-temporal graph tokenizer transforms intricate graph data into concise tokens capturing both spatial and temporal relationships; 2) STG-Adapter: This minimalistic adapter, consisting of linear encoding and decoding layers, bridges the gap between tokenized data and LLM comprehension. By fine-tuning only a small set of parameters, it can effectively grasp the semantics of tokens generated by STG-Tokenizer, while preserving the original natural language understanding capabilities of LLMs. Extensive experiments on diverse spatial-temporal benchmark datasets show that STG-LLM successfully unlocks LLM potential for spatial-temporal forecasting. Remarkably, our approach achieves competitive performance on par with dedicated SOTA methods.
- [346] arXiv:2401.14341 (replaced) [pdf, ps, html, other]
-
Title: Efficient Construction of Long Orientable SequencesSubjects: Data Structures and Algorithms (cs.DS); Discrete Mathematics (cs.DM); Information Theory (cs.IT); Combinatorics (math.CO)
An orientable sequence of order $n$ is a cyclic binary sequence such that each length-$n$ substring appears at most once \emph{in either direction}. Maximal length orientable sequences are known only for $n\leq 7$, and a trivial upper bound on their length is $2^{n-1} - 2^{\lfloor(n-1)/2\rfloor}$. This paper presents the first efficient algorithm to construct orientable sequences with asymptotically optimal length; more specifically, our algorithm constructs orientable sequences via cycle-joining and a successor-rule approach requiring $O(n)$ time per symbol and $O(n)$ space. This answers a longstanding open question from Dai, Martin, Robshaw, Wild [Cryptography and Coding III (1993)]. Our sequences are applied to find new longest-known orientable sequences for $n\leq 20$.
- [347] arXiv:2401.14857 (replaced) [pdf, ps, html, other]
-
Title: LIV-GaussMap: LiDAR-Inertial-Visual Fusion for Real-time 3D Radiance Field Map RenderingSubjects: Robotics (cs.RO)
We introduce an integrated precise LiDAR, Inertial, and Visual (LIV) multimodal sensor fused mapping system that builds on the differentiable \pre{surface splatting }\now{Gaussians} to improve the mapping fidelity, quality, and structural accuracy. Notably, this is also a novel form of tightly coupled map for LiDAR-visual-inertial sensor fusion.
This system leverages the complementary characteristics of LiDAR and visual data to capture the geometric structures of large-scale 3D scenes and restore their visual surface information with high fidelity. The initialization for the scene's surface Gaussians and the sensor's poses of each frame are obtained using a LiDAR-inertial system with the feature of size-adaptive voxels. Then, we optimized and refined the Gaussians using visual-derived photometric gradients to optimize their quality and density.
Our method is compatible with various types of LiDAR, including solid-state and mechanical LiDAR, supporting both repetitive and non-repetitive scanning modes. Bolstering structure construction through LiDAR and facilitating real-time generation of photorealistic renderings across diverse LIV datasets. It showcases notable resilience and versatility in generating real-time photorealistic scenes potentially for digital twins and virtual reality, while also holding potential applicability in real-time SLAM and robotics domains.
We release our software and hardware and self-collected datasets to benefit the community. - [348] arXiv:2401.16640 (replaced) [pdf, ps, html, other]
-
Title: TeenyTinyLlama: open-source tiny language models trained in Brazilian PortugueseComments: 21 pages, 5 figuresJournal-ref: Machine Learning With Applications, 16, 100558Subjects: Computation and Language (cs.CL); Machine Learning (cs.LG)
Large language models (LLMs) have significantly advanced natural language processing, but their progress has yet to be equal across languages. While most LLMs are trained in high-resource languages like English, multilingual models generally underperform monolingual ones. Additionally, aspects of their multilingual foundation sometimes restrict the byproducts they produce, like computational demands and licensing regimes. In this study, we document the development of open-foundation models tailored for use in low-resource settings, their limitations, and their benefits. This is the TeenyTinyLlama pair: two compact models for Brazilian Portuguese text generation. We release them under the permissive Apache 2.0 license on GitHub and Hugging Face for community use and further development. See this https URL
- [349] arXiv:2401.16688 (replaced) [pdf, ps, html, other]
-
Title: Characterization of Magnetic Labyrinthine Structures through Junctions and Terminals Detection Using Template Matching and CNNComments: 12 pages, 7 figures, submitted to IEEE AccessSubjects: Computer Vision and Pattern Recognition (cs.CV)
Defects influence diverse properties of materials, shaping their structural, mechanical, and electronic characteristics. Among a variety of materials exhibiting unique defects, magnets exhibit diverse nano- to micro-scale defects and have been intensively studied in materials science. Specifically, defects in magnetic labyrinthine patterns, called junctions and terminals, serve as the canonical targets of the research. While detecting and characterizing such defects is crucial for understanding magnets, systematically investigating large-scale images containing over a thousand closely packed junctions and terminals remains a formidable challenge. This study introduces a new technique called TM-CNN (Template Matching - Convolutional Neural Network) designed to detect a multitude of small objects in images, such as the defects in magnetic labyrinthine patterns. TM-CNN was used to identify 641,649 such structures in 444 experimental images, and the results were explored to deepen understanding of magnetic materials. It employs a two-stage detection approach combining template matching, used in initial detection, with a convolutional neural network, used to eliminate incorrect identifications. To train a CNN classifier, it is necessary to annotate a large number of training images.This difficulty prevents the use of CNN in many practical applications. TM-CNN significantly reduces the manual workload for creating training images by automatically making most of the annotations and leaving only a small number of corrections to human reviewers. In testing, TM-CNN achieved an impressive F1 score of 0.991, far outperforming traditional template matching and CNN-based object detection algorithms.
- [350] arXiv:2401.17536 (replaced) [pdf, ps, html, other]
-
Title: PipeNet: Question Answering with Semantic Pruning over Knowledge GraphsComments: 8 pages, 4 figures, accepted to *SEM 2024Subjects: Computation and Language (cs.CL)
It is well acknowledged that incorporating explicit knowledge graphs (KGs) can benefit question answering. Existing approaches typically follow a grounding-reasoning pipeline in which entity nodes are first grounded for the query (question and candidate answers), and then a reasoning module reasons over the matched multi-hop subgraph for answer prediction. Although the pipeline largely alleviates the issue of extracting essential information from giant KGs, efficiency is still an open challenge when scaling up hops in grounding the subgraphs. In this paper, we target at finding semantically related entity nodes in the subgraph to improve the efficiency of graph reasoning with KG. We propose a grounding-pruning-reasoning pipeline to prune noisy nodes, remarkably reducing the computation cost and memory usage while also obtaining decent subgraph representation. In detail, the pruning module first scores concept nodes based on the dependency distance between matched spans and then prunes the nodes according to score ranks. To facilitate the evaluation of pruned subgraphs, we also propose a graph attention network (GAT) based module to reason with the subgraph data. Experimental results on CommonsenseQA and OpenBookQA demonstrate the effectiveness of our method.
- [351] arXiv:2401.17583 (replaced) [pdf, ps, html, other]
-
Title: Agile But Safe: Learning Collision-Free High-Speed Legged LocomotionComments: Published at RSS 2024, Project website: this https URLSubjects: Robotics (cs.RO); Artificial Intelligence (cs.AI); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Systems and Control (eess.SY)
Legged robots navigating cluttered environments must be jointly agile for efficient task execution and safe to avoid collisions with obstacles or humans. Existing studies either develop conservative controllers (< 1.0 m/s) to ensure safety, or focus on agility without considering potentially fatal collisions. This paper introduces Agile But Safe (ABS), a learning-based control framework that enables agile and collision-free locomotion for quadrupedal robots. ABS involves an agile policy to execute agile motor skills amidst obstacles and a recovery policy to prevent failures, collaboratively achieving high-speed and collision-free navigation. The policy switch in ABS is governed by a learned control-theoretic reach-avoid value network, which also guides the recovery policy as an objective function, thereby safeguarding the robot in a closed loop. The training process involves the learning of the agile policy, the reach-avoid value network, the recovery policy, and an exteroception representation network, all in simulation. These trained modules can be directly deployed in the real world with onboard sensing and computation, leading to high-speed and collision-free navigation in confined indoor and outdoor spaces with both static and dynamic obstacles.
- [352] arXiv:2402.00627 (replaced) [pdf, ps, html, other]
-
Title: CapHuman: Capture Your Moments in Parallel UniversesComments: Accepted by CVPR 2024. Project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
We concentrate on a novel human-centric image synthesis task, that is, given only one reference facial photograph, it is expected to generate specific individual images with diverse head positions, poses, facial expressions, and illuminations in different contexts. To accomplish this goal, we argue that our generative model should be capable of the following favorable characteristics: (1) a strong visual and semantic understanding of our world and human society for basic object and human image generation. (2) generalizable identity preservation ability. (3) flexible and fine-grained head control. Recently, large pre-trained text-to-image diffusion models have shown remarkable results, serving as a powerful generative foundation. As a basis, we aim to unleash the above two capabilities of the pre-trained model. In this work, we present a new framework named CapHuman. We embrace the "encode then learn to align" paradigm, which enables generalizable identity preservation for new individuals without cumbersome tuning at inference. CapHuman encodes identity features and then learns to align them into the latent space. Moreover, we introduce the 3D facial prior to equip our model with control over the human head in a flexible and 3D-consistent manner. Extensive qualitative and quantitative analyses demonstrate our CapHuman can produce well-identity-preserved, photo-realistic, and high-fidelity portraits with content-rich representations and various head renditions, superior to established baselines. Code and checkpoint will be released at this https URL.
- [353] arXiv:2402.03145 (replaced) [pdf, ps, other]
-
Title: SafEDMD: A certified learning architecture tailored to data-driven control of nonlinear dynamical systemsSubjects: Systems and Control (eess.SY); Machine Learning (cs.LG); Optimization and Control (math.OC)
The Koopman operator serves as the theoretical backbone for machine learning of dynamical control systems, where the operator is heuristically approximated by extended dynamic mode decomposition (EDMD). In this paper, we propose Stability- and certificate-oriented EDMD (SafEDMD): a novel EDMD-based learning architecture which comes along with rigorous certificates, resulting in a reliable surrogate model generated in a data-driven fashion. To ensure the trustworthiness of SafEDMD, we derive proportional error bounds, which vanish at the origin and are tailored to control tasks, leading to certified controller design based on semi-definite programming. We illustrate the developed method by means of several benchmark examples and highlight the advantages over state-of-the-art methods.
- [354] arXiv:2402.03881 (replaced) [pdf, ps, html, other]
-
Title: DEthna: Accurate Ethereum Network Topology Discovery with Marked TransactionsComments: Accepted for publication in IEEE INFOCOM 2024Subjects: Networking and Internet Architecture (cs.NI)
In Ethereum, the ledger exchanges messages along an underlying Peer-to-Peer (P2P) network to reach consistency. Understanding the underlying network topology of Ethereum is crucial for network optimization, security and scalability. However, the accurate discovery of Ethereum network topology is non-trivial due to its deliberately designed security mechanism. Consequently, existing measuring schemes cannot accurately infer the Ethereum network topology with a low cost. To address this challenge, we propose the Distributed Ethereum Network Analyzer (DEthna) tool, which can accurately and efficiently measure the Ethereum network topology. In DEthna, a novel parallel measurement model is proposed that can generate marked transactions to infer link connections based on the transaction replacement and propagation mechanism in Ethereum. Moreover, a workload offloading scheme is designed so that DEthna can be deployed on multiple distributed probing nodes so as to measure a large-scale Ethereum network at a low cost. We run DEthna on Goerli (the most popular Ethereum test network) to evaluate its capability in discovering network topology. The experimental results demonstrate that DEthna significantly outperforms the state-of-the-art baselines. Based on DEthna, we further analyze characteristics of the Ethereum network revealing that there exist more than 50% low-degree Ethereum nodes that weaken the network robustness.
- [355] arXiv:2402.05008 (replaced) [pdf, ps, html, other]
-
Title: EfficientViT-SAM: Accelerated Segment Anything Model Without Accuracy LossComments: CVPR 2024 Workshop (Efficient Large Vision Models)Subjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
We present EfficientViT-SAM, a new family of accelerated segment anything models. We retain SAM's lightweight prompt encoder and mask decoder while replacing the heavy image encoder with EfficientViT. For the training, we begin with the knowledge distillation from the SAM-ViT-H image encoder to EfficientViT. Subsequently, we conduct end-to-end training on the SA-1B dataset. Benefiting from EfficientViT's efficiency and capacity, EfficientViT-SAM delivers 48.9x measured TensorRT speedup on A100 GPU over SAM-ViT-H without sacrificing performance. Our code and pre-trained models are released at this https URL.
- [356] arXiv:2402.07166 (replaced) [pdf, ps, html, other]
-
Title: Social Evolution of Published Text and The Emergence of Artificial Intelligence Through Large Language Models and The Problem of Toxicity and BiasSubjects: Artificial Intelligence (cs.AI)
We provide a birds eye view of the rapid developments in AI and Deep Learning that has led to the path-breaking emergence of AI in Large Language Models. The aim of this study is to place all these developments in a pragmatic broader historical social perspective without any exaggerations while at the same time without any pessimism that created the AI winter in the 1970s to 1990s. We also at the same time point out toxicity, bias, memorization, sycophancy, logical inconsistencies, hallucinations that exist just as a warning to the overly optimistic. We note here that just as this emergence of AI seems to occur at a threshold point in the number of neural connections or weights, it has also been observed that human brain and especially the cortex region is nothing special or extraordinary but simply a case of scaled-up version of the primate brain and that even the human intelligence seems like an emergent phenomena of scale.
- [357] arXiv:2402.08249 (replaced) [pdf, ps, html, other]
-
Title: SepRep-Net: Multi-source Free Domain Adaptation via Model Separation And ReparameterizationSubjects: Computer Vision and Pattern Recognition (cs.CV)
We consider multi-source free domain adaptation, the problem of adapting multiple existing models to a new domain without accessing the source data. Among existing approaches, methods based on model ensemble are effective in both the source and target domains, but incur significantly increased computational costs. Towards this dilemma, in this work, we propose a novel framework called SepRep-Net, which tackles multi-source free domain adaptation via model Separation and Reparameterization.Concretely, SepRep-Net reassembled multiple existing models to a unified network, while maintaining separate pathways (Separation). During training, separate pathways are optimized in parallel with the information exchange regularly performed via an additional feature merging unit. With our specific design, these pathways can be further reparameterized into a single one to facilitate inference (Reparameterization). SepRep-Net is characterized by 1) effectiveness: competitive performance on the target domain, 2) efficiency: low computational costs, and 3) generalizability: maintaining more source knowledge than existing solutions. As a general approach, SepRep-Net can be seamlessly plugged into various methods. Extensive experiments validate the performance of SepRep-Net on mainstream benchmarks.
- [358] arXiv:2402.08502 (replaced) [pdf, ps, other]
-
Title: Provable Traffic Rule Compliance in Safe Reinforcement Learning on the Open SeaSubjects: Machine Learning (cs.LG); Systems and Control (eess.SY)
For safe operation, autonomous vehicles have to obey traffic rules that are set forth in legal documents formulated in natural language. Temporal logic is a suitable concept to formalize such traffic rules. Still, temporal logic rules often result in constraints that are hard to solve using optimization-based motion planners. Reinforcement learning (RL) is a promising method to find motion plans for autonomous vehicles. However, vanilla RL algorithms are based on random exploration and do not automatically comply with traffic rules. Our approach accomplishes guaranteed rule-compliance by integrating temporal logic specifications into RL. Specifically, we consider the application of vessels on the open sea, which must adhere to the Convention on the International Regulations for Preventing Collisions at Sea (COLREGS). To efficiently synthesize rule-compliant actions, we combine predicates based on set-based prediction with a statechart representing our formalized rules and their priorities. Action masking then restricts the RL agent to this set of verified rule-compliant actions. In numerical evaluations on critical maritime traffic situations, our agent always complies with the formalized legal rules and never collides while achieving a high goal-reaching rate during training and deployment. In contrast, vanilla and traffic rule-informed RL agents frequently violate traffic rules and collide even after training.
- [359] arXiv:2402.10104 (replaced) [pdf, ps, html, other]
-
Title: GeoEval: Benchmark for Evaluating LLMs and Multi-Modal Models on Geometry Problem-SolvingComments: Accepted in ACL 2024 FindingsSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Recent advancements in large language models (LLMs) and multi-modal models (MMs) have demonstrated their remarkable capabilities in problem-solving. Yet, their proficiency in tackling geometry math problems, which necessitates an integrated understanding of both textual and visual information, has not been thoroughly evaluated. To address this gap, we introduce the GeoEval benchmark, a comprehensive collection that includes a main subset of 2,000 problems, a 750 problems subset focusing on backward reasoning, an augmented subset of 2,000 problems, and a hard subset of 300 problems. This benchmark facilitates a deeper investigation into the performance of LLMs and MMs in solving geometry math problems. Our evaluation of ten LLMs and MMs across these varied subsets reveals that the WizardMath model excels, achieving a 55.67\% accuracy rate on the main subset but only a 6.00\% accuracy on the hard subset. This highlights the critical need for testing models against datasets on which they have not been pre-trained. Additionally, our findings indicate that GPT-series models perform more effectively on problems they have rephrased, suggesting a promising method for enhancing model capabilities.
- [360] arXiv:2402.11054 (replaced) [pdf, ps, html, other]
-
Title: Resource Allocation in Mobile Networks: A Decision Model Of Jockeying in QueuesComments: To appear in IEEE MetidCom 2024Subjects: Networking and Internet Architecture (cs.NI)
Use-case-specific network slicing in decentralized multi-tenancy cloud environments is a promising approach to bridge the gap between the demand and supply of resources in next-generation communication networks. Our findings associate different slice profiles to queues in a multi-server setting, such that tenants continuously assess their preferences and make rational decisions to minimize the queuing delay.
Deviated from classical approaches that statistically model the jockeying phenomena in queuing systems, our work pioneers to setup a behavioral model of jockeying impatient tenants. This will serve as a basis for decentralized management of multi-queue systems, where the decision to jockey is individually made by each tenant upon its up-to-date assessment of expected waiting time.
Additionally, we carry out numerical simulations to empirically unravel the parametric dependencies of the tenants' jockeying behavior. - [361] arXiv:2402.12025 (replaced) [pdf, ps, html, other]
-
Title: Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?Comments: Accepted to the ACL 2024 main conferenceSubjects: Computation and Language (cs.CL)
The field of natural language processing (NLP) has recently witnessed a transformative shift with the emergence of foundation models, particularly Large Language Models (LLMs) that have revolutionized text-based NLP. This paradigm has extended to other modalities, including speech, where researchers are actively exploring the combination of Speech Foundation Models (SFMs) and LLMs into single, unified models capable of addressing multimodal tasks. Among such tasks, this paper focuses on speech-to-text translation (ST). By examining the published papers on the topic, we propose a unified view of the architectural solutions and training strategies presented so far, highlighting similarities and differences among them. Based on this examination, we not only organize the lessons learned but also show how diverse settings and evaluation approaches hinder the identification of the best-performing solution for each architectural building block and training choice. Lastly, we outline recommendations for future works on the topic aimed at better understanding the strengths and weaknesses of the SFM+LLM solutions for ST.
- [362] arXiv:2402.12715 (replaced) [pdf, ps, html, other]
-
Title: Spurious Correlations in Machine Learning: A SurveyComments: Version 2; Github Link: this https URLSubjects: Machine Learning (cs.LG)
Machine learning systems are known to be sensitive to spurious correlations between non-essential features of the inputs (e.g., background, texture, and secondary objects) and the corresponding labels. These features and their correlations with the labels are known as "spurious" because they tend to change with shifts in real-world data distributions, which can negatively impact the model's generalization and robustness. In this paper, we provide a review of this issue, along with a taxonomy of current state-of-the-art methods for addressing spurious correlations in machine learning models. Additionally, we summarize existing datasets, benchmarks, and metrics to aid future research. The paper concludes with a discussion of the recent advancements and future challenges in this field, aiming to provide valuable insights for researchers in the related domains.
- [363] arXiv:2402.13457 (replaced) [pdf, ps, html, other]
-
Title: A Comprehensive Study of Jailbreak Attack versus Defense for Large Language ModelsComments: 18 pages, 9 figures, Accepted in ACL 2024Subjects: Cryptography and Security (cs.CR); Artificial Intelligence (cs.AI)
Large Language Models (LLMS) have increasingly become central to generating content with potential societal impacts. Notably, these models have demonstrated capabilities for generating content that could be deemed harmful. To mitigate these risks, researchers have adopted safety training techniques to align model outputs with societal values to curb the generation of malicious content. However, the phenomenon of "jailbreaking", where carefully crafted prompts elicit harmful responses from models, persists as a significant challenge. This research conducts a comprehensive analysis of existing studies on jailbreaking LLMs and their defense techniques. We meticulously investigate nine attack techniques and seven defense techniques applied across three distinct language models: Vicuna, LLama, and GPT-3.5 Turbo. We aim to evaluate the effectiveness of these attack and defense techniques. Our findings reveal that existing white-box attacks underperform compared to universal techniques and that including special tokens in the input significantly affects the likelihood of successful attacks. This research highlights the need to concentrate on the security facets of LLMs. Additionally, we contribute to the field by releasing our datasets and testing framework, aiming to foster further research into LLM security. We believe these contributions will facilitate the exploration of security measures within this domain.
- [364] arXiv:2402.14298 (replaced) [pdf, ps, html, other]
-
Title: Multi-modal Stance Detection: New Datasets and ModelSubjects: Computation and Language (cs.CL)
Stance detection is a challenging task that aims to identify public opinion from social media platforms with respect to specific targets. Previous work on stance detection largely focused on pure texts. In this paper, we study multi-modal stance detection for tweets consisting of texts and images, which are prevalent in today's fast-growing social media platforms where people often post multi-modal messages. To this end, we create five new multi-modal stance detection datasets of different domains based on Twitter, in which each example consists of a text and an image. In addition, we propose a simple yet effective Targeted Multi-modal Prompt Tuning framework (TMPT), where target information is leveraged to learn multi-modal stance features from textual and visual modalities. Experimental results on our three benchmark datasets show that the proposed TMPT achieves state-of-the-art performance in multi-modal stance detection.
- [365] arXiv:2402.15189 (replaced) [pdf, ps, html, other]
-
Title: Biomedical Entity Linking as Multiple Choice Question AnsweringComments: Accepted by COLING 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Although biomedical entity linking (BioEL) has made significant progress with pre-trained language models, challenges still exist for fine-grained and long-tailed entities. To address these challenges, we present BioELQA, a novel model that treats Biomedical Entity Linking as Multiple Choice Question Answering. BioELQA first obtains candidate entities with a fast retriever, jointly presents the mention and candidate entities to a generator, and then outputs the predicted symbol associated with its chosen entity. This formulation enables explicit comparison of different candidate entities, thus capturing fine-grained interactions between mentions and entities, as well as among entities themselves. To improve generalization for long-tailed entities, we retrieve similar labeled training instances as clues and concatenate the input with retrieved instances for the generator. Extensive experimental results show that BioELQA outperforms state-of-the-art baselines on several datasets.
- [366] arXiv:2402.17357 (replaced) [pdf, ps, html, other]
-
Title: A robust parameterized enhanced shift-splitting preconditioner for three-by-three block saddle point problemsSubjects: Numerical Analysis (math.NA)
This paper proposes a new parameterized enhanced shift-splitting (PESS) preconditioner to solve the three-by-three block saddle point problem (SPP). Additionally, we introduce a local PESS (LPESS) preconditioner by relaxing the PESS preconditioner. Necessary and sufficient criteria are established for the convergence of the proposed PESS iterative process for any random initial guess. Furthermore, we meticulously investigate the spectral bounds of the PESS and LPESS preconditioned matrices. Moreover, empirical investigations have been performed for the sensitivity analysis of the proposed PESS preconditioner, which unveils its robustness. Numerical experiments are carried out to demonstrate the enhanced efficiency and robustness of the proposed PESS and LPESS preconditioners compared to the existing block diagonal and shift-splitting preconditioners.
- [367] arXiv:2402.17614 (replaced) [pdf, ps, html, other]
-
Title: Adapt Before Comparison: A New Perspective on Cross-Domain Few-Shot SegmentationComments: accepted to CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Few-shot segmentation performance declines substantially when facing images from a domain different than the training domain, effectively limiting real-world use cases. To alleviate this, recently cross-domain few-shot segmentation (CD-FSS) has emerged. Works that address this task mainly attempted to learn segmentation on a source domain in a manner that generalizes across domains. Surprisingly, we can outperform these approaches while eliminating the training stage and removing their main segmentation network. We show test-time task-adaption is the key for successful CD-FSS instead. Task-adaption is achieved by appending small networks to the feature pyramid of a conventionally classification-pretrained backbone. To avoid overfitting to the few labeled samples in supervised fine-tuning, consistency across augmented views of input images serves as guidance while learning the parameters of the attached layers. Despite our self-restriction not to use any images other than the few labeled samples at test time, we achieve new state-of-the-art performance in CD-FSS, evidencing the need to rethink approaches for the task.
- [368] arXiv:2402.18016 (replaced) [pdf, ps, html, other]
-
Title: User Decision Guidance with Selective Explanation Presentation from Explainable-AISubjects: Human-Computer Interaction (cs.HC); Artificial Intelligence (cs.AI)
This paper addresses the challenge of selecting explanations for XAI (Explainable AI)-based Intelligent Decision Support Systems (IDSSs). IDSSs have shown promise in improving user decisions through XAI-generated explanations along with AI predictions, and the development of XAI made it possible to generate a variety of such explanations. However, how IDSSs should select explanations to enhance user decision-making remains an open question. This paper proposes X-Selector, a method for selectively presenting XAI explanations. It enables IDSSs to strategically guide users to an AI-suggested decision by predicting the impact of different combinations of explanations on a user's decision and selecting the combination that is expected to minimize the discrepancy between an AI suggestion and a user decision. We compared the efficacy of X-Selector with two naive strategies (all possible explanations and explanations only for the most likely prediction) and two baselines (no explanation and no AI support). The results suggest the potential of X-Selector to guide users to AI-suggested decisions and improve task performance under the condition of a high AI accuracy.
- [369] arXiv:2403.01009 (replaced) [pdf, ps, html, other]
-
Title: Design and Performance Evaluation of SEANet, a Software-defined Networking Platform for the Internet of Underwater ThingsComments: 14 pages, 18 figuresSubjects: Networking and Internet Architecture (cs.NI); Signal Processing (eess.SP)
This paper presents the design and performance evaluation of the SEANet platform, a software-defined acoustic modem designed for enhancing underwater networking and Internet of Underwater Things (IoUT) applications. Addressing the limitations of traditional acoustic modems, which suffer from low data rates and rigid architectures, SEANet introduces a versatile, adaptive framework capable of reconfiguring all layers of the protocol stack in real-time to accommodate diverse marine applications. The platform integrates high-performance, wideband data converters with modular hardware and software components, enabling real-time adaptation to changing environmental and operational conditions. Experimental evaluations conducted in oceanic settings demonstrate the SEANet capability to significantly exceed the performance of existing commercial underwater modems, supporting data rates up to 150 kbit/s and effectively doubling the performance metrics of conventional systems. Our robust testing also highlights the SEANet proficiency in channel estimation, Orthogonal Frequency-Division Multiplexing (OFDM) link establishment, and interoperability through the JANUS communication standard. Our results underscore the SEANet potential to transform underwater communication technologies, providing a scalable and efficient solution that supports high data rate applications and fosters the expansion of IoUT deployments.
- [370] arXiv:2403.02905 (replaced) [pdf, ps, html, other]
-
Title: MMoFusion: Multi-modal Co-Speech Motion Generation with Diffusion ModelSen Wang, Jiangning Zhang, Weijian Cao, Xiaobin Hu, Moran Li, Xiaozhong Ji, Xin Tan, Mengtian Li, Zhifeng Xie, Chengjie Wang, Lizhuang MaSubjects: Multimedia (cs.MM)
The body movements accompanying speech aid speakers in expressing their ideas. Co-speech motion generation is one of the important approaches for synthesizing realistic avatars. Due to the intricate correspondence between speech and motion, generating realistic and diverse motion is a challenging task. In this paper, we propose MMoFusion, a Multi-modal co-speech Motion generation framework based on the diffusion model to ensure both the authenticity and diversity of generated motion. We propose a progressive fusion strategy to enhance the interaction of inter-modal and intra-modal, efficiently integrating multi-modal information. Specifically, we employ a masked style matrix based on emotion and identity information to control the generation of different motion styles. Temporal modeling of speech and motion is partitioned into style-guided specific feature encoding and shared feature encoding, aiming to learn both inter-modal and intra-modal features. Besides, we propose a geometric loss to enforce the joints' velocity and acceleration coherence among frames. Our framework generates vivid, diverse, and style-controllable motion of arbitrary length through inputting speech and editing identity and emotion. Extensive experiments demonstrate that our method outperforms current co-speech motion generation methods including upper body and challenging full body.
- [371] arXiv:2403.05086 (replaced) [pdf, ps, html, other]
-
Title: UFORecon: Generalizable Sparse-View Surface Reconstruction from Arbitrary and UnFavOrable SetsComments: accepted at CVPR 2024 project page: this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
Generalizable neural implicit surface reconstruction aims to obtain an accurate underlying geometry given a limited number of multi-view images from unseen scenes. However, existing methods select only informative and relevant views using predefined scores for training and testing phases. This constraint renders the model impractical in real-world scenarios, where the availability of favorable combinations cannot always be ensured. We introduce and validate a view-combination score to indicate the effectiveness of the input view combination. We observe that previous methods output degenerate solutions under arbitrary and unfavorable sets. Building upon this finding, we propose UFORecon, a robust view-combination generalizable surface reconstruction framework. To achieve this, we apply cross-view matching transformers to model interactions between source images and build correlation frustums to capture global correlations. Additionally, we explicitly encode pairwise feature similarities as view-consistent priors. Our proposed framework significantly outperforms previous methods in terms of view-combination generalizability and also in the conventional generalizable protocol trained with favorable view-combinations. The code is available at this https URL.
- [372] arXiv:2403.06592 (replaced) [pdf, ps, html, other]
-
Title: Exploiting Style Latent Flows for Generalizing Deepfake Video DetectionComments: Preprint version, final version will be available at this https URL The IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR) (2024) Published by: IEEE & CVFSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
This paper presents a new approach for the detection of fake videos, based on the analysis of style latent vectors and their abnormal behavior in temporal changes in the generated videos. We discovered that the generated facial videos suffer from the temporal distinctiveness in the temporal changes of style latent vectors, which are inevitable during the generation of temporally stable videos with various facial expressions and geometric transformations. Our framework utilizes the StyleGRU module, trained by contrastive learning, to represent the dynamic properties of style latent vectors. Additionally, we introduce a style attention module that integrates StyleGRU-generated features with content-based features, enabling the detection of visual and temporal artifacts. We demonstrate our approach across various benchmark scenarios in deepfake detection, showing its superiority in cross-dataset and cross-manipulation scenarios. Through further analysis, we also validate the importance of using temporal changes of style latent vectors to improve the generality of deepfake video detection.
- [373] arXiv:2403.06668 (replaced) [pdf, ps, html, other]
-
Title: PeerAiD: Improving Adversarial Distillation from a Specialized Peer TutorComments: Accepted to CVPR 2024Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Adversarial robustness of the neural network is a significant concern when it is applied to security-critical domains. In this situation, adversarial distillation is a promising option which aims to distill the robustness of the teacher network to improve the robustness of a small student network. Previous works pretrain the teacher network to make it robust against the adversarial examples aimed at itself. However, the adversarial examples are dependent on the parameters of the target network. The fixed teacher network inevitably degrades its robustness against the unseen transferred adversarial examples which target the parameters of the student network in the adversarial distillation process. We propose PeerAiD to make a peer network learn the adversarial examples of the student network instead of adversarial examples aimed at itself. PeerAiD is an adversarial distillation that trains the peer network and the student network simultaneously in order to specialize the peer network for defending the student network. We observe that such peer networks surpass the robustness of the pretrained robust teacher model against adversarial examples aimed at the student network. With this peer network and adversarial distillation, PeerAiD achieves significantly higher robustness of the student network with AutoAttack (AA) accuracy by up to 1.66%p and improves the natural accuracy of the student network by up to 4.72%p with ResNet-18 on TinyImageNet dataset. Code is available at this https URL.
- [374] arXiv:2403.06765 (replaced) [pdf, ps, html, other]
-
Title: ConspEmoLLM: Conspiracy Theory Detection Using an Emotion-Based Large Language ModelComments: Work in progressSubjects: Computation and Language (cs.CL)
The internet has brought both benefits and harms to society. A prime example of the latter is misinformation, including conspiracy theories, which flood the web. Recent advances in natural language processing, particularly the emergence of large language models (LLMs), have improved the prospects of accurate misinformation detection. However, most LLM-based approaches to conspiracy theory detection focus only on binary classification and fail to account for the important relationship between misinformation and affective features (i.e., sentiment and emotions). Driven by a comprehensive analysis of conspiracy text that reveals its distinctive affective features, we propose ConspEmoLLM, the first open-source LLM that integrates affective information and is able to perform diverse tasks relating to conspiracy theories. These tasks include not only conspiracy theory detection, but also classification of theory type and detection of related discussion (e.g., opinions towards theories). ConspEmoLLM is fine-tuned based on an emotion-oriented LLM using our novel ConDID dataset, which includes five tasks to support LLM instruction tuning and evaluation. We demonstrate that when applied to these tasks, ConspEmoLLM largely outperforms several open-source general domain LLMs and ChatGPT, as well as an LLM that has been fine-tuned using ConDID, but which does not use affective features. This project will be released on this https URL.
- [375] arXiv:2403.08374 (replaced) [pdf, ps, html, other]
-
Title: Efficient Signature-Free Validated AgreementPierre Civit, Muhammad Ayaz Dzulfikar, Seth Gilbert, Rachid Guerraoui, Jovan Komatovic, Manuel Vidigueira, Igor ZablotchiSubjects: Distributed, Parallel, and Cluster Computing (cs.DC)
Byzantine agreement enables n processes to agree on a common L-bit value, despite up to t > 0 arbitrary failures. A long line of work has been dedicated to improving the bit complexity of Byzantine agreement in synchrony. This has culminated in COOL, an error-free (deterministically secure against a computationally unbounded adversary) solution that achieves O(nL + n^2 logn) worst-case bit complexity (which is optimal for L > n logn according to the Dolev-Reischuk lower bound). COOL satisfies strong validity: if all correct processes propose the same value, only that value can be decided.
Strong validity is, however, not appropriate for today's state machine replication (SMR) and blockchain protocols. These systems value progress and require a decided value to always be valid, excluding default decisions (such as EMPTY) even in cases where there is no unanimity a priori. Validated Byzantine agreement satisfies this property (called external validity). Yet, the best error-free (or even signature-free) validated agreement solutions achieve only O(n^2L) bit complexity, a far cry from the Omega(nL + n^2) Dolev-Reishcuk lower bound. In this paper, we present two new synchronous algorithms for validated Byzantine agreement, HashExt and ErrorFreeExt, with different trade-offs. Both algorithms are (1) signature-free, (2) optimally resilient (tolerate up to t < n / 3 failures), and (3) early-stopping (terminate in O(f+1) rounds, where f <= t is the actual number of failures). On the one hand, HashExt uses only hashes and achieves O(nL + n^3 kappa) bit complexity, which is optimal for L > n^2 kappa (where kappa is the size of a hash). On the other hand, ErrorFreeExt is error-free, using no cryptography whatsoever, and achieves O( (nL + n^2) logn ) bit complexity, which is near-optimal for any L. - [376] arXiv:2403.11678 (replaced) [pdf, ps, html, other]
-
Title: Exploring 3D-aware Latent Spaces for Efficiently Learning Numerous ScenesAntoine Schnepf, Karim Kassab, Jean-Yves Franceschi, Laurent Caraffa, Flavian Vasile, Jeremie Mary, Andrew Comport, Valérie Gouet-BrunetComments: Camera-ready version accepted at 3DMV-CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
We present a method enabling the scaling of NeRFs to learn a large number of semantically-similar scenes. We combine two techniques to improve the required training time and memory cost per scene. First, we learn a 3D-aware latent space in which we train Tri-Plane scene representations, hence reducing the resolution at which scenes are learned. Moreover, we present a way to share common information across scenes, hence allowing for a reduction of model complexity to learn a particular scene. Our method reduces effective per-scene memory costs by 44% and per-scene time costs by 86% when training 1000 scenes. Our project page can be found at this https URL .
- [377] arXiv:2403.11802 (replaced) [pdf, ps, html, other]
-
Title: Counting-Stars: A Multi-evidence, Position-aware, and Scalable Benchmark for Evaluating Long-Context Large Language ModelsComments: work in progressSubjects: Computation and Language (cs.CL)
While recent research endeavors have focused on developing Large Language Models (LLMs) with robust long-context capabilities, due to the lack of long-context benchmarks, relatively little is known about how well the performance of long-context LLMs. To address this gap, we propose a multi-evidence, position-aware, and scalable benchmark for evaluating long-context LLMs, named Counting-Stars, which evaluates long-context LLMs by using two tasks: multi-evidence acquisition and multi-evidence reasoning. Based on the Counting-Stars test, we conduct experiments to evaluate long-context LLMs (i.e., GPT-4 Turbo, Gemini 1.5 Pro, Claude3 Opus, GLM-4, and Moonshot-v1). Experimental results demonstrate that Gemini 1.5 Pro achieves the best overall results, while the performance of GPT-4 Turbo is the most stable across various tasks. Furthermore, our analysis of these LLMs, which are extended to handle long-context scenarios, indicates that there is potential for improvement as the length of the input context and the intricacy of the tasks are increasing.
- [378] arXiv:2403.12864 (replaced) [pdf, ps, html, other]
-
Title: A Comparison of Deep Learning Architectures for Spacecraft Anomaly DetectionComments: accepted for IEEE Aeroconf 2024. Final version published IEEE Aerospace Conference 2024 (AeroConf 2024), access in IEEE ExploreSubjects: Machine Learning (cs.LG)
Spacecraft operations are highly critical, demanding impeccable reliability and safety. Ensuring the optimal performance of a spacecraft requires the early detection and mitigation of anomalies, which could otherwise result in unit or mission failures. With the advent of deep learning, a surge of interest has been seen in leveraging these sophisticated algorithms for anomaly detection in space operations. This study aims to compare the efficacy of various deep learning architectures in detecting anomalies in spacecraft data. The deep learning models under investigation include Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs), Long Short-Term Memory (LSTM) networks, and Transformer-based architectures. Each of these models was trained and validated using a comprehensive dataset sourced from multiple spacecraft missions, encompassing diverse operational scenarios and anomaly types. Initial results indicate that while CNNs excel in identifying spatial patterns and may be effective for some classes of spacecraft data, LSTMs and RNNs show a marked proficiency in capturing temporal anomalies seen in time-series spacecraft telemetry. The Transformer-based architectures, given their ability to focus on both local and global contexts, have showcased promising results, especially in scenarios where anomalies are subtle and span over longer durations. Additionally, considerations such as computational efficiency, ease of deployment, and real-time processing capabilities were evaluated. While CNNs and LSTMs demonstrated a balance between accuracy and computational demands, Transformer architectures, though highly accurate, require significant computational resources. In conclusion, the choice of deep learning architecture for spacecraft anomaly detection is highly contingent on the nature of the data, the type of anomalies, and operational constraints.
- [379] arXiv:2403.15593 (replaced) [pdf, ps, html, other]
-
Title: FairerCLIP: Debiasing CLIP's Zero-Shot Predictions using Functions in RKHSsComments: The Twelfth International Conference on Learning Representations (ICLR) 2024Subjects: Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Large pre-trained vision-language models such as CLIP provide compact and general-purpose representations of text and images that are demonstrably effective across multiple downstream zero-shot prediction tasks. However, owing to the nature of their training process, these models have the potential to 1) propagate or amplify societal biases in the training data and 2) learn to rely on spurious features. This paper proposes FairerCLIP, a general approach for making zero-shot predictions of CLIP more fair and robust to spurious correlations. We formulate the problem of jointly debiasing CLIP's image and text representations in reproducing kernel Hilbert spaces (RKHSs), which affords multiple benefits: 1) Flexibility: Unlike existing approaches, which are specialized to either learn with or without ground-truth labels, FairerCLIP is adaptable to learning in both scenarios. 2) Ease of Optimization: FairerCLIP lends itself to an iterative optimization involving closed-form solvers, which leads to $4\times$-$10\times$ faster training than the existing methods. 3) Sample Efficiency: Under sample-limited conditions, FairerCLIP significantly outperforms baselines when they fail entirely. And, 4) Performance: Empirically, FairerCLIP achieves appreciable accuracy gains on benchmark fairness and spurious correlation datasets over their respective baselines.
- [380] arXiv:2403.17101 (replaced) [pdf, ps, other]
-
Title: AI Consciousness is Inevitable: A Theoretical Computer Science PerspectiveSubjects: Artificial Intelligence (cs.AI)
We look at consciousness through the lens of Theoretical Computer Science, a branch of mathematics that studies computation under resource limitations. From this perspective, we develop a formal machine model for consciousness. The model is inspired by Alan Turing's simple yet powerful model of computation and Bernard Baars' theater model of consciousness. Though extremely simple, the model aligns at a high level with many of the major scientific theories of human and animal consciousness, supporting our claim that machine consciousness is inevitable.
- [381] arXiv:2403.19021 (replaced) [pdf, ps, html, other]
-
Title: IDGenRec: LLM-RecSys Alignment with Textual ID LearningComments: Accepted in SIGIR 2024Subjects: Information Retrieval (cs.IR); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Generative recommendation based on Large Language Models (LLMs) have transformed the traditional ranking-based recommendation style into a text-to-text generation paradigm. However, in contrast to standard NLP tasks that inherently operate on human vocabulary, current research in generative recommendations struggles to effectively encode recommendation items within the text-to-text framework using concise yet meaningful ID representations. To better align LLMs with recommendation needs, we propose IDGen, representing each item as a unique, concise, semantically rich, platform-agnostic textual ID using human language tokens. This is achieved by training a textual ID generator alongside the LLM-based recommender, enabling seamless integration of personalized recommendations into natural language generation. Notably, as user history is expressed in natural language and decoupled from the original dataset, our approach suggests the potential for a foundational generative recommendation model. Experiments show that our framework consistently surpasses existing models in sequential recommendation under standard experimental setting. Then, we explore the possibility of training a foundation recommendation model with the proposed method on data collected from 19 different datasets and tested its recommendation performance on 6 unseen datasets across different platforms under a completely zero-shot setting. The results show that the zero-shot performance of the pre-trained foundation model is comparable to or even better than some traditional recommendation models based on supervised training, showing the potential of the IDGen paradigm serving as the foundation model for generative recommendation. Code and data are open-sourced at this https URL.
- [382] arXiv:2404.00031 (replaced) [pdf, ps, html, other]
-
Title: Towards gaze-independent c-VEP BCI: A pilot studyComments: 6 pages, 3 figures, 9th Graz Brain-Computer Interface Conference 2024Subjects: Human-Computer Interaction (cs.HC); Machine Learning (cs.LG)
A limitation of brain-computer interface (BCI) spellers is that they require the user to be able to move the eyes to fixate on targets. This poses an issue for users who cannot voluntarily control their eye movements, for instance, people living with late-stage amyotrophic lateral sclerosis (ALS). This pilot study makes the first step towards a gaze-independent speller based on the code-modulated visual evoked potential (c-VEP). Participants were presented with two bi-laterally located stimuli, one of which was flashing, and were tasked to attend to one of these stimuli either by directly looking at the stimuli (overt condition) or by using spatial attention, eliminating the need for eye movement (covert condition). The attended stimuli were decoded from electroencephalography (EEG) and classification accuracies of 88% and 100% were obtained for the covert and overt conditions, respectively. These fundamental insights show the promising feasibility of utilizing the c-VEP protocol for gaze-independent BCIs that use covert spatial attention when both stimuli flash simultaneously.
- [383] arXiv:2404.01319 (replaced) [pdf, ps, html, other]
-
Title: Information Cascade Prediction under Public Emergencies: A SurveyComments: arXiv admin note: text overlap with arXiv:2007.09815 by other authorsSubjects: Social and Information Networks (cs.SI); Artificial Intelligence (cs.AI); Computers and Society (cs.CY)
With the advent of the era of big data, massive information, expert experience, and high-accuracy models bring great opportunities to the information cascade prediction of public emergencies. However, the involvement of specialist knowledge from various disciplines has resulted in a primarily application-specific focus (e.g., earthquakes, floods, infectious diseases) for information cascade prediction of public emergencies. The lack of a unified prediction framework poses a challenge for classifying intersectional prediction methods across different application fields. This survey paper offers a systematic classification and summary of information cascade modeling, prediction, and application. We aim to help researchers identify cutting-edge research and comprehend models and methods of information cascade prediction under public emergencies. By summarizing open issues and outlining future directions in this field, this paper has the potential to be a valuable resource for researchers conducting further studies on predicting information cascades.
- [384] arXiv:2404.01933 (replaced) [pdf, ps, html, other]
-
Title: PREGO: online mistake detection in PRocedural EGOcentric videosAlessandro Flaborea, Guido Maria D'Amely di Melendugno, Leonardo Plini, Luca Scofano, Edoardo De Matteis, Antonino Furnari, Giovanni Maria Farinella, Fabio GalassoComments: Accepted at CVPR 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Promptly identifying procedural errors from egocentric videos in an online setting is highly challenging and valuable for detecting mistakes as soon as they happen. This capability has a wide range of applications across various fields, such as manufacturing and healthcare. The nature of procedural mistakes is open-set since novel types of failures might occur, which calls for one-class classifiers trained on correctly executed procedures. However, no technique can currently detect open-set procedural mistakes online. We propose PREGO, the first online one-class classification model for mistake detection in PRocedural EGOcentric videos. PREGO is based on an online action recognition component to model the current action, and a symbolic reasoning module to predict the next actions. Mistake detection is performed by comparing the recognized current action with the expected future one. We evaluate PREGO on two procedural egocentric video datasets, Assembly101 and Epic-tent, which we adapt for online benchmarking of procedural mistake detection to establish suitable benchmarks, thus defining the Assembly101-O and Epic-tent-O datasets, respectively.
- [385] arXiv:2404.05484 (replaced) [pdf, ps, html, other]
-
Title: On Computational Modeling of Sleep-Wake CycleSubjects: Machine Learning (cs.LG); Machine Learning (stat.ML)
Why do mammals need to sleep? Neuroscience treats sleep and wake as default and perturbation modes of the brain. It is hypothesized that the brain self-organizes neural activities without environmental inputs. This paper presents a new computational model of the sleep-wake cycle (SWC) for learning and memory. During the sleep mode, the memory consolidation by the thalamocortical system is abstracted by a disentangling operator that maps context-dependent representations (CDR) to context-independent representations (CIR) for generalization. Such a disentangling operator can be mathematically formalized by an integral transform that integrates the context variable from CDR. During the wake mode, the memory formation by the hippocampal-neocortical system is abstracted by an entangling operator from CIR to CDR where the context is introduced by physical motion. When designed as inductive bias, entangled CDR linearizes the problem of unsupervised learning for sensory memory by direct-fit. The concatenation of disentangling and entangling operators forms a disentangling-entangling cycle (DEC) as the building block for sensorimotor learning. We also discuss the relationship of DEC and SWC to the perception-action cycle (PAC) for internal model learning and perceptual control theory for the ecological origin of natural languages.
- [386] arXiv:2404.05840 (replaced) [pdf, ps, html, other]
-
Title: Attention-Driven Multi-Agent Reinforcement Learning: Enhancing Decisions with Expertise-Informed TasksComments: This paper was published at Proceedings of FLAIRS-37, May 19-21, Sandestin Beach, FL. The proceedings version is available at this https URLSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Multiagent Systems (cs.MA)
In this paper, we introduce an alternative approach to enhancing Multi-Agent Reinforcement Learning (MARL) through the integration of domain knowledge and attention-based policy mechanisms. Our methodology focuses on the incorporation of domain-specific expertise into the learning process, which simplifies the development of collaborative behaviors. This approach aims to reduce the complexity and learning overhead typically associated with MARL by enabling agents to concentrate on essential aspects of complex tasks, thus optimizing the learning curve. The utilization of attention mechanisms plays a key role in our model. It allows for the effective processing of dynamic context data and nuanced agent interactions, leading to more refined decision-making. Applied in standard MARL scenarios, such as the Stanford Intelligent Systems Laboratory (SISL) Pursuit and Multi-Particle Environments (MPE) Simple Spread, our method has been shown to improve both learning efficiency and the effectiveness of collaborative behaviors. The results indicate that our attention-based approach can be a viable approach for improving the efficiency of MARL training process, integrating domain-specific knowledge at the action level.
- [387] arXiv:2404.09604 (replaced) [pdf, ps, html, other]
-
Title: Machine learning-based optimization workflow of the homogeneity of spunbond nonwovens with human validationSubjects: Machine Learning (cs.LG)
In the last ten years, the average annual growth rate of nonwoven production was 4%. In 2020 and 2021, nonwoven production has increased even further due to the huge demand for nonwoven products needed for protective clothing such as FFP2 masks to combat the COVID19 pandemic. Optimizing the production process is still a challenge due to its high nonlinearity. In this paper, we present a machine learning-based optimization workflow aimed at improving the homogeneity of spunbond nonwovens. The optimization workflow is based on a mathematical model that simulates the microstructures of nonwovens. Based on trainingy data coming from this simulator, different machine learning algorithms are trained in order to find a surrogate model for the time-consuming simulator. Human validation is employed to verify the outputs of machine learning algorithms by assessing the aesthetics of the nonwovens. We include scientific and expert knowledge into the training data to reduce the computational costs involved in the optimization process. We demonstrate the necessity and effectiveness of our workflow in optimizing the homogeneity of nonwovens.
- [388] arXiv:2404.10228 (replaced) [pdf, ps, html, other]
-
Title: Two-Stage Stance Labeling: User-Hashtag Heuristics with Graph Neural NetworksSubjects: Machine Learning (cs.LG); Computation and Language (cs.CL); Social and Information Networks (cs.SI)
The high volume and rapid evolution of content on social media present major challenges for studying the stance of social media users. In this work, we develop a two stage stance labeling method that utilizes the user-hashtag bipartite graph and the user-user interaction graph. In the first stage, a simple and efficient heuristic for stance labeling uses the user-hashtag bipartite graph to iteratively update the stance association of user and hashtag nodes via a label propagation mechanism. This set of soft labels is then integrated with the user-user interaction graph to train a graph neural network (GNN) model using semi-supervised learning. We evaluate this method on two large-scale datasets containing tweets related to climate change from June 2021 to June 2022 and gun control from January 2022 to January 2023. Our experiments demonstrate that enriching text-based embeddings of users with network information from the user interaction graph using our semi-supervised GNN method outperforms both classifiers trained on user textual embeddings and zero-shot classification using LLMs such as GPT4. We discuss the need for integrating nuanced understanding from social science with the scalability of computational methods to better understand how polarization on social media occurs for divisive issues such as climate change and gun control.
- [389] arXiv:2404.10401 (replaced) [pdf, ps, other]
-
Title: A Phone-based Distributed Ambient Temperature Measurement System with An Efficient Label-free Automated Training StrategyJournal-ref: IEEE Transactions on Mobile Computing,13 May 2024, 1 - 13Subjects: Machine Learning (cs.LG)
Enhancing the energy efficiency of buildings significantly relies on monitoring indoor ambient temperature. The potential limitations of conventional temperature measurement techniques, together with the omnipresence of smartphones, have redirected researchers'attention towards the exploration of phone-based ambient temperature estimation methods. However, existing phone-based methods face challenges such as insufficient privacy protection, difficulty in adapting models to various phones, and hurdles in obtaining enough labeled training data. In this study, we propose a distributed phone-based ambient temperature estimation system which enables collaboration among multiple phones to accurately measure the ambient temperature in different areas of an indoor space. This system also provides an efficient, cost-effective approach with a few-shot meta-learning module and an automated label generation module. It shows that with just 5 new training data points, the temperature estimation model can adapt to a new phone and reach a good performance. Moreover, the system uses crowdsourcing to generate accurate labels for all newly collected training data, significantly reducing costs. Additionally, we highlight the potential of incorporating federated learning into our system to enhance privacy protection. We believe this study can advance the practical application of phone-based ambient temperature measurement, facilitating energy-saving efforts in buildings.
- [390] arXiv:2404.11205 (replaced) [pdf, ps, html, other]
-
Title: Pose2Gest: A Few-Shot Model-Free Approach Applied In South Indian Classical Dance Gesture RecognitionSubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
The classical dances from India utilize a set of hand gestures known as Mudras, serving as the foundational elements of its posture vocabulary. Identifying these mudras represents a primary task in digitizing the dance performances. With Kathakali, a dance-drama, as the focus, this work addresses mudra recognition by framing it as a 24-class classification problem and proposes a novel vector-similarity-based approach leveraging pose estimation techniques. This method obviates the need for extensive training or fine-tuning, thus mitigating the issue of limited data availability common in similar AI applications. Achieving an accuracy rate of 92%, our approach demonstrates comparable or superior performance to existing model-training-based methodologies in this domain. Notably, it remains effective even with small datasets comprising just 1 or 5 samples, albeit with a slightly diminished performance. Furthermore, our system supports processing images, videos, and real-time streams, accommodating both hand-cropped and full-body images. As part of this research, we have curated and released a publicly accessible Hasta Mudra dataset, which applies to multiple South Indian art forms including Kathakali. The implementation of the proposed method is also made available as a web application.
- [391] arXiv:2404.11671 (replaced) [pdf, ps, other]
-
Title: A Study of Undefined Behavior Across Foreign Function Boundaries in Rust LibrariesComments: 26 pages without appendix supplement, preprintSubjects: Software Engineering (cs.SE)
The Rust programming language restricts aliasing and mutability to provide static safety guarantees, which developers rely on to write secure and performant applications. However, Rust is frequently used to interoperate with other languages that have far weaker restrictions. These languages support cyclic and self-referential design patterns that conflict with current models of Rust's operational semantics, representing a potentially significant source of undefined behavior that no current tools can detect. We created MiriLLI, a tool which uses existing Rust and LLVM interpreters to jointly execute multi-language Rust applications. We used our tool in a large-scale study of Rust libraries that call foreign functions, and we found 45 instances of undefined or undesirable behavior. These include four bugs from libraries that had over 10,000 daily downloads on average, one from a component of the GNU Compiler Collection (GCC), and one from a library maintained by the Rust Project. Most of these errors were caused by incompatible aliasing and initialization patterns, incorrect foreign function bindings, and invalid type conversion. The majority of aliasing violations were caused by unsound operations in Rust, but they occurred in foreign code. The Rust community must invest in new tools for validating multi-language programs to ensure that developers can easily detect and fix these errors.
- [392] arXiv:2404.13859 (replaced) [pdf, ps, html, other]
-
Title: Unveiling and Mitigating Generalized Biases of DNNs through the Intrinsic Dimensions of Perceptual ManifoldsComments: 8pages, 6figures, Submitted to TPAMISubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Building fair deep neural networks (DNNs) is a crucial step towards achieving trustworthy artificial intelligence. Delving into deeper factors that affect the fairness of DNNs is paramount and serves as the foundation for mitigating model biases. However, current methods are limited in accurately predicting DNN biases, relying solely on the number of training samples and lacking more precise measurement tools. Here, we establish a geometric perspective for analyzing the fairness of DNNs, comprehensively exploring how DNNs internally shape the intrinsic geometric characteristics of datasets-the intrinsic dimensions (IDs) of perceptual manifolds, and the impact of IDs on the fairness of DNNs. Based on multiple findings, we propose Intrinsic Dimension Regularization (IDR), which enhances the fairness and performance of models by promoting the learning of concise and ID-balanced class perceptual manifolds. In various image recognition benchmark tests, IDR significantly mitigates model bias while improving its performance.
- [393] arXiv:2404.14965 (replaced) [pdf, ps, html, other]
-
Title: Vision Beyond Boundaries: An Initial Design Space of Domain-specific Large Vision Models in Human-robot InteractionSubjects: Human-Computer Interaction (cs.HC); Robotics (cs.RO)
The emergence of Large Vision Models (LVMs) is following in the footsteps of the recent prosperity of Large Language Models (LLMs) in following years. However, there's a noticeable gap in structured research applying LVMs to Human-Robot Interaction (HRI), despite extensive evidence supporting the efficacy of vision models in enhancing interactions between humans and robots. Recognizing the vast and anticipated potential, we introduce an initial design space that incorporates domain-specific LVMs, chosen for their superior performance over normal models. We delve into three primary dimensions: HRI contexts, vision-based tasks, and specific domains. The empirical validation was implemented among 15 experts across six evaluated metrics, showcasing the primary efficacy in relevant decision-making scenarios. We explore the process of ideation and potential application scenarios, envisioning this design space as a foundational guideline for future HRI system design, emphasizing accurate domain alignment and model selection.
- [394] arXiv:2404.15193 (replaced) [pdf, ps, html, other]
-
Title: Structurally Flexible Neural Networks: Evolving the Building Blocks for General AgentsSubjects: Neural and Evolutionary Computing (cs.NE); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Artificial neural networks used for reinforcement learning are structurally rigid, meaning that each optimized parameter of the network is tied to its specific placement in the network structure. It also means that a network only works with pre-defined and fixed input- and output sizes. This is a consequence of having the number of optimized parameters being directly dependent on the structure of the network. Structural rigidity limits the ability to optimize parameters of policies across multiple environments that do not share input and output spaces. Here, we evolve a set of neurons and plastic synapses each represented by a gated recurrent unit (GRU). During optimization, the parameters of these fundamental units of a neural network are optimized in different random structural configurations. Earlier work has shown that parameter sharing between units is important for making structurally flexible neurons We show that it is possible to optimize a set of distinct neuron- and synapse types allowing for a mitigation of the symmetry dilemma. We demonstrate this by optimizing a single set of neurons and synapses to solve multiple reinforcement learning control tasks simultaneously.
- [395] arXiv:2404.15772 (replaced) [pdf, ps, html, other]
-
Title: Bi-Mamba+: Bidirectional Mamba for Time Series ForecastingComments: New Mamba-based architecture. All experiments rerunSubjects: Machine Learning (cs.LG)
Long-term time series forecasting (LTSF) provides longer insights into future trends and patterns. Over the past few years, deep learning models especially Transformers have achieved advanced performance in LTSF tasks. However, LTSF faces inherent challenges such as long-term dependencies capturing and sparse semantic characteristics. Recently, a new state space model (SSM) named Mamba is proposed. With the selective capability on input data and the hardware-aware parallel computing algorithm, Mamba has shown great potential in balancing predicting performance and computational efficiency compared to Transformers. To enhance Mamba's ability to preserve historical information in a longer range, we design a novel Mamba+ block by adding a forget gate inside Mamba to selectively combine the new features with the historical features in a complementary manner. Furthermore, we apply Mamba+ both forward and backward and propose Bi-Mamba+, aiming to promote the model's ability to capture interactions among time series elements. Additionally, multivariate time series data in different scenarios may exhibit varying emphasis on intra- or inter-series dependencies. Therefore, we propose a series-relation-aware decider that controls the utilization of channel-independent or channel-mixing tokenization strategy for specific datasets. Extensive experiments on 8 real-world datasets show that our model achieves more accurate predictions compared with state-of-the-art methods.
- [396] arXiv:2404.16061 (replaced) [pdf, ps, html, other]
-
Title: Dynamic Many Valued Logic Systems in Theoretical EconomicsSubjects: Logic in Computer Science (cs.LO); Theoretical Economics (econ.TH)
This paper is an original attempt to understand the foundations of economic reasoning. It endeavors to rigorously define the relationship between subjective interpretations and objective valuations of such interpretations in the context of theoretical economics. This analysis is substantially expanded through a dynamic approach, where the truth of a valuation results in an updated interpretation or changes in the agent's subjective belief regarding the effectiveness of the selected action as well as the objective reality of the effectiveness of all other possible actions (i.e. consequence realization). Complications arise when the economic agent is presented with a set of actions that render ambiguous preference, or when the effectiveness of an action cannot be perceived upon its selection, thereby necessitating a different theory of choice and consequence realization.
- [397] arXiv:2404.16267 (replaced) [pdf, ps, html, other]
-
Title: Dynamic PageRank: Algorithms and Lower BoundsSubjects: Data Structures and Algorithms (cs.DS)
We consider the PageRank problem in the dynamic setting, where the goal is to explicitly maintain an approximate PageRank vector $\pi \in \mathbb{R}^n$ for a graph under a sequence of edge insertions and deletions. Our main result is a complete characterization of the complexity of dynamic PageRank maintenance for both multiplicative and additive ($L_1$) approximations.
First, we establish matching lower and upper bounds for maintaining additive approximate PageRank in both incremental and decremental settings. In particular, we demonstrate that in the worst-case $(1/\alpha)^{\Theta(\log \log n)}$ update time is necessary and sufficient for this problem, where $\alpha$ is the desired additive approximation. On the other hand, we demonstrate that the commonly employed ForwardPush approach performs substantially worse than this optimal runtime. Specifically, we show that ForwardPush requires $\Omega(n^{1-\delta})$ time per update on average, for any $\delta > 0$, even in the incremental setting.
For multiplicative approximations, however, we demonstrate that the situation is significantly more challenging. Specifically, we prove that any algorithm that explicitly maintains a constant factor multiplicative approximation of the PageRank vector of a directed graph must have amortized update time $\Omega(n^{1-\delta})$, for any $\delta > 0$, even in the incremental setting, thereby resolving a 13-year old open question of Bahmani et al.~(VLDB 2010). This sharply contrasts with the undirected setting, where we show that $\rm{poly}\ \log n$ update time is feasible, even in the fully dynamic setting under oblivious adversary. - [398] arXiv:2404.16296 (replaced) [pdf, ps, other]
-
Title: Research on Splicing Image Detection Algorithms Based on Natural Image Statistical CharacteristicsSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
With the development and widespread application of digital image processing technology, image splicing has become a common method of image manipulation, raising numerous security and legal issues. This paper introduces a new splicing image detection algorithm based on the statistical characteristics of natural images, aimed at improving the accuracy and efficiency of splicing image detection. By analyzing the limitations of traditional methods, we have developed a detection framework that integrates advanced statistical analysis techniques and machine learning methods. The algorithm has been validated using multiple public datasets, showing high accuracy in detecting spliced edges and locating tampered areas, as well as good robustness. Additionally, we explore the potential applications and challenges faced by the algorithm in real-world scenarios. This research not only provides an effective technological means for the field of image tampering detection but also offers new ideas and methods for future related research.
- [399] arXiv:2404.17027 (replaced) [pdf, ps, html, other]
-
Title: Player-Driven Emergence in LLM-Driven Game NarrativeXiangyu Peng, Jessica Quaye, Weijia Xu, Portia Botchway, Chris Brockett, Bill Dolan, Nebojsa Jojic, Gabriel DesGarennes, Ken Lobb, Michael Xu, Jorge Leandro, Claire Jin, Sudha RaoJournal-ref: IEEE Conference on Games 2024Subjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
We explore how interaction with large language models (LLMs) can give rise to emergent behaviors, empowering players to participate in the evolution of game narratives. Our testbed is a text-adventure game in which players attempt to solve a mystery under a fixed narrative premise, but can freely interact with non-player characters generated by GPT-4, a large language model. We recruit 28 gamers to play the game and use GPT-4 to automatically convert the game logs into a node-graph representing the narrative in the player's gameplay. We find that through their interactions with the non-deterministic behavior of the LLM, players are able to discover interesting new emergent nodes that were not a part of the original narrative but have potential for being fun and engaging. Players that created the most emergent nodes tended to be those that often enjoy games that facilitate discovery, exploration and experimentation.
- [400] arXiv:2404.18348 (replaced) [pdf, ps, other]
-
Title: Bilinear optimal control for the Stokes-Brinkman equations: a priori and a posteriori error analysesSubjects: Numerical Analysis (math.NA); Optimization and Control (math.OC)
We analyze a bilinear optimal control problem for the Stokes--Brinkman equations: the control variable enters the state equations as a coefficient. In two- and three-dimensional Lipschitz domains, we perform a complete continuous analysis that includes the existence of solutions and first- and second-order optimality conditions. We also develop two finite element methods that differ fundamentally in whether the admissible control set is discretized or not. For each of the proposed methods, we perform a convergence analysis and derive a priori error estimates; the latter under the assumption that the domain is convex. Finally, assuming that the domain is Lipschitz, we develop an a posteriori error estimator for each discretization scheme and obtain a global reliability bound.
- [401] arXiv:2405.00820 (replaced) [pdf, ps, html, other]
-
Title: HLSFactory: A Framework Empowering High-Level Synthesis Datasets for Machine Learning and BeyondStefan Abi-Karam, Rishov Sarkar, Allison Seigler, Sean Lowe, Zhigang Wei, Hanqiu Chen, Nanditha Rao, Lizy John, Aman Arora, Cong HaoComments: Edit to "Section V.E" for proper attribution of open-source HLSyn, AutoDSE, and the Merlin compilerSubjects: Hardware Architecture (cs.AR); Machine Learning (cs.LG)
Machine learning (ML) techniques have been applied to high-level synthesis (HLS) flows for quality-of-result (QoR) prediction and design space exploration (DSE). Nevertheless, the scarcity of accessible high-quality HLS datasets and the complexity of building such datasets present challenges. Existing datasets have limitations in terms of benchmark coverage, design space enumeration, vendor extensibility, or lack of reproducible and extensible software for dataset construction. Many works also lack user-friendly ways to add more designs, limiting wider adoption of such datasets.
In response to these challenges, we introduce HLSFactory, a comprehensive framework designed to facilitate the curation and generation of high-quality HLS design datasets. HLSFactory has three main stages: 1) a design space expansion stage to elaborate single HLS designs into large design spaces using various optimization directives across multiple vendor tools, 2) a design synthesis stage to execute HLS and FPGA tool flows concurrently across designs, and 3) a data aggregation stage for extracting standardized data into packaged datasets for ML usage. This tripartite architecture ensures broad design space coverage via design space expansion and supports multiple vendor tools. Users can contribute to each stage with their own HLS designs and synthesis results and extend the framework itself with custom frontends and tool flows. We also include an initial set of built-in designs from common HLS benchmarks curated open-source HLS designs.
We showcase the versatility and multi-functionality of our framework through six case studies: I) Design space sampling; II) Fine-grained parallelism backend speedup; III) Targeting Intel's HLS flow; IV) Adding new auxiliary designs; V) Integrating published HLS data; VI) HLS tool version regression benchmarking.
Code at this https URL. - [402] arXiv:2405.00942 (replaced) [pdf, ps, html, other]
-
Title: LLaVA Finds Free Lunch: Teaching Human Behavior Improves Content Understanding Abilities Of LLMsSomesh Singh, Harini S I, Yaman K Singla, Veeky Baths, Rajiv Ratn Shah, Changyou Chen, Balaji KrishnamurthySubjects: Computer Vision and Pattern Recognition (cs.CV); Computation and Language (cs.CL)
Communication is defined as "Who says what to whom with what effect." A message from a communicator generates downstream receiver effects, also known as behavior. Receiver behavior, being a downstream effect of the message, carries rich signals about it. Even after carrying signals about the message, the behavior data is often ignored while training large language models. We show that training LLMs on receiver behavior can actually help improve their content-understanding abilities. Specifically, we show that training LLMs to predict the receiver behavior of likes and comments improves the LLM's performance on a wide variety of downstream content understanding tasks. We show this performance increase over 40 video and image understanding tasks over 23 benchmark datasets across both 0-shot and fine-tuning settings, outperforming many supervised baselines. Moreover, since receiver behavior, such as likes and comments, is collected by default on the internet and does not need any human annotations to be useful, the performance improvement we get after training on this data is essentially free-lunch. We release the receiver behavior cleaned comments and likes of 750k images and videos collected from multiple platforms along with our instruction-tuning data.
- [403] arXiv:2405.01389 (replaced) [pdf, ps, html, other]
-
Title: Invariant Risk Minimization Is A Total Variation ModelComments: ICML 2024Subjects: Machine Learning (cs.LG)
Invariant risk minimization (IRM) is an arising approach to generalize invariant features to different environments in machine learning. While most related works focus on new IRM settings or new application scenarios, the mathematical essence of IRM remains to be properly explained. We verify that IRM is essentially a total variation based on $L^2$ norm (TV-$\ell_2$) of the learning risk with respect to the classifier variable. Moreover, we propose a novel IRM framework based on the TV-$\ell_1$ model. It not only expands the classes of functions that can be used as the learning risk and the feature extractor, but also has robust performance in denoising and invariant feature preservation based on the coarea formula. We also illustrate some requirements for IRM-TV-$\ell_1$ to achieve out-of-distribution generalization. Experimental results show that the proposed framework achieves competitive performance in several benchmark machine learning scenarios.
- [404] arXiv:2405.02288 (replaced) [pdf, ps, html, other]
-
Title: Prospective Role of Foundation Models in Advancing Autonomous VehiclesJianhua Wu, Bingzhao Gao, Jincheng Gao, Jianhao Yu, Hongqing Chu, Qiankun Yu, Xun Gong, Yi Chang, H. Eric Tseng, Hong Chen, Jie ChenComments: 45 pages,8 figuresSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Robotics (cs.RO)
With the development of artificial intelligence and breakthroughs in deep learning, large-scale Foundation Models (FMs), such as GPT, Sora, etc., have achieved remarkable results in many fields including natural language processing and computer vision. The application of FMs in autonomous driving holds considerable promise. For example, they can contribute to enhancing scene understanding and reasoning. By pre-training on rich linguistic and visual data, FMs can understand and interpret various elements in a driving scene, and provide cognitive reasoning to give linguistic and action instructions for driving decisions and planning. Furthermore, FMs can augment data based on the understanding of driving scenarios to provide feasible scenes of those rare occurrences in the long tail distribution that are unlikely to be encountered during routine driving and data collection. The enhancement can subsequently lead to improvement in the accuracy and reliability of autonomous driving systems. Another testament to the potential of FMs' applications lies in World Models, exemplified by the DREAMER series, which showcases the ability to comprehend physical laws and dynamics. Learning from massive data under the paradigm of self-supervised learning, World Model can generate unseen yet plausible driving environments, facilitating the enhancement in the prediction of road users' behaviors and the off-line training of driving strategies. In this paper, we synthesize the applications and future trends of FMs in autonomous driving. By utilizing the powerful capabilities of FMs, we strive to tackle the potential issues stemming from the long-tail distribution in autonomous driving, consequently advancing overall safety in this domain.
- [405] arXiv:2405.02385 (replaced) [pdf, ps, html, other]
-
Title: Efficient Deep Learning with Decorrelated BackpropagationSubjects: Machine Learning (cs.LG)
The backpropagation algorithm remains the dominant and most successful method for training deep neural networks (DNNs). At the same time, training DNNs at scale comes at a significant computational cost and therefore a high carbon footprint. Converging evidence suggests that input decorrelation may speed up deep learning. However, to date, this has not yet translated into substantial improvements in training efficiency in large-scale DNNs. This is mainly caused by the challenge of enforcing fast and stable network-wide decorrelation. Here, we show for the first time that much more efficient training of very deep neural networks using decorrelated backpropagation is feasible. To achieve this goal we made use of a novel algorithm which induces network-wide input decorrelation using minimal computational overhead. By combining this algorithm with careful optimizations, we obtain a more than two-fold speed-up and higher test accuracy compared to backpropagation when training a 18-layer deep residual network. This demonstrates that decorrelation provides exciting prospects for efficient deep learning at scale.
- [406] arXiv:2405.02508 (replaced) [pdf, ps, html, other]
-
Title: Rasterized Edge Gradients: Handling Discontinuities DifferentiablySubjects: Computer Vision and Pattern Recognition (cs.CV); Graphics (cs.GR)
Computing the gradients of a rendering process is paramount for diverse applications in computer vision and graphics. However, accurate computation of these gradients is challenging due to discontinuities and rendering approximations, particularly for surface-based representations and rasterization-based rendering. We present a novel method for computing gradients at visibility discontinuities for rasterization-based differentiable renderers. Our method elegantly simplifies the traditionally complex problem through a carefully designed approximation strategy, allowing for a straightforward, effective, and performant solution. We introduce a novel concept of micro-edges, which allows us to treat the rasterized images as outcomes of a differentiable, continuous process aligned with the inherently non-differentiable, discrete-pixel rasterization. This technique eliminates the necessity for rendering approximations or other modifications to the forward pass, preserving the integrity of the rendered image, which makes it applicable to rasterized masks, depth, and normals images where filtering is prohibitive. Utilizing micro-edges simplifies gradient interpretation at discontinuities and enables handling of geometry intersections, offering an advantage over the prior art. We showcase our method in dynamic human head scene reconstruction, demonstrating effective handling of camera images and segmentation masks.
- [407] arXiv:2405.02924 (replaced) [pdf, ps, html, other]
-
Title: Optimal Sampling for Uncertainty-of-Information Minimization in a Remote Monitoring SystemSubjects: Information Theory (cs.IT)
In this paper, we study a remote monitoring system where a receiver observes a remote binary Markov source and decides whether to sample and transmit the state through a randomly delayed channel. We adopt uncertainty of information (UoI), defined as the entropy conditional on past observations at the receiver, as a metric of value of information, in contrast to the traditional state-agnostic nonlinear age of information (AoI) penalty functions. To address the limitations of prior UoI research that assumes one-time-slot delays, we extend our analysis to scenarios with random delays. We model the problem as a partially observable Markov decision process (POMDP) problem and simplify it to a semi-Markov decision process (SMDP) by introducing the belief state. We propose two algorithms: A globally optimal bisection relative value iteration (bisec-RVI) algorithm and a computationally efficient sub-optimal index-based threshold algorithm to solve the long-term average UoI minimization problem. Numerical simulations demonstrate that our sampling policies surpass traditional zero wait and AoI-optimal policies, particularly under conditions of large delay, with the sub-optimal policy nearly matching the performance of the optimal one.
- [408] arXiv:2405.03342 (replaced) [pdf, ps, html, other]
-
Title: Doubly Robust Causal Effect Estimation under Networked Interference via Targeted LearningComments: Accepted by ICML 2024Subjects: Machine Learning (cs.LG)
Causal effect estimation under networked interference is an important but challenging problem. Available parametric methods are limited in their model space, while previous semiparametric methods, e.g., leveraging neural networks to fit only one single nuisance function, may still encounter misspecification problems under networked interference without appropriate assumptions on the data generation process. To mitigate bias stemming from misspecification, we propose a novel doubly robust causal effect estimator under networked interference, by adapting the targeted learning technique to the training of neural networks. Specifically, we generalize the targeted learning technique into the networked interference setting and establish the condition under which an estimator achieves double robustness. Based on the condition, we devise an end-to-end causal effect estimator by transforming the identified theoretical condition into a targeted loss. Moreover, we provide a theoretical analysis of our designed estimator, revealing a faster convergence rate compared to a single nuisance model. Extensive experimental results on two real-world networks with semisynthetic data demonstrate the effectiveness of our proposed estimators.
- [409] arXiv:2405.04407 (replaced) [pdf, ps, html, other]
-
Title: Super-Exponential Regret for UCT, AlphaGo and VariantsSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
We improve the proofs of the lower bounds of Coquelin and Munos (2007) that demonstrate that UCT can have $\exp(\dots\exp(1)\dots)$ regret (with $\Omega(D)$ exp terms) on the $D$-chain environment, and that a `polynomial' UCT variant has $\exp_2(\exp_2(D - O(\log D)))$ regret on the same environment -- the original proofs contain an oversight for rewards bounded in $[0, 1]$, which we fix in the present draft. We also adapt the proofs to AlphaGo's MCTS and its descendants (e.g., AlphaZero, Leela Zero) to also show $\exp_2(\exp_2(D - O(\log D)))$ regret.
- [410] arXiv:2405.04903 (replaced) [pdf, ps, html, other]
-
Title: Imbalanced Graph Classification with Multi-scale Oversampling Graph Neural NetworksSubjects: Machine Learning (cs.LG)
One main challenge in imbalanced graph classification is to learn expressive representations of the graphs in under-represented (minority) classes. Existing generic imbalanced learning methods, such as oversampling and imbalanced learning loss functions, can be adopted for enabling graph representation learning models to cope with this challenge. However, these methods often directly operate on the graph representations, ignoring rich discriminative information within the graphs and their interactions. To tackle this issue, we introduce a novel multi-scale oversampling graph neural network (MOSGNN) that learns expressive minority graph representations based on intra- and inter-graph semantics resulting from oversampled graphs at multiple scales - subgraph, graph, and pairwise graphs. It achieves this by jointly optimizing subgraph-level, graph-level, and pairwise-graph learning tasks to learn the discriminative information embedded within and between the minority graphs. Extensive experiments on 16 imbalanced graph datasets show that MOSGNN i) significantly outperforms five state-of-the-art models, and ii) offers a generic framework, in which different advanced imbalanced learning loss functions can be easily plugged in and obtain significantly improved classification performance.
- [411] arXiv:2405.05170 (replaced) [pdf, ps, html, other]
-
Title: Picking watermarks from noise (PWFN): an improved robust watermarking model against intensive distortionsSubjects: Multimedia (cs.MM); Computer Vision and Pattern Recognition (cs.CV); Image and Video Processing (eess.IV)
Digital watermarking is the process of embedding secret information by altering images in an undetectable way to the human eye. To increase the robustness of the model, many deep learning-based watermarking methods use the encoder-noise-decoder architecture by adding different noises to the noise layer. The decoder then extracts the watermarked information from the distorted image. However, this method can only resist weak noise attacks. To improve the robustness of the decoder against stronger noise, this paper proposes to introduce a denoise module between the noise layer and the decoder. The module aims to reduce noise and recover some of the information lost caused by distortion. Additionally, the paper introduces the SE module to fuse the watermarking information pixel-wise and channel dimensions-wise, improving the encoder's efficiency. Experimental results show that our proposed method is comparable to existing models and outperforms state-of-the-art under different noise intensities. In addition, ablation experiments show the superiority of our proposed module.
- [412] arXiv:2405.05886 (replaced) [pdf, ps, html, other]
-
Title: Exploiting Autoencoder's Weakness to Generate Pseudo AnomaliesComments: SharedIt link: this https URLJournal-ref: Neural Computing and Applications, pp.1-17 (2024)Subjects: Machine Learning (cs.LG); Computer Vision and Pattern Recognition (cs.CV)
Due to the rare occurrence of anomalous events, a typical approach to anomaly detection is to train an autoencoder (AE) with normal data only so that it learns the patterns or representations of the normal training data. At test time, the trained AE is expected to well reconstruct normal but to poorly reconstruct anomalous data. However, contrary to the expectation, anomalous data is often well reconstructed as well. In order to further separate the reconstruction quality between normal and anomalous data, we propose creating pseudo anomalies from learned adaptive noise by exploiting the aforementioned weakness of AE, i.e., reconstructing anomalies too well. The generated noise is added to the normal data to create pseudo anomalies. Extensive experiments on Ped2, Avenue, ShanghaiTech, CIFAR-10, and KDDCUP datasets demonstrate the effectiveness and generic applicability of our approach in improving the discriminative capability of AEs for anomaly detection.
- [413] arXiv:2405.06488 (replaced) [pdf, ps, html, other]
-
Title: Can Neural Networks learn Finite Elements?Subjects: Numerical Analysis (math.NA)
The aim of this note is to construct a neural network for which the linear finite element approximation of a simple one dimensional boundary value problem is a minimum of the cost function to find out if the neural network is able to reproduce the finite element approximation. The deepest goal is to shed some light on the problems one encounters when trying to use neural networks to approximate partial differential equations
- [414] arXiv:2405.06624 (replaced) [pdf, ps, html, other]
-
Title: Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI SystemsDavid "davidad" Dalrymple, Joar Skalse, Yoshua Bengio, Stuart Russell, Max Tegmark, Sanjit Seshia, Steve Omohundro, Christian Szegedy, Ben Goldhaber, Nora Ammann, Alessandro Abate, Joe Halpern, Clark Barrett, Ding Zhao, Tan Zhi-Xuan, Jeannette Wing, Joshua TenenbaumSubjects: Artificial Intelligence (cs.AI)
Ensuring that AI systems reliably and robustly avoid harmful or dangerous behaviours is a crucial challenge, especially for AI systems with a high degree of autonomy and general intelligence, or systems used in safety-critical contexts. In this paper, we will introduce and define a family of approaches to AI safety, which we will refer to as guaranteed safe (GS) AI. The core feature of these approaches is that they aim to produce AI systems which are equipped with high-assurance quantitative safety guarantees. This is achieved by the interplay of three core components: a world model (which provides a mathematical description of how the AI system affects the outside world), a safety specification (which is a mathematical description of what effects are acceptable), and a verifier (which provides an auditable proof certificate that the AI satisfies the safety specification relative to the world model). We outline a number of approaches for creating each of these three core components, describe the main technical challenges, and suggest a number of potential solutions to them. We also argue for the necessity of this approach to AI safety, and for the inadequacy of the main alternative approaches.
- [415] arXiv:2405.07291 (replaced) [pdf, ps, html, other]
-
Title: Robust Beamforming with Gradient-based Liquid Neural NetworkXinquan Wang, Fenghao Zhu, Chongwen Huang, Ahmed Alhammadi, Faouzi Bader, Zhaoyang Zhang, Chau Yuen, Merouane DebbahSubjects: Information Theory (cs.IT); Signal Processing (eess.SP)
Millimeter-wave (mmWave) multiple-input multiple-output (MIMO) communication with the advanced beamforming technologies is a key enabler to meet the growing demands of future mobile communication. However, the dynamic nature of cellular channels in large-scale urban mmWave MIMO communication scenarios brings substantial challenges, particularly in terms of complexity and robustness. To address these issues, we propose a robust gradient-based liquid neural network (GLNN) framework that utilizes ordinary differential equation-based liquid neurons to solve the beamforming problem. Specifically, our proposed GLNN framework takes gradients of the optimization objective function as inputs to extract the high-order channel feature information, and then introduces a residual connection to mitigate the training burden. Furthermore, we use the manifold learning technique to compress the search space of the beamforming problem. These designs enable the GLNN to effectively maintain low complexity while ensuring strong robustness to noisy and highly dynamic channels. Extensive simulation results demonstrate that the GLNN can achieve 4.15% higher spectral efficiency than that of typical iterative algorithms, and reduce the time consumption to only 1.61% that of conventional methods.
- [416] arXiv:2405.07703 (replaced) [pdf, ps, html, other]
-
Title: OpenLLM-Ro -- Technical Report on Open-source Romanian LLMsMihai Masala, Denis C. Ilie-Ablachim, Dragos Corlatescu, Miruna Zavelca, Marius Leordeanu, Horia Velicu, Marius Popescu, Mihai Dascalu, Traian RebedeaSubjects: Computation and Language (cs.CL)
In recent years, Large Language Models (LLMs) have achieved almost human-like performance on various tasks. While some LLMs have been trained on multilingual data, most of the training data is in English. Hence, their performance in English greatly exceeds their performance in other languages. This document presents our approach to training and evaluating the first foundational and chat LLM specialized for Romanian.
- [417] arXiv:2405.07836 (replaced) [pdf, ps, other]
-
Title: Forecasting with Hyper-TreesComments: Forecasting, Gradient Boosting, Hyper-Networks, LightGBM, Parameter Non-Stationarity, Time Series, XGBoostSubjects: Machine Learning (cs.LG); Methodology (stat.ME)
This paper introduces the concept of Hyper-Trees and offers a new direction in applying tree-based models to time series data. Unlike conventional applications of decision trees that forecast time series directly, Hyper-Trees are designed to learn the parameters of a target time series model. Our framework leverages the gradient-based nature of boosted trees, which allows us to extend the concept of Hyper-Networks to Hyper-Trees and to induce a time-series inductive bias to tree models. By relating the parameters of a target time series model to features, Hyper-Trees address the issue of parameter non-stationarity and enable tree-based forecasts to extend beyond their training range. With our research, we aim to explore the effectiveness of Hyper-Trees across various forecasting scenarios and to extend the application of gradient boosted decision trees outside their conventional use in time series modeling.
- [418] arXiv:2405.07919 (replaced) [pdf, ps, html, other]
-
Title: Exploring the Low-Pass Filtering Behavior in Image Super-ResolutionComments: Accepted by ICML 2024Subjects: Computer Vision and Pattern Recognition (cs.CV)
Deep neural networks for image super-resolution (ISR) have shown significant advantages over traditional approaches like the interpolation. However, they are often criticized as 'black boxes' compared to traditional approaches with solid mathematical foundations. In this paper, we attempt to interpret the behavior of deep neural networks in ISR using theories from the field of signal processing. First, we report an intriguing phenomenon, referred to as `the sinc phenomenon.' It occurs when an impulse input is fed to a neural network. Then, building on this observation, we propose a method named Hybrid Response Analysis (HyRA) to analyze the behavior of neural networks in ISR tasks. Specifically, HyRA decomposes a neural network into a parallel connection of a linear system and a non-linear system and demonstrates that the linear system functions as a low-pass filter while the non-linear system injects high-frequency information. Finally, to quantify the injected high-frequency information, we introduce a metric for image-to-image tasks called Frequency Spectrum Distribution Similarity (FSDS). FSDS reflects the distribution similarity of different frequency components and can capture nuances that traditional metrics may overlook. Code, videos and raw experimental results for this paper can be found in: this https URL.
- [419] arXiv:2405.08299 (replaced) [pdf, ps, html, other]
-
Title: Differentially Private Federated Learning: A Systematic ReviewJie Fu, Yuan Hong, Xinpeng Ling, Leixia Wang, Xun Ran, Zhiyu Sun, Wendy Hui Wang, Zhili Chen, Yang CaoComments: 37pagesSubjects: Cryptography and Security (cs.CR); Machine Learning (cs.LG)
In recent years, privacy and security concerns in machine learning have promoted trusted federated learning to the forefront of research. Differential privacy has emerged as the de facto standard for privacy protection in federated learning due to its rigorous mathematical foundation and provable guarantee. Despite extensive research on algorithms that incorporate differential privacy within federated learning, there remains an evident deficiency in systematic reviews that categorize and synthesize these studies.
Our work presents a systematic overview of the differentially private federated learning. Existing taxonomies have not adequately considered objects and level of privacy protection provided by various differential privacy models in federated learning. To rectify this gap, we propose a new taxonomy of differentially private federated learning based on definition and guarantee of various differential privacy models and federated scenarios. Our classification allows for a clear delineation of the protected objects across various differential privacy models and their respective neighborhood levels within federated learning environments. Furthermore, we explore the applications of differential privacy in federated learning scenarios. Our work provide valuable insights into privacy-preserving federated learning and suggest practical directions for future research. - [420] arXiv:2405.08337 (replaced) [pdf, ps, other]
-
Title: Perivascular space Identification Nnunet for Generalised Usage (PINGU)Benjamin Sinclair, Lucy Vivash, Jasmine Moses, Miranda Lynch, William Pham, Karina Dorfman, Cassandra Marotta, Shaun Koh, Jacob Bunyamin, Ella Rowsthorn, Alex Jarema, Himashi Peiris, Zhaolin Chen, Sandy R Shultz, David K Wright, Dexiao Kong, Sharon L. Naismith, Terence J. OBrien, Meng LawSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI)
Perivascular spaces(PVSs) form a central component of the brainś waste clearance system, the glymphatic system. These structures are visible on MRI images, and their morphology is associated with aging and neurological disease. Manual quantification of PVS is time consuming and subjective. Numerous deep learning methods for PVS segmentation have been developed, however the majority have been developed and evaluated on homogenous datasets and high resolution scans, perhaps limiting their applicability for the wide range of image qualities acquired in clinic and research. In this work we train a nnUNet, a top-performing biomedical image segmentation algorithm, on a heterogenous training sample of manually segmented MRI images of a range of different qualities and resolutions from 6 different datasets. These are compared to publicly available deep learning methods for 3D segmentation of PVS. The resulting model, PINGU (Perivascular space Identification Nnunet for Generalised Usage), achieved voxel and cluster level dice scores of 0.50(SD=0.15), 0.63(0.17) in the white matter(WM), and 0.54(0.11), 0.66(0.17) in the basal ganglia(BG). Performance on data from unseen sites was substantially lower for both PINGU(0.20-0.38(WM, voxel), 0.29-0.58(WM, cluster), 0.22-0.36(BG, voxel), 0.46-0.60(BG, cluster)) and the publicly available algorithms(0.18-0.30(WM, voxel), 0.29-0.38(WM cluster), 0.10-0.20(BG, voxel), 0.15-0.37(BG, cluster)), but PINGU strongly outperformed the publicly available algorithms, particularly in the BG. Finally, training PINGU on manual segmentations from a single site with homogenous scan properties gave marginally lower performances on internal cross-validation, but in some cases gave higher performance on external validation. PINGU stands out as broad-use PVS segmentation tool, with particular strength in the BG, an area of PVS related to vascular disease and pathology.
- [421] arXiv:2405.08366 (replaced) [pdf, ps, html, other]
-
Title: Towards Principled Evaluations of Sparse Autoencoders for Interpretability and ControlSubjects: Machine Learning (cs.LG)
Disentangling model activations into meaningful features is a central problem in interpretability. However, the lack of ground-truth for these features in realistic scenarios makes the validation of recent approaches, such as sparse dictionary learning, elusive. To overcome this, we propose a framework to evaluate feature dictionaries in the context of specific tasks, by comparing them against \emph{supervised} feature dictionaries. First, we demonstrate that supervised dictionaries achieve excellent approximation, control and interpretability of model computations on the task. Second, we use the supervised dictionaries to develop and contextualize evaluations of unsupervised dictionaries along the same three axes.
We apply this framework to the indirect object identification task (IOI) using GPT-2 Small, with sparse autoencoders (SAEs) trained on either the IOI or OpenWebText datasets. We find that these SAEs capture interpretable features for the IOI task, but they are not as successful as supervised features in controlling the model. Finally, we observe two qualitative phenomena in SAE training: feature occlusion (where a causally relevant concept is robustly overshadowed by even slightly higher-magnitude ones in the learned features), and feature over-splitting (where binary features split into many smaller features without clear interpretation). We hope that our framework will be a useful step towards more objective and grounded evaluations of sparse dictionary learning methods. - [422] arXiv:2405.08403 (replaced) [pdf, ps, html, other]
-
Title: TFWT: Tabular Feature Weighting with TransformerComments: Accepted by IJCAI 2024Subjects: Machine Learning (cs.LG)
In this paper, we propose a novel feature weighting method to address the limitation of existing feature processing methods for tabular data. Typically the existing methods assume equal importance across all samples and features in one dataset. This simplified processing methods overlook the unique contributions of each feature, and thus may miss important feature information. As a result, it leads to suboptimal performance in complex datasets with rich features. To address this problem, we introduce Tabular Feature Weighting with Transformer, a novel feature weighting approach for tabular data. Our method adopts Transformer to capture complex feature dependencies and contextually assign appropriate weights to discrete and continuous features. Besides, we employ a reinforcement learning strategy to further fine-tune the weighting process. Our extensive experimental results across various real-world datasets and diverse downstream tasks show the effectiveness of TFWT and highlight the potential for enhancing feature weighting in tabular data analysis.
- [423] arXiv:2405.08988 (replaced) [pdf, ps, html, other]
-
Title: Techniques for Showing the Decidability of the Boundedness Problem of Language AcceptorsComments: 23 pages,2 figuresSubjects: Formal Languages and Automata Theory (cs.FL)
There are many types of automata and grammar models that have been studied in the literature, and for these models, it is common to determine whether certain problems are decidable. One problem that has been difficult to answer throughout the history of automata and formal language theory is to decide whether a given system $M$ accepts a bounded language (whether there exist words $w_1, \ldots,w_k$ such that $L(M) \subseteq w_1 \cdots w_k$?). Decidability of this problem has gone unanswered for the majority of automata/grammar models in the literature. Boundedness was only known to be decidable for regular and context-free languages until recently when it was shown to also be decidable for finite-automata and pushdown automata augmented with reversal-bounded counters, and for vector addition systems with states.
In this paper, we develop new techniques to show that the boundedness problem is decidable for larger classes of one-way nondeterministic automata and grammar models, by reducing the problem to the decidability of boundedness for simpler classes of automata. One technique involves characterizing the models in terms of multi-tape automata. We give new characterizations of finite-turn Turing machines, finite-turn Turing machines augmented with various storage structures (like a pushdown, multiple reversal-bounded counters, partially-blind counters, etc.), and simple matrix grammars. The characterizations are then used to show that the boundedness problem for these models is decidable. Another technique uses the concept of the store language of an automaton. This is used to show that the boundedness problem is decidable for pushdown automata that can "flip" their pushdown a bounded number of times, and boundedness remains decidable even if we augment this device with additional stores. - [424] arXiv:2405.09062 (replaced) [pdf, ps, html, other]
-
Title: Naturalistic Music Decoding from EEG Data via Latent Diffusion ModelsEmilian Postolache, Natalia Polouliakh, Hiroaki Kitano, Akima Connelly, Emanuele Rodolà, Taketo AkamaSubjects: Sound (cs.SD); Machine Learning (cs.LG); Audio and Speech Processing (eess.AS)
In this article, we explore the potential of using latent diffusion models, a family of powerful generative models, for the task of reconstructing naturalistic music from electroencephalogram (EEG) recordings. Unlike simpler music with limited timbres, such as MIDI-generated tunes or monophonic pieces, the focus here is on intricate music featuring a diverse array of instruments, voices, and effects, rich in harmonics and timbre. This study represents an initial foray into achieving general music reconstruction of high-quality using non-invasive EEG data, employing an end-to-end training approach directly on raw data without the need for manual pre-processing and channel selection. We train our models on the public NMED-T dataset and perform quantitative evaluation proposing neural embedding-based metrics. We additionally perform song classification based on the generated tracks. Our work contributes to the ongoing research in neural decoding and brain-computer interfaces, offering insights into the feasibility of using EEG data for complex auditory information reconstruction.
- [425] arXiv:2405.09312 (replaced) [pdf, ps, html, other]
-
Title: Agnostic Active Learning of Single Index Models with Linear Sample ComplexitySubjects: Machine Learning (cs.LG)
We study active learning methods for single index models of the form $F({\mathbf x}) = f(\langle {\mathbf w}, {\mathbf x}\rangle)$, where $f:\mathbb{R} \to \mathbb{R}$ and ${\mathbf x,\mathbf w} \in \mathbb{R}^d$. In addition to their theoretical interest as simple examples of non-linear neural networks, single index models have received significant recent attention due to applications in scientific machine learning like surrogate modeling for partial differential equations (PDEs). Such applications require sample-efficient active learning methods that are robust to adversarial noise. I.e., that work even in the challenging agnostic learning setting.
We provide two main results on agnostic active learning of single index models. First, when $f$ is known and Lipschitz, we show that $\tilde{O}(d)$ samples collected via {statistical leverage score sampling} are sufficient to learn a near-optimal single index model. Leverage score sampling is simple to implement, efficient, and already widely used for actively learning linear models. Our result requires no assumptions on the data distribution, is optimal up to log factors, and improves quadratically on a recent ${O}(d^{2})$ bound of \cite{gajjar2023active}. Second, we show that $\tilde{O}(d)$ samples suffice even in the more difficult setting when $f$ is \emph{unknown}. Our results leverage tools from high dimensional probability, including Dudley's inequality and dual Sudakov minoration, as well as a novel, distribution-aware discretization of the class of Lipschitz functions. - [426] arXiv:2405.09531 (replaced) [pdf, ps, html, other]
-
Title: Ticket-based multi-strand method for increased efficiency in proof-of-work based blockchainsComments: 6 pagesSubjects: Distributed, Parallel, and Cluster Computing (cs.DC)
This paper outlines a method aiming to increase the efficiency of proof-of-work based blockchains using a ticket-based approach. To avoid the limitation of serially adding one block at a time to a blockchain, multiple semi-independent chains are used such that several valid blocks can be added in parallel, when they are added to separate chains. Blocks are added to different chains, the chain index being determined by a "ticket" that the miner must produce before mining a new block. This allows increasing the transaction rate by several orders of magnitude while the system is still fully decentralized and permissionless, and maintaining security in the sense that a successful attack would require the attacker to control a significant portion of the whole network.
- [427] arXiv:2405.09591 (replaced) [pdf, ps, html, other]
-
Title: A Comprehensive Survey on Data AugmentationZaitian Wang, Pengfei Wang, Kunpeng Liu, Pengyang Wang, Yanjie Fu, Chang-Tien Lu, Charu C. Aggarwal, Jian Pei, Yuanchun ZhouSubjects: Machine Learning (cs.LG); Artificial Intelligence (cs.AI)
Data augmentation is a series of techniques that generate high-quality artificial data by manipulating existing data samples. By leveraging data augmentation techniques, AI models can achieve significantly improved applicability in tasks involving scarce or imbalanced datasets, thereby substantially enhancing AI models' generalization capabilities. Existing literature surveys only focus on a certain type of specific modality data, and categorize these methods from modality-specific and operation-centric perspectives, which lacks a consistent summary of data augmentation methods across multiple modalities and limits the comprehension of how existing data samples serve the data augmentation process. To bridge this gap, we propose a more enlightening taxonomy that encompasses data augmentation techniques for different common data modalities. Specifically, from a data-centric perspective, this survey proposes a modality-independent taxonomy by investigating how to take advantage of the intrinsic relationship between data samples, including single-wise, pair-wise, and population-wise sample data augmentation methods. Additionally, we categorize data augmentation methods across five data modalities through a unified inductive approach.
- [428] arXiv:2405.09713 (replaced) [pdf, ps, html, other]
-
Title: SOK-Bench: A Situated Video Reasoning Benchmark with Aligned Open-World KnowledgeComments: CVPRSubjects: Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Learning commonsense reasoning from visual contexts and scenes in real-world is a crucial step toward advanced artificial intelligence. However, existing video reasoning benchmarks are still inadequate since they were mainly designed for factual or situated reasoning and rarely involve broader knowledge in the real world. Our work aims to delve deeper into reasoning evaluations, specifically within dynamic, open-world, and structured context knowledge. We propose a new benchmark (SOK-Bench), consisting of 44K questions and 10K situations with instance-level annotations depicted in the videos. The reasoning process is required to understand and apply situated knowledge and general knowledge for problem-solving. To create such a dataset, we propose an automatic and scalable generation method to generate question-answer pairs, knowledge graphs, and rationales by instructing the combinations of LLMs and MLLMs. Concretely, we first extract observable situated entities, relations, and processes from videos for situated knowledge and then extend to open-world knowledge beyond the visible content. The task generation is facilitated through multiple dialogues as iterations and subsequently corrected and refined by our designed self-promptings and demonstrations. With a corpus of both explicit situated facts and implicit commonsense, we generate associated question-answer pairs and reasoning processes, finally followed by manual reviews for quality assurance. We evaluated recent mainstream large vision-language models on the benchmark and found several insightful conclusions. For more information, please refer to our benchmark at this http URL.
- [429] arXiv:2405.09733 (replaced) [pdf, ps, html, other]
-
Title: SCI 3.0: A Web-based Schema Curation Interface for Graphical Event RepresentationsSubjects: Computation and Language (cs.CL)
To understand the complexity of global events, one must navigate a web of interwoven sub-events, identifying those most impactful elements within the larger, abstract macro-event framework at play. This concept can be extended to the field of natural language processing (NLP) through the creation of structured event schemas which can serve as representations of these abstract events. Central to our approach is the Schema Curation Interface 3.0 (SCI 3.0), a web application that facilitates real-time editing of event schema properties within a generated graph e.g., adding, removing, or editing sub-events, entities, and relations directly through an interface.
- [430] arXiv:2405.09735 (replaced) [pdf, ps, html, other]
-
Title: An Analysis of Sentential Neighbors in Implicit Discourse Relation PredictionSubjects: Computation and Language (cs.CL)
Discourse relation classification is an especially difficult task without explicit context markers (Prasad et al., 2008). Current approaches to implicit relation prediction solely rely on two neighboring sentences being targeted, ignoring the broader context of their surrounding environments (Atwell et al., 2021). In this research, we propose three new methods in which to incorporate context in the task of sentence relation prediction: (1) Direct Neighbors (DNs), (2) Expanded Window Neighbors (EWNs), and (3) Part-Smart Random Neighbors (PSRNs). Our findings indicate that the inclusion of context beyond one discourse unit is harmful in the task of discourse relation classification.
- [431] arXiv:2405.09814 (replaced) [pdf, ps, html, other]
-
Title: Semantic Gesticulator: Semantics-Aware Co-Speech Gesture SynthesisComments: SIGGRAPH 2024 (Journal Track); Project page: this https URLSubjects: Graphics (cs.GR); Computer Vision and Pattern Recognition (cs.CV); Sound (cs.SD); Audio and Speech Processing (eess.AS)
In this work, we present Semantic Gesticulator, a novel framework designed to synthesize realistic gestures accompanying speech with strong semantic correspondence. Semantically meaningful gestures are crucial for effective non-verbal communication, but such gestures often fall within the long tail of the distribution of natural human motion. The sparsity of these movements makes it challenging for deep learning-based systems, trained on moderately sized datasets, to capture the relationship between the movements and the corresponding speech semantics. To address this challenge, we develop a generative retrieval framework based on a large language model. This framework efficiently retrieves suitable semantic gesture candidates from a motion library in response to the input speech. To construct this motion library, we summarize a comprehensive list of commonly used semantic gestures based on findings in linguistics, and we collect a high-quality motion dataset encompassing both body and hand movements. We also design a novel GPT-based model with strong generalization capabilities to audio, capable of generating high-quality gestures that match the rhythm of speech. Furthermore, we propose a semantic alignment mechanism to efficiently align the retrieved semantic gestures with the GPT's output, ensuring the naturalness of the final animation. Our system demonstrates robustness in generating gestures that are rhythmically coherent and semantically explicit, as evidenced by a comprehensive collection of examples. User studies confirm the quality and human-likeness of our results, and show that our system outperforms state-of-the-art systems in terms of semantic appropriateness by a clear margin.
- [432] arXiv:2405.09817 (replaced) [pdf, ps, html, other]
-
Title: Active Learning with Fully Bayesian Neural Networks for Discontinuous and Nonstationary DataComments: Fixed PGM in Figure 2 and update captionSubjects: Machine Learning (cs.LG); Data Analysis, Statistics and Probability (physics.data-an)
Active learning optimizes the exploration of large parameter spaces by strategically selecting which experiments or simulations to conduct, thus reducing resource consumption and potentially accelerating scientific discovery. A key component of this approach is a probabilistic surrogate model, typically a Gaussian Process (GP), which approximates an unknown functional relationship between control parameters and a target property. However, conventional GPs often struggle when applied to systems with discontinuities and non-stationarities, prompting the exploration of alternative models. This limitation becomes particularly relevant in physical science problems, which are often characterized by abrupt transitions between different system states and rapid changes in physical property behavior. Fully Bayesian Neural Networks (FBNNs) serve as a promising substitute, treating all neural network weights probabilistically and leveraging advanced Markov Chain Monte Carlo techniques for direct sampling from the posterior distribution. This approach enables FBNNs to provide reliable predictive distributions, crucial for making informed decisions under uncertainty in the active learning setting. Although traditionally considered too computationally expensive for 'big data' applications, many physical sciences problems involve small amounts of data in relatively low-dimensional parameter spaces. Here, we assess the suitability and performance of FBNNs with the No-U-Turn Sampler for active learning tasks in the 'small data' regime, highlighting their potential to enhance predictive accuracy and reliability on test functions relevant to problems in physical sciences.
- [433] arXiv:2405.09883 (replaced) [pdf, ps, html, other]
-
Title: RoScenes: A Large-scale Multi-view 3D Dataset for Roadside PerceptionXiaosu Zhu, Hualian Sheng, Sijia Cai, Bing Deng, Shaopeng Yang, Qiao Liang, Ken Chen, Lianli Gao, Jingkuan Song, Jieping YeComments: Technical report. 32 pages, 21 figures, 13 tables. this https URLSubjects: Computer Vision and Pattern Recognition (cs.CV)
We introduce RoScenes, the largest multi-view roadside perception dataset, which aims to shed light on the development of vision-centric Bird's Eye View (BEV) approaches for more challenging traffic scenes. The highlights of RoScenes include significantly large perception area, full scene coverage and crowded traffic. More specifically, our dataset achieves surprising 21.13M 3D annotations within 64,000 $m^2$. To relieve the expensive costs of roadside 3D labeling, we present a novel BEV-to-3D joint annotation pipeline to efficiently collect such a large volume of data. After that, we organize a comprehensive study for current BEV methods on RoScenes in terms of effectiveness and efficiency. Tested methods suffer from the vast perception area and variation of sensor layout across scenes, resulting in performance levels falling below expectations. To this end, we propose RoBEV that incorporates feature-guided position embedding for effective 2D-3D feature assignment. With its help, our method outperforms state-of-the-art by a large margin without extra computational overhead on validation set. Our dataset and devkit will be made available at this https URL.
- [434] arXiv:2405.09955 (replaced) [pdf, ps, html, other]
-
Title: Dual-band feature selection for maturity classification of specialty crops by hyperspectral imagingComments: Preprint: Paper submitted to the special issue of "Computers and Electronics in Agriculture"Subjects: Computer Vision and Pattern Recognition (cs.CV)
The maturity classification of specialty crops such as strawberries and tomatoes is an essential agricultural downstream activity for selective harvesting and quality control (QC) at production and packaging sites. Recent advancements in Deep Learning (DL) have produced encouraging results in color images for maturity classification applications. However, hyperspectral imaging (HSI) outperforms methods based on color vision. Multivariate analysis methods and Convolutional Neural Networks (CNN) deliver promising results; however, a large amount of input data and the associated preprocessing requirements cause hindrances in practical application. Conventionally, the reflectance intensity in a given electromagnetic spectrum is employed in estimating fruit maturity. We present a feature extraction method to empirically demonstrate that the peak reflectance in subbands such as 500-670 nm (pigment band) and the wavelength of the peak position, and contrarily, the trough reflectance and its corresponding wavelength within 671-790 nm (chlorophyll band) are convenient to compute yet distinctive features for the maturity classification. The proposed feature selection method is beneficial because preprocessing, such as dimensionality reduction, is avoided before every prediction. The feature set is designed to capture these traits. The best SOTA methods, among 3D-CNN, 1D-CNN, and SVM, achieve at most 90.0 % accuracy for strawberries and 92.0 % for tomatoes on our dataset. Results show that the proposed method outperforms the SOTA as it yields an accuracy above 98.0 % in strawberry and 96.0 % in tomato classification. A comparative analysis of the time efficiency of these methods is also conducted, which shows the proposed method performs prediction at 13 Frames Per Second (FPS) compared to the maximum 1.16 FPS attained by the full-spectrum SVM classifier.
- [435] arXiv:2405.10029 (replaced) [pdf, ps, html, other]
-
Title: AsCL: An Asymmetry-sensitive Contrastive Learning Method for Image-Text Retrieval with Cross-Modal FusionComments: This work has been strong-accepted as the oral conference paper by IEEE International Conference on Multimedia & Expo (ICME) 2024Subjects: Multimedia (cs.MM)
The image-text retrieval task aims to retrieve relevant information from a given image or text. The main challenge is to unify multimodal representation and distinguish fine-grained differences across modalities, thereby finding similar contents and filtering irrelevant contents. However, existing methods mainly focus on unified semantic representation and concept alignment for multi-modalities, while the fine-grained differences across modalities have rarely been studied before, making it difficult to solve the information asymmetry problem. In this paper, we propose a novel asymmetry-sensitive contrastive learning method. By generating corresponding positive and negative samples for different asymmetry types, our method can simultaneously ensure fine-grained semantic differentiation and unified semantic representation between multi-modalities. Additionally, a hierarchical cross-modal fusion method is proposed, which integrates global and local-level features through a multimodal attention mechanism to achieve concept alignment. Extensive experiments performed on MSCOCO and Flickr30K, demonstrate the effectiveness and superiority of our proposed method.
- [436] arXiv:2405.10128 (replaced) [pdf, ps, html, other]
-
Title: Red Teaming Language Models for Contradictory DialoguesComments: 18 pages, 5 figuresSubjects: Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Most language models currently available are prone to self-contradiction during dialogues. To mitigate this issue, this study explores a novel contradictory dialogue processing task that aims to detect and modify contradictory statements in a conversation. This task is inspired by research on context faithfulness and dialogue comprehension, which have demonstrated that the detection and understanding of contradictions often necessitate detailed explanations. We develop a dataset comprising contradictory dialogues, in which one side of the conversation contradicts itself. Each dialogue is accompanied by an explanatory label that highlights the location and details of the contradiction. With this dataset, we present a Red Teaming framework for contradictory dialogue processing. The framework detects and attempts to explain the dialogue, then modifies the existing contradictory content using the explanation. Our experiments demonstrate that the framework improves the ability to detect contradictory dialogues and provides valid explanations. Additionally, it showcases distinct capabilities for modifying such dialogues. Our study highlights the importance of the logical inconsistency problem in conversational AI.
- [437] arXiv:2405.10267 (replaced) [pdf, ps, html, other]
-
Title: Sharpness-Aware Minimization in Genetic ProgrammingComments: Submitted to the Genetic Programming Theory and Practice workshop 2024Subjects: Neural and Evolutionary Computing (cs.NE); Machine Learning (cs.LG)
Sharpness-Aware Minimization (SAM) was recently introduced as a regularization procedure for training deep neural networks. It simultaneously minimizes the fitness (or loss) function and the so-called fitness sharpness. The latter serves as a measure of the nonlinear behavior of a solution and does so by finding solutions that lie in neighborhoods having uniformly similar loss values across all fitness cases. In this contribution, we adapt SAM for tree Genetic Programming (TGP) by exploring the semantic neighborhoods of solutions using two simple approaches. By capitalizing upon perturbing input and output of program trees, sharpness can be estimated and used as a second optimization criterion during the evolution. To better understand the impact of this variant of SAM on TGP, we collect numerous indicators of the evolutionary process, including generalization ability, complexity, diversity, and a recently proposed genotype-phenotype mapping to study the amount of redundancy in trees. The experimental results demonstrate that using any of the two proposed SAM adaptations in TGP allows (i) a significant reduction of tree sizes in the population and (ii) a decrease in redundancy of the trees. When assessed on real-world benchmarks, the generalization ability of the elite solutions does not deteriorate.
- [438] arXiv:2405.10292 (replaced) [pdf, ps, html, other]
-
Title: Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement LearningYuexiang Zhai, Hao Bai, Zipeng Lin, Jiayi Pan, Shengbang Tong, Yifei Zhou, Alane Suhr, Saining Xie, Yann LeCun, Yi Ma, Sergey LevineSubjects: Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Large vision-language models (VLMs) fine-tuned on specialized visual instruction-following data have exhibited impressive language reasoning capabilities across various scenarios. However, this fine-tuning paradigm may not be able to efficiently learn optimal decision-making agents in multi-step goal-directed tasks from interactive environments. To address this challenge, we propose an algorithmic framework that fine-tunes VLMs with reinforcement learning (RL). Specifically, our framework provides a task description and then prompts the VLM to generate chain-of-thought (CoT) reasoning, enabling the VLM to efficiently explore intermediate reasoning steps that lead to the final text-based action. Next, the open-ended text output is parsed into an executable action to interact with the environment to obtain goal-directed task rewards. Finally, our framework uses these task rewards to fine-tune the entire VLM with RL. Empirically, we demonstrate that our proposed framework enhances the decision-making capabilities of VLM agents across various tasks, enabling 7b models to outperform commercial models such as GPT4-V or Gemini. Furthermore, we find that CoT reasoning is a crucial component for performance improvement, as removing the CoT reasoning results in a significant decrease in the overall performance of our method.
- [439] arXiv:2405.10320 (replaced) [pdf, ps, html, other]
-
Title: Toon3D: Seeing Cartoons from a New PerspectiveComments: Please see our project page: https://toon3d.studioSubjects: Computer Vision and Pattern Recognition (cs.CV)
In this work, we recover the underlying 3D structure of non-geometrically consistent scenes. We focus our analysis on hand-drawn images from cartoons and anime. Many cartoons are created by artists without a 3D rendering engine, which means that any new image of a scene is hand-drawn. The hand-drawn images are usually faithful representations of the world, but only in a qualitative sense, since it is difficult for humans to draw multiple perspectives of an object or scene 3D consistently. Nevertheless, people can easily perceive 3D scenes from inconsistent inputs! In this work, we correct for 2D drawing inconsistencies to recover a plausible 3D structure such that the newly warped drawings are consistent with each other. Our pipeline consists of a user-friendly annotation tool, camera pose estimation, and image deformation to recover a dense structure. Our method warps images to obey a perspective camera model, enabling our aligned results to be plugged into novel-view synthesis reconstruction methods to experience cartoons from viewpoints never drawn before. Our project page is https://toon3d.studio .
- [440] arXiv:1609.00004 (replaced) [pdf, ps, html, other]
-
Title: On the initial value of PageRankComments: 11 pages, 15 figuresSubjects: Physics and Society (physics.soc-ph); Social and Information Networks (cs.SI)
Google employs PageRank to rank web pages, determining the order in which search results are presented to users based on their queries. PageRank is primarily utilized for directed networks, although there are instances where it is also applied to undirected networks. In this paper, we have applied PageRank to undirected networks, showing that a vertex's PageRank relies on its initial value, often referred to as an intrinsic, non-network contribution. We have analytically proved that when the initial value of vertices is either proportional to their degrees or set to zero, the PageRank values of the vertices become directly proportional to their degrees. Simulated and empirical data are employed to bolster our research findings. Additionally, we have investigated the impact of initial values on PageRank localization.
- [441] arXiv:2107.10764 (replaced) [pdf, ps, html, other]
-
Title: Nonlinear transformation of complex amplitudes via quantum singular value transformationSubjects: Quantum Physics (quant-ph); Data Structures and Algorithms (cs.DS)
Due to the linearity of quantum operations, it is not straightforward to implement nonlinear transformations on a quantum computer, making some practical tasks like a neural network hard to be achieved. In this work, we define a task called nonlinear transformation of complex amplitudes and provide an algorithm to achieve this task. Specifically, we construct a block-encoding of complex amplitudes from a state preparation unitary. This allows us to transform the complex amplitudes by using quantum singular value transformation. We evaluate the required overhead in terms of input dimension and precision, which reveals that the algorithm depends on the roughly square root of input dimension and achieves an exponential speedup on precision compared with previous work. We also discuss its possible applications to quantum machine learning, where complex amplitudes encoding classical or quantum data are processed by the proposed method. This paper provides a promising way to introduce highly complex nonlinearity of the quantum states, which is essentially missing in quantum mechanics.
- [442] arXiv:2302.13680 (replaced) [pdf, ps, html, other]
-
Title: A multigrid solver for PDE-constrained optimization with uncertain inputsComments: 37, 3 figuresSubjects: Optimization and Control (math.OC); Numerical Analysis (math.NA)
In this manuscript, we present a collective multigrid algorithm to solve efficiently the large saddle-point systems of equations that typically arise in PDE-constrained optimization under uncertainty, and develop a novel convergence analysis of collective smoothers and collective two-level methods. The multigrid algorithm is based on a collective smoother that at each iteration sweeps over the nodes of the computational mesh, and solves a reduced saddle-point system whose size is proportional to the number $N$ of samples used to discretized the probability space. We show that this reduced system can be solved with optimal $O(N)$ complexity.
The multigrid method is tested both as a stationary method and as a preconditioner for GMRES on three problems: a linear-quadratic problem, possibly with a local or a boundary control, for which the multigrid method is used to solve directly the linear optimality system; a nonsmooth problem with box constraints and $L^1$-norm penalization on the control, in which the multigrid scheme is used as an inner solver within a semismooth Newton iteration; a risk-averse problem with the smoothed CVaR risk measure where the multigrid method is called within a preconditioned Newton iteration. In all cases, the multigrid algorithm exhibits excellent performances and robustness with respect to the parameters of interest. - [443] arXiv:2303.00351 (replaced) [pdf, ps, html, other]
-
Title: Leveraging SO(3)-steerable convolutions for pose-robust semantic segmentation in 3D medical dataComments: Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) this https URLJournal-ref: Machine.Learning.for.Biomedical.Imaging. 2 (2024)Subjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Convolutional neural networks (CNNs) allow for parameter sharing and translational equivariance by using convolutional kernels in their linear layers. By restricting these kernels to be SO(3)-steerable, CNNs can further improve parameter sharing. These rotationally-equivariant convolutional layers have several advantages over standard convolutional layers, including increased robustness to unseen poses, smaller network size, and improved sample efficiency. Despite this, most segmentation networks used in medical image analysis continue to rely on standard convolutional kernels. In this paper, we present a new family of segmentation networks that use equivariant voxel convolutions based on spherical harmonics. These networks are robust to data poses not seen during training, and do not require rotation-based data augmentation during training. In addition, we demonstrate improved segmentation performance in MRI brain tumor and healthy brain structure segmentation tasks, with enhanced robustness to reduced amounts of training data and improved parameter efficiency. Code to reproduce our results, and to implement the equivariant segmentation networks for other tasks is available at this http URL
- [444] arXiv:2303.07158 (replaced) [pdf, ps, html, other]
-
Title: Uniform Pessimistic Risk and its Optimal PortfolioSubjects: Portfolio Management (q-fin.PM); Machine Learning (cs.LG); Computation (stat.CO); Machine Learning (stat.ML)
The optimal allocation of assets has been widely discussed with the theoretical analysis of risk measures, and pessimism is one of the most attractive approaches beyond the conventional optimal portfolio model. The $\alpha$-risk plays a crucial role in deriving a broad class of pessimistic optimal portfolios. However, estimating an optimal portfolio assessed by a pessimistic risk is still challenging due to the absence of a computationally tractable model. In this study, we propose an integral of $\alpha$-risk called the \textit{uniform pessimistic risk} and the computational algorithm to obtain an optimal portfolio based on the risk. Further, we investigate the theoretical properties of the proposed risk in view of three different approaches: multiple quantile regression, the proper scoring rule, and distributionally robust optimization. Real data analysis of three stock datasets (S\&P500, CSI500, KOSPI200) demonstrates the usefulness of the proposed risk and portfolio model.
- [445] arXiv:2303.17593 (replaced) [pdf, ps, html, other]
-
Title: Anatomically aware dual-hop learning for pulmonary embolism detection in CT pulmonary angiogramsFlorin Condrea, Saikiran Rapaka, Lucian Itu, Puneet Sharma, Jonathan Sperl, A Mohamed Ali, Marius LeordeanuComments: Accepted to Computers in Biology and Medicine journalSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
Pulmonary Embolisms (PE) represent a leading cause of cardiovascular death. While medical imaging, through computed tomographic pulmonary angiography (CTPA), represents the gold standard for PE diagnosis, it is still susceptible to misdiagnosis or significant diagnosis delays, which may be fatal for critical cases. Despite the recently demonstrated power of deep learning to bring a significant boost in performance in a wide range of medical imaging tasks, there are still very few published researches on automatic pulmonary embolism detection. Herein we introduce a deep learning based approach, which efficiently combines computer vision and deep neural networks for pulmonary embolism detection in CTPA. Our method features novel improvements along three orthogonal axes: 1) automatic detection of anatomical structures; 2) anatomical aware pretraining, and 3) a dual-hop deep neural net for PE detection. We obtain state-of-the-art results on the publicly available multicenter large-scale RSNA dataset.
- [446] arXiv:2304.00598 (replaced) [pdf, ps, html, other]
-
Title: Stochastic Reachability of Uncontrolled Systems via Probability Measures: Approximation via Deep Neural NetworksComments: 8 pages, 4 figures, 1 table, Submitted to the Conference on Decision and Control 2024Subjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
This paper poses a theoretical characterization of the stochastic reachability problem in terms of probability measures, capturing the probability measure of the state of the system that satisfies the reachability specification for all probabilities over a finite horizon. We achieve this by constructing the level sets of the probability measure for all probability values and, since our approach is only for autonomous systems, we can determine the level sets via forward simulations of the system from a point in the state space at some time step in the finite horizon to estimate the reach probability. We devise a training procedure which exploits this forward simulation and employ it to design a deep neural network (DNN) to predict the reach probability provided the current state and time step. We validate the effectiveness of our approach through three examples.
- [447] arXiv:2305.12100 (replaced) [pdf, ps, html, other]
-
Title: How Spurious Features Are Memorized: Precise Analysis for Random and NTK FeaturesComments: Revision after ICML2024 acceptance. Motivation of the paper changed from Privacy to Spurious Features. arXiv admin note: text overlap with arXiv:2302.01629Subjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
Deep learning models are known to overfit and memorize spurious features in the training dataset. While numerous empirical studies have aimed at understanding this phenomenon, a rigorous theoretical framework to quantify it is still missing. In this paper, we consider spurious features that are uncorrelated with the learning task, and we provide a precise characterization of how they are memorized via two separate terms: (i) the stability of the model with respect to individual training samples, and (ii) the feature alignment between the spurious feature and the full sample. While the first term is well established in learning theory and it is connected to the generalization error in classical work, the second one is, to the best of our knowledge, novel. Our key technical result gives a precise characterization of the feature alignment for the two prototypical settings of random features (RF) and neural tangent kernel (NTK) regression. We prove that the memorization of spurious features weakens as the generalization capability increases and, through the analysis of the feature alignment, we unveil the role of the model and of its activation function. Numerical experiments show the predictive power of our theory on standard datasets (MNIST, CIFAR-10).
- [448] arXiv:2307.06155 (replaced) [pdf, ps, html, other]
-
Title: Relative Fractional Independence Number and Its ApplicationsSubjects: Combinatorics (math.CO); Information Theory (cs.IT)
We define the relative fractional independence number of a graph $G$ with respect to another graph $H$, as $$\alpha^*(G|H)=\max_{W}\frac{\alpha(G\boxtimes W)}{\alpha(H\boxtimes W)},$$ where the maximum is taken over all graphs $W$, $G\boxtimes W$ is the strong product of $G$ and $W$, and $\alpha$ denotes the independence number. We give a non-trivial linear program to compute $\alpha^*(G|H)$ and discuss some of its properties. We show that $\alpha^*(G|H)\geq \frac{X(G)}{X(H)} \geq \frac{1}{\alpha^*(H|G)},$ where $X(G)$ can be the independence number, the zero-error Shannon capacity, the fractional independence number, the Lovász number, or the Schrijver's or Szegedy's variants of the Lovász number of a graph $G$. This inequality is the first explicit non-trivial upper bound on the ratio of the invariants of two arbitrary graphs, as mentioned earlier, which can also be used to obtain upper or lower bounds for these invariants. As explicit applications, we present new upper bounds for the ratio of the zero-error Shannon capacity of two Cayley graphs and compute new lower bounds on the Shannon capacity of certain Johnson graphs (yielding the exact value of their Haemers number). Moreover, we show that $\alpha^*(G|H)$ can be used to present a stronger version of the well-known No-Homomorphism Lemma.
- [449] arXiv:2309.00520 (replaced) [pdf, ps, html, other]
-
Title: Robust Online Learning over NetworksSubjects: Optimization and Control (math.OC); Machine Learning (cs.LG); Multiagent Systems (cs.MA); Systems and Control (eess.SY)
The recent deployment of multi-agent networks has enabled the distributed solution of learning problems, where agents cooperate to train a global model without sharing their local, private data. This work specifically targets some prevalent challenges inherent to distributed learning: (i) online training, i.e., the local data change over time; (ii) asynchronous agent computations; (iii) unreliable and limited communications; and (iv) inexact local computations. To tackle these challenges, we apply the Distributed Operator Theoretical (DOT) version of the Alternating Direction Method of Multipliers (ADMM), which we call "DOT-ADMM". We prove that if the DOT-ADMM operator is metric subregular, then it converges with a linear rate for a large class of (not necessarily strongly) convex learning problems toward a bounded neighborhood of the optimal time-varying solution, and characterize how such neighborhood depends on (i)-(iv). We first derive an easy-to-verify condition for ensuring the metric subregularity of an operator, followed by tutorial examples on linear and logistic regression problems. We corroborate the theoretical analysis with numerical simulations comparing DOT-ADMM with other state-of-the-art algorithms, showing that only the proposed algorithm exhibits robustness to (i)-(iv).
- [450] arXiv:2310.14691 (replaced) [pdf, ps, html, other]
-
Title: Identifiability of total effects from abstractions of time series causal graphsComments: Accepted to the 40th Conference on Uncertainty in Artificial Intelligence (UAI) 2024, Barcelona, SpainSubjects: Statistics Theory (math.ST); Artificial Intelligence (cs.AI)
We study the problem of identifiability of the total effect of an intervention from observational time series in the situation, common in practice, where one only has access to abstractions of the true causal graph. We consider here two abstractions: the extended summary causal graph, which conflates all lagged causal relations but distinguishes between lagged and instantaneous relations, and the summary causal graph which does not give any indication about the lag between causal relations. We show that the total effect is always identifiable in extended summary causal graphs and provide sufficient conditions for identifiability in summary causal graphs. We furthermore provide adjustment sets allowing to estimate the total effect whenever it is identifiable.
- [451] arXiv:2310.17582 (replaced) [pdf, ps, html, other]
-
Title: Convergence of flow-based generative models via proximal gradient descent in Wasserstein spaceSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC); Statistics Theory (math.ST)
Flow-based generative models enjoy certain advantages in computing the data generation and the likelihood, and have recently shown competitive empirical performance. Compared to the accumulating theoretical studies on related score-based diffusion models, analysis of flow-based models, which are deterministic in both forward (data-to-noise) and reverse (noise-to-data) directions, remain sparse. In this paper, we provide a theoretical guarantee of generating data distribution by a progressive flow model, the so-called JKO flow model, which implements the Jordan-Kinderleherer-Otto (JKO) scheme in a normalizing flow network. Leveraging the exponential convergence of the proximal gradient descent (GD) in Wasserstein space, we prove the Kullback-Leibler (KL) guarantee of data generation by a JKO flow model to be $O(\varepsilon^2)$ when using $N \lesssim \log (1/\varepsilon)$ many JKO steps ($N$ Residual Blocks in the flow) where $\varepsilon $ is the error in the per-step first-order condition. The assumption on data density is merely a finite second moment, and the theory extends to data distributions without density and when there are inversion errors in the reverse process where we obtain KL-$W_2$ mixed error guarantees. The non-asymptotic convergence rate of the JKO-type $W_2$-proximal GD is proved for a general class of convex objective functionals that includes the KL divergence as a special case, which can be of independent interest. The analysis framework can extend to other first-order Wasserstein optimization schemes applied to flow-based generative models.
- [452] arXiv:2311.05794 (replaced) [pdf, ps, html, other]
-
Title: An Experimental Design for Anytime-Valid Causal Inference on Multi-Armed BanditsSubjects: Methodology (stat.ME); Machine Learning (cs.LG)
In multi-armed bandit (MAB) experiments, it is often advantageous to continuously produce inference on the average treatment effect (ATE) between arms as new data arrive and determine a data-driven stopping time for the experiment. We develop the Mixture Adaptive Design (MAD), a new experimental design for multi-armed bandit experiments that produces powerful and anytime-valid inference on the ATE for \emph{any} bandit algorithm of the experimenter's choice, even those without probabilistic treatment assignment. Intuitively, the MAD "mixes" any bandit algorithm of the experimenter's choice with a Bernoulli design through a tuning parameter $\delta_t$, where $\delta_t$ is a deterministic sequence that decreases the priority placed on the Bernoulli design as the sample size grows. We prove that for $\delta_t = \omega\left(t^{-1/4}\right)$, the MAD generates anytime-valid asymptotic confidence sequences that are guaranteed to shrink around the true ATE. Hence, the experimenter is guaranteed to detect a true non-zero treatment effect in finite time. Additionally, we prove that the regret of the MAD approaches that of its underlying bandit algorithm over time, and hence, incurs a relatively small loss in regret in return for powerful inferential guarantees. Finally, we conduct an extensive simulation study exhibiting that the MAD achieves finite-sample anytime validity and high power without significant losses in finite-sample reward.
- [453] arXiv:2311.09375 (replaced) [pdf, ps, html, other]
-
Title: Distributed Constrained Combinatorial Optimization leveraging Hypergraph Neural NetworksSubjects: Optimization and Control (math.OC); Distributed, Parallel, and Cluster Computing (cs.DC)
Scalable addressing of high dimensional constrained combinatorial optimization problems is a challenge that arises in several science and engineering disciplines. Recent work introduced novel application of graph neural networks for solving quadratic-cost combinatorial optimization problems. However, effective utilization of models such as graph neural networks to address general problems with higher order constraints is an unresolved challenge. This paper presents a framework, HypOp, which advances the state of the art for solving combinatorial optimization problems in several aspects: (i) it generalizes the prior results to higher order constrained problems with arbitrary cost functions by leveraging hypergraph neural networks; (ii) enables scalability to larger problems by introducing a new distributed and parallel training architecture; (iii) demonstrates generalizability across different problem formulations by transferring knowledge within the same hypergraph; (iv) substantially boosts the solution accuracy compared with the prior art by suggesting a fine-tuning step using simulated annealing; (v) shows a remarkable progress on numerous benchmark examples, including hypergraph MaxCut, satisfiability, and resource allocation problems, with notable run time improvements using a combination of fine-tuning and distributed training techniques. We showcase the application of HypOp in scientific discovery by solving a hypergraph MaxCut problem on NDC drug-substance hypergraph. Through extensive experimentation on various optimization problems, HypOp demonstrates superiority over existing unsupervised learning-based solvers and generic optimization methods.
- [454] arXiv:2311.17778 (replaced) [pdf, ps, html, other]
-
Title: Unified Binary and Multiclass Margin-Based ClassificationComments: Accepted for publication in Journal of Machine Learning ResearchSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG)
The notion of margin loss has been central to the development and analysis of algorithms for binary classification. To date, however, there remains no consensus as to the analogue of the margin loss for multiclass classification. In this work, we show that a broad range of multiclass loss functions, including many popular ones, can be expressed in the relative margin form, a generalization of the margin form of binary losses. The relative margin form is broadly useful for understanding and analyzing multiclass losses as shown by our prior work (Wang and Scott, 2020, 2021). To further demonstrate the utility of this way of expressing multiclass losses, we use it to extend the seminal result of Bartlett et al. (2006) on classification-calibration of binary margin losses to multiclass. We then analyze the class of Fenchel-Young losses, and expand the set of these losses that are known to be classification-calibrated.
- [455] arXiv:2312.02691 (replaced) [pdf, ps, html, other]
-
Title: Edge coloring of products of signed graphsSubjects: Combinatorics (math.CO); Discrete Mathematics (cs.DM); Data Structures and Algorithms (cs.DS)
In 2020, Behr defined the problem of edge coloring of signed graphs and showed that every signed graph $(G, \sigma)$ can be colored using exactly $\Delta(G)$ or $\Delta(G) + 1$ colors, where $\Delta(G)$ is the maximum degree in graph $G$.
In this paper, we focus on products of signed graphs. We recall the definitions of the Cartesian, tensor, strong, and corona products of signed graphs and prove results for them. In particular, we show that $(1)$ the Cartesian product of $\Delta$-edge-colorable signed graphs is $\Delta$-edge-colorable, $(2)$ the tensor product of a $\Delta$-edge-colorable signed graph and a signed tree requires only $\Delta$ colors and $(3)$ the corona product of almost any two signed graphs is $\Delta$-edge-colorable. We also prove some results related to the coloring of products of signed paths and cycles. - [456] arXiv:2312.07792 (replaced) [pdf, ps, html, other]
-
Title: Differentially private projection-depth-based mediansComments: 44 pages, 1 figureSubjects: Statistics Theory (math.ST); Cryptography and Security (cs.CR); Machine Learning (cs.LG); Methodology (stat.ME)
We develop $(\epsilon,\delta)$-differentially private projection-depth-based medians using the propose-test-release (PTR) and exponential mechanisms. Under general conditions on the input parameters and the population measure, (e.g. we do not assume any moment bounds), we quantify the probability the test in PTR fails, as well as the cost of privacy via finite sample deviation bounds. We then present a new definition of the finite sample breakdown point which applies to a mechanism, and present a lower bound on the finite sample breakdown point of the projection-depth-based median. We demonstrate our main results on the canonical projection-depth-based median, as well as on projection-depth-based medians derived from trimmed estimators. In the Gaussian setting, we show that the resulting deviation bound matches the known lower bound for private Gaussian mean estimation. In the Cauchy setting, we show that the "outlier error amplification" effect resulting from the heavy tails outweighs the cost of privacy. This result is then verified via numerical simulations. Additionally, we present results on general PTR mechanisms and a uniform concentration result on the projected spacings of order statistics, which may be of general interest.
- [457] arXiv:2312.12113 (replaced) [pdf, ps, html, other]
-
Title: Variational Mode Decomposition-Based Nonstationary Coherent Structure Analysis for Spatiotemporal DataJournal-ref: Aerospace Science and Technology, Volume 149, June 2024, 109162Subjects: Fluid Dynamics (physics.flu-dyn); Machine Learning (cs.LG); Dynamical Systems (math.DS)
The conventional modal analysis techniques face difficulties in handling nonstationary phenomena, such as transient, nonperiodic, or intermittent phenomena. This paper presents a variational mode decomposition--based nonstationary coherent structure (VMD-NCS) analysis that enables the extraction and analysis of coherent structures in the case of nonstationary phenomena from high-dimensional spatiotemporal data. The VMD-NCS analysis decomposes the input spatiotemporal data into intrinsic coherent structures (ICSs) that represent nonstationary spatiotemporal patterns and exhibit coherence in both spatial and temporal directions. Unlike many conventional modal analysis techniques, the proposed method accounts for the temporal changes in the spatial distribution with time. Tthe VMD-NCS analysis was validated based on the transient growth phenomena in the flow around a cylinder. It was confirmed that the temporal changes in the spatial distribution, depicting the transient growth of vortex shedding where fluctuations arising in the far-wake region gradually approach the near-wake region, were represented as a single ICS. Furthermore, in the analysis of the quasi-periodic flow field around a pitching airfoil, the temporal changes in the spatial distribution and the amplitude of vortex shedding behind the airfoil, influenced by the pitching motion of the airfoil, were captured as a single ICS. The impact of two parameters that control the number of ICSs ($K$) and the penalty factor related to the temporal coherence ($\alpha$), was investigated. The results revealed that $K$ has a significant impact on the VMD-NCS analysis results. In the case of a relatively high $K$, the VMD-NCS analysis tends to extract more periodic spatiotemporal patterns resembling the results of dynamic mode decomposition. In the case of a small $K$, it tends to extract more nonstationary spatiotemporal patterns.
- [458] arXiv:2401.08667 (replaced) [pdf, ps, html, other]
-
Title: Data-Driven Physics-Informed Neural Networks: A Digital Twin PerspectiveSubjects: Fluid Dynamics (physics.flu-dyn); Computational Engineering, Finance, and Science (cs.CE); Machine Learning (cs.LG)
This study explores the potential of physics-informed neural networks (PINNs) for the realization of digital twins (DT) from various perspectives. First, various adaptive sampling approaches for collocation points are investigated to verify their effectiveness in the mesh-free framework of PINNs, which allows automated construction of virtual representation without manual mesh generation. Then, the overall performance of the data-driven PINNs (DD-PINNs) framework is examined, which can utilize the acquired datasets in DT scenarios. Its scalability to more general physics is validated within parametric Navier-Stokes equations, where PINNs do not need to be retrained as the Reynolds number varies. In addition, since datasets can be often collected from different fidelity/sparsity in practice, multi-fidelity DD-PINNs are also proposed and evaluated. They show remarkable prediction performance even in the extrapolation tasks, with $42\sim62\%$ improvement over the single-fidelity approach. Finally, the uncertainty quantification performance of multi-fidelity DD-PINNs is investigated by the ensemble method to verify their potential in DT, where an accurate measure of predictive uncertainty is critical. The DD-PINN frameworks explored in this study are found to be more suitable for DT scenarios than traditional PINNs from the above perspectives, bringing engineers one step closer to seamless DT realization.
- [459] arXiv:2401.09091 (replaced) [pdf, ps, other]
-
Title: Noise-Tolerant Quantum Algorithm for Ground State Energy EstimationSubjects: Quantum Physics (quant-ph); Numerical Analysis (math.NA); Computational Physics (physics.comp-ph)
One of the most promising applications of quantum computers is to simulate physical systems, leveraging their inherent quantum behavior to achieve an advantage over classical computation. In this work, we present a noise-tolerant Hamiltonian simulation algorithm for ground-state energy estimation. Our method surmounts stochastic sampling limitations to estimate expectation values. It is based on an adaptive sequence of fuzzy bisection searches to estimate the ground state energy digit by digit, with a trade-off between increasing the simulation time and decreasing the absolute error rate. It builds upon the Quantum Eigenvalue Transformation of Unitary Matrices (QETU) algorithm, and it delivers good approximations in simulations with local, two-qubit gate depolarizing probability up to 1e-3, specifically for Hamiltonians that anti-commute with a Pauli string. To demonstrate the key results in this work, we ran simulations with different system Hamiltonians, system sizes, and time evolution encoding methods on classical computers using Qiskit. We compare the performance with other existing methods and show that we can consistently achieve two to three orders of magnitude improvement in the absolute error rate.
- [460] arXiv:2401.17921 (replaced) [pdf, ps, other]
-
Title: Optimizing T and CNOT Gates in Quantum Ripple-Carry Adders and ComparatorsSubjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)
The state of the art of quantum circuits using the ripple-carry strategy for the addition and comparison of two n-bit numbers is presented, as well as optimizations in the Clifford+T gate set, both in terms of CNOT-depth and T-depth, or CNOT-count and T-count. In particular, considering the adders presented by Cuccaro et al. and Takahashi et al., circuits with a T-depth of 4n and a CNOT-depth of 8n are obtained, while without optimization of the original circuits, a T-depth of 6n is expected. In addition, a new adder with optimized CNOT-count and T-count is introduced. Note that we have focused here on quantum ripple-carry adders using at most one ancilla, without any approximation of the 3-qubit gates involved (Toffoli, Peres and TR) or any strategy involving a measurement.
- [461] arXiv:2402.02969 (replaced) [pdf, ps, html, other]
-
Title: Towards Understanding the Word Sensitivity of Attention Layers: A Study via Random FeaturesComments: Revision after ICML2024 reviewsSubjects: Machine Learning (stat.ML); Computation and Language (cs.CL); Machine Learning (cs.LG)
Understanding the reasons behind the exceptional success of transformers requires a better analysis of why attention layers are suitable for NLP tasks. In particular, such tasks require predictive models to capture contextual meaning which often depends on one or few words, even if the sentence is long. Our work studies this key property, dubbed word sensitivity (WS), in the prototypical setting of random features. We show that attention layers enjoy high WS, namely, there exists a vector in the space of embeddings that largely perturbs the random attention features map. The argument critically exploits the role of the softmax in the attention layer, highlighting its benefit compared to other activations (e.g., ReLU). In contrast, the WS of standard random features is of order $1/\sqrt{n}$, $n$ being the number of words in the textual sample, and thus it decays with the length of the context. We then translate these results on the word sensitivity into generalization bounds: due to their low WS, random features provably cannot learn to distinguish between two sentences that differ only in a single word; in contrast, due to their high WS, random attention features have higher generalization capabilities. We validate our theoretical results with experimental evidence over the BERT-Base word embeddings of the imdb review dataset.
- [462] arXiv:2402.06952 (replaced) [pdf, ps, html, other]
-
Title: Estimating the Effect of Crosstalk Error on Circuit Fidelity Using Noisy Intermediate-Scale Quantum DevicesSubjects: Quantum Physics (quant-ph); Artificial Intelligence (cs.AI)
Current advancements in technology have focused the attention of the quantum computing community toward exploring the potential of near-term devices whose computing power surpasses that of classical computers in practical applications. An unresolved central question revolves around whether the inherent noise in these devices can be overcome or whether any potential quantum advantage would be limited. There is no doubt that crosstalk is one of the main sources of noise in noisy intermediate-scale quantum (NISQ) systems, and it poses a fundamental challenge to hardware designs. Crosstalk between parallel instructions can corrupt quantum states and cause incorrect program execution. In this study, we present a necessary analysis of the crosstalk error effect on NISQ devices. Our approach is extremely straightforward and practical to estimate the crosstalk error of various multi-qubit devices. In particular, we combine the randomized benchmarking (RB) and simultaneous randomized benchmarking (SRB) protocol to estimate the crosstalk error from the correlation controlled-NOT (CNOT) gate. We demonstrate this protocol experimentally on 5-, 7-, \& 16-qubit devices. Our results demonstrate the crosstalk error model of three different IBM quantum devices over the experimental week and compare the error variation against the machine, number of qubits, quantum volume, processor, and topology. We then confirm the improvement in the circuit fidelity on different benchmarks by up to 3.06x via inserting an instruction barrier, as compared with an IBM quantum noisy device which offers near-optimal crosstalk mitigation in practice. Finally, we discuss the current system limitation, its tradeoff on fidelity and depth, noise beyond the NISQ system, and mitigation opportunities to ensure that the quantum operation can perform its quantum magic undisturbed.
- [463] arXiv:2402.09108 (replaced) [pdf, ps, html, other]
-
Title: Novel Long Distance Free Space Quantum Secure Direct Communication for Web 3.0 NetworksComments: 17 pages, 6 figuresSubjects: Quantum Physics (quant-ph); Cryptography and Security (cs.CR)
With the advent of Web 3.0, the swift advancement of technology confronts an imminent threat from quantum computing. Security protocols safeguarding the integrity of Web 2.0 and Web 3.0 are growing more susceptible to both quantum attacks and sophisticated classical threats. The article introduces our novel long-distance free-space quantum secure direct communication (LF QSDC) as a method to safeguard against security breaches in both quantum and classical contexts. Differing from techniques like quantum key distribution (QKD), LF QSDC surpasses constraints by facilitating encrypted data transmission sans key exchanges, thus diminishing the inherent weaknesses of key-based systems. The distinctiveness of this attribute, coupled with its quantum mechanics base, protects against quantum computer assaults and advanced non-quantum dangers, harmonizing seamlessly with the untrustworthy tenets of the Web 3.0 age. The focus of our study is the technical design and incorporation of LF QSDC into web 3.0 network infrastructures, highlighting its efficacy for extended-range communication. LF QSDC is based on the memory DL04 protocol and enhanced with our novel Quantum-Aware Low-Density Parity Check (LDPC), Pointing, Acquisition, and Tracking (PAT) technologies, and Atmospheric Quantum Correction Algorithm (AQCA). Utilizing this method not only bolsters the security of worldwide Web 3.0 networks but also guarantees their endurance in a time when quantum and sophisticated classical threats exist simultaneously. Consequently, LF QSDC stands out as a robust security solution, well-suited for Web 3.0 systems amidst the constantly evolving digital environment.
- [464] arXiv:2403.13357 (replaced) [pdf, ps, html, other]
-
Title: A Machine Learning Approach for Multiscale Modeling of Biological TissuesComments: Data and code can be found at this https URLSubjects: Biological Physics (physics.bio-ph); Soft Condensed Matter (cond-mat.soft); Computational Engineering, Finance, and Science (cs.CE)
We develop a new neural network architecture that strictly enforces constitutive constraints such as polyconvexity, frame-indifference, and the symmetry of the stress and material stiffness. Additionally, we show that the accuracy of the stress and material stiffness predictions is significantly improved for this neural network by using a Sobolev minimization strategy that includes derivative terms. Using our neural network, we model the constitutive behavior of fibrous-type discrete network material. With Sobolev minimization, we obtain a normalized mean square error of 0.15% for the strain energy density, 0.815% averaged across the components of the stress, and 5.4% averaged across the components of the stiffness tensor. This machine-learned constitutive model was deployed in a finite element simulation of a facet capsular ligament. The displacement fields and stress-strain curves were compared to a multiscale simulation that required running on a GPU-based supercomputer. The new approach maintained upward of 85% accuracy in stress up to 70% strain while reducing the computation cost by orders of magnitude.
- [465] arXiv:2403.13680 (replaced) [pdf, ps, html, other]
-
Title: Step-Calibrated Diffusion for Biomedical Optical Image RestorationYiwei Lyu, Sung Jik Cha, Cheng Jiang, Asadur Chowdury, Xinhai Hou, Edward Harake, Akhil Kondepudi, Christian Freudiger, Honglak Lee, Todd C. HollonSubjects: Image and Video Processing (eess.IV); Computer Vision and Pattern Recognition (cs.CV)
High-quality, high-resolution medical imaging is essential for clinical care. Raman-based biomedical optical imaging uses non-ionizing infrared radiation to evaluate human tissues in real time and is used for early cancer detection, brain tumor diagnosis, and intraoperative tissue analysis. Unfortunately, optical imaging is vulnerable to image degradation due to laser scattering and absorption, which can result in diagnostic errors and misguided treatment. Restoration of optical images is a challenging computer vision task because the sources of image degradation are multi-factorial, stochastic, and tissue-dependent, preventing a straightforward method to obtain paired low-quality/high-quality data. Here, we present Restorative Step-Calibrated Diffusion (RSCD), an unpaired image restoration method that views the image restoration problem as completing the finishing steps of a diffusion-based image generation task. RSCD uses a step calibrator model to dynamically determine the severity of image degradation and the number of steps required to complete the reverse diffusion process for image restoration. RSCD outperforms other widely used unpaired image restoration methods on both image quality and perceptual evaluation metrics for restoring optical images. Medical imaging experts consistently prefer images restored using RSCD in blinded comparison experiments and report minimal to no hallucinations. Finally, we show that RSCD improves performance on downstream clinical imaging tasks, including automated brain tumor diagnosis and deep tissue imaging. Our code is available at this https URL.
- [466] arXiv:2403.15521 (replaced) [pdf, ps, html, other]
-
Title: Exploring new territory: Calibration-free decoding for c-VEP BCIComments: 6 pages, 2 figures, 9th Graz Brain-Computer Interface Conference 2024Subjects: Signal Processing (eess.SP); Machine Learning (cs.LG)
This study explores two zero-training methods aimed at enhancing the usability of brain-computer interfaces (BCIs) by eliminating the need for a calibration session. We introduce a novel method rooted in the event-related potential (ERP) domain, unsupervised mean maximization (UMM), to the fast code-modulated visual evoked potential (c-VEP) stimulus protocol. We compare UMM to the state-of-the-art c-VEP zero-training method that uses canonical correlation analysis (CCA). The comparison includes instantaneous classification and classification with cumulative learning from previously classified trials for both CCA and UMM. Our study shows the effectiveness of both methods in navigating the complexities of a c-VEP dataset, highlighting their differences and distinct strengths. This research not only provides insights into the practical implementation of calibration-free BCI methods but also paves the way for further exploration and refinement. Ultimately, the fusion of CCA and UMM holds promise for enhancing the accessibility and usability of BCI systems across various application domains and a multitude of stimulus protocols.
- [467] arXiv:2403.15523 (replaced) [pdf, ps, html, other]
-
Title: Towards auditory attention decoding with noise-tagging: A pilot studyComments: 6 pages, 2 figures, 9th Graz Brain-Computer Interface Conference 2024Subjects: Neurons and Cognition (q-bio.NC); Artificial Intelligence (cs.AI); Machine Learning (cs.LG); Sound (cs.SD); Audio and Speech Processing (eess.AS)
Auditory attention decoding (AAD) aims to extract from brain activity the attended speaker amidst candidate speakers, offering promising applications for neuro-steered hearing devices and brain-computer interfacing. This pilot study makes a first step towards AAD using the noise-tagging stimulus protocol, which evokes reliable code-modulated evoked potentials, but is minimally explored in the auditory modality. Participants were sequentially presented with two Dutch speech stimuli that were amplitude-modulated with a unique binary pseudo-random noise-code, effectively tagging these with additional decodable information. We compared the decoding of unmodulated audio against audio modulated with various modulation depths, and a conventional AAD method against a standard method to decode noise-codes. Our pilot study revealed higher performances for the conventional method with 70 to 100 percent modulation depths compared to unmodulated audio. The noise-code decoder did not further improve these results. These fundamental insights highlight the potential of integrating noise-codes in speech to enhance auditory speaker detection when multiple speakers are presented simultaneously.
- [468] arXiv:2403.16165 (replaced) [pdf, ps, html, other]
-
Title: Input-to-State Stability of Newton Methods for Generalized Equations in Nonlinear OptimizationComments: Submitted to 2024 Conference on Decision and ControlSubjects: Optimization and Control (math.OC); Systems and Control (eess.SY)
We show that Newton methods for generalized equations are input-to-state stable with respect to disturbances such as due to inexact computations. We then use this result to obtain convergence and robustness of a multistep Newton-type method for multivariate generalized equations. We demonstrate the usefulness of the results with other applications to nonlinear optimization. In particular, we provide a new proof for (robust) local convergence of the augmented Lagrangian method.
- [469] arXiv:2404.00082 (replaced) [pdf, ps, html, other]
-
Title: Data-Driven Room Acoustic Modeling Via Differentiable Feedback Delay Networks With Learnable Delay LinesComments: The article has been submitted to EURASIP Journal on Audio, Speech, and Music Processing on Jan 02, 2024 and is currently under reviewSubjects: Audio and Speech Processing (eess.AS); Machine Learning (cs.LG); Sound (cs.SD)
Over the past few decades, extensive research has been devoted to the design of artificial reverberation algorithms aimed at emulating the room acoustics of physical environments. Despite significant advancements, automatic parameter tuning of delay-network models remains an open challenge. We introduce a novel method for finding the parameters of a Feedback Delay Network (FDN) such that its output renders target attributes of a measured room impulse response. The proposed approach involves the implementation of a differentiable FDN with trainable delay lines, which, for the first time, allows us to simultaneously learn each and every delay-network parameter via backpropagation. The iterative optimization process seeks to minimize a perceptually-motivated time-domain loss function incorporating differentiable terms accounting for energy decay and echo density. Through experimental validation, we show that the proposed method yields time-invariant frequency-independent FDNs capable of closely matching the desired acoustical characteristics, and outperforms existing methods based on genetic algorithms and analytical FDN design.
- [470] arXiv:2404.10260 (replaced) [pdf, ps, html, other]
-
Title: HelixFold-Multimer: Elevating Protein Complex Structure Prediction to New HeightsSubjects: Biomolecules (q-bio.BM); Artificial Intelligence (cs.AI)
While monomer protein structure prediction tools boast impressive accuracy, the prediction of protein complex structures remains a daunting challenge in the field. This challenge is particularly pronounced in scenarios involving complexes with protein chains from different species, such as antigen-antibody interactions, where accuracy often falls short. Limited by the accuracy of complex prediction, tasks based on precise protein-protein interaction analysis also face obstacles. In this report, we highlight the ongoing advancements of our protein complex structure prediction model, HelixFold-Multimer, underscoring its enhanced performance. HelixFold-Multimer provides precise predictions for diverse protein complex structures, especially in therapeutic protein interactions. Notably, HelixFold-Multimer achieves remarkable success in antigen-antibody and peptide-protein structure prediction, greatly surpassing AlphaFold 3. HelixFold-Multimer is now available for public use on the PaddleHelix platform, offering both a general version and an antigen-antibody version. Researchers can conveniently access and utilize this service for their development needs.
- [471] arXiv:2404.18369 (replaced) [pdf, ps, html, other]
-
Title: A SAT Scalpel for Lattice Surgery: Representation and Synthesis of Subroutines for Surface-Code Fault-Tolerant Quantum ComputingComments: To appear in 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)Subjects: Quantum Physics (quant-ph); Emerging Technologies (cs.ET)
Quantum error correction is necessary for large-scale quantum computing. A promising quantum error correcting code is the surface code. For this code, fault-tolerant quantum computing (FTQC) can be performed via lattice surgery, i.e., splitting and merging patches of code. Given the frequent use of certain lattice-surgery subroutines (LaS), it becomes crucial to optimize their design in order to minimize the overall spacetime volume of FTQC. In this study, we define the variables to represent LaS and the constraints on these variables. Leveraging this formulation, we develop a synthesizer for LaS, LaSsynth, that encodes a LaS construction problem into a SAT instance, subsequently querying SAT solvers for a solution. Starting from a baseline design, we can gradually invoke the solver with shrinking spacetime volume to derive more compact designs. Due to our foundational formulation and the use of SAT solvers, LaSsynth can exhaustively explore the design space, yielding optimal designs in volume. For example, it achieves 8% and 18% volume reduction respectively over two states-of-the-art human designs for the 15-to-1 T-factory, a bottleneck in FTQC.
- [472] arXiv:2404.18807 (replaced) [pdf, ps, other]
-
Title: The Landscape of Unfolding with Machine LearningNathan Huetsch, Javier Mariño Villadamigo, Alexander Shmakov, Sascha Diefenbacher, Vinicius Mikuni, Theo Heimel, Michael Fenton, Kevin Greif, Benjamin Nachman, Daniel Whiteson, Anja Butter, Tilman PlehnSubjects: High Energy Physics - Phenomenology (hep-ph); Machine Learning (cs.LG); High Energy Physics - Experiment (hep-ex)
Recent innovations from machine learning allow for data unfolding, without binning and including correlations across many dimensions. We describe a set of known, upgraded, and new methods for ML-based unfolding. The performance of these approaches are evaluated on the same two datasets. We find that all techniques are capable of accurately reproducing the particle-level spectra across complex observables. Given that these approaches are conceptually diverse, they offer an exciting toolkit for a new class of measurements that can probe the Standard Model with an unprecedented level of detail and may enable sensitivity to new phenomena.
- [473] arXiv:2405.08253 (replaced) [pdf, ps, html, other]
-
Title: Thompson Sampling for Infinite-Horizon Discounted Decision ProcessesSubjects: Machine Learning (stat.ML); Machine Learning (cs.LG); Optimization and Control (math.OC)
We model a Markov decision process, parametrized by an unknown parameter, and study the asymptotic behavior of a sampling-based algorithm, called Thompson sampling. The standard definition of regret is not always suitable to evaluate a policy, especially when the underlying chain structure is general. We show that the standard (expected) regret can grow (super-)linearly and fails to capture the notion of learning in realistic settings with non-trivial state evolution. By decomposing the standard (expected) regret, we develop a new metric, called the expected residual regret, which forgets the immutable consequences of past actions. Instead, it measures regret against the optimal reward moving forward from the current period. We show that the expected residual regret of the Thompson sampling algorithm is upper bounded by a term which converges exponentially fast to 0. We present conditions under which the posterior sampling error of Thompson sampling converges to 0 almost surely. We then introduce the probabilistic version of the expected residual regret and present conditions under which it converges to 0 almost surely. Thus, we provide a viable concept of learning for sampling algorithms which will serve useful in broader settings than had been considered previously.
- [474] arXiv:2405.09551 (replaced) [pdf, ps, html, other]
-
Title: Towards Bi-Hemispheric Emotion Mapping through EEG: A Dual-Stream Neural Network ApproachDavid Freire-Obregón, Daniel Hernández-Sosa, Oliverio J. Santana, Javier Lorenzo-Navarro, Modesto Castrillón-SantanaComments: Second place award at the Brain Responses to Emotional Avatars Challenge held by the 18th IEEE International Conference on Automatic Face and Gesture Recognition(FG2024)Subjects: Signal Processing (eess.SP); Human-Computer Interaction (cs.HC)
Emotion classification through EEG signals plays a significant role in psychology, neuroscience, and human-computer interaction. This paper addresses the challenge of mapping human emotions using EEG data in the Mapping Human Emotions through EEG Signals FG24 competition. Subjects mimic the facial expressions of an avatar, displaying fear, joy, anger, sadness, disgust, and surprise in a VR setting. EEG data is captured using a multi-channel sensor system to discern brain activity patterns. We propose a novel two-stream neural network employing a Bi-Hemispheric approach for emotion inference, surpassing baseline methods and enhancing emotion recognition accuracy. Additionally, we conduct a temporal analysis revealing that specific signal intervals at the beginning and end of the emotion stimulus sequence contribute significantly to improve accuracy. Leveraging insights gained from this temporal analysis, our approach offers enhanced performance in capturing subtle variations in the states of emotions.
- [475] arXiv:2405.09554 (replaced) [pdf, ps, html, other]
-
Title: Underdetermined DOA Estimation of Off-Grid Sources Based on the Generalized Double Pareto PriorSubjects: Signal Processing (eess.SP); Information Theory (cs.IT)
In this letter, we investigate a new generalized double Pareto based on off-grid sparse Bayesian learning (GDPOGSBL) approach to improve the performance of direction of arrival (DOA) estimation in underdetermined scenarios. The method aims to enhance the sparsity of source signal by utilizing the generalized double Pareto (GDP) prior. Firstly, we employ a first-order linear Taylor expansion to model the real array manifold matrix, and Bayesian inference is utilized to calculate the off-grid error, which mitigates the grid dictionary mismatch problem in underdetermined scenarios. Secondly, an innovative grid refinement method is introduced, treating grid points as iterative parameters to minimize the modeling error between the source and grid points. The numerical simulation results verify the superiority of the proposed strategy, especially when dealing with a coarse grid and few snapshots.