Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

AnchorDP3: 3D Affordance Guided Sparse Diffusion Policy for Robotic Manipulation

Thought Anchors: Which LLM Reasoning Steps Matter?

Quantifying Fairness in LLMs Beyond Tokens: A Semantic and Statistical Perspective

OmniGen2: Exploration to Advanced Multimodal Generation

Confucius3-Math: A Lightweight High-Performance Reasoning LLM for Chinese K-12 Mathematics Learning

Morse: Dual-Sampling for Lossless Acceleration of Diffusion Models

Quantum-Classical Hybrid Quantized Neural Network

Non-equilibrium Annealed Adjoint Sampler

PP-DocBee2: Improved Baselines with Efficient Data for Multimodal Document Understanding

Mapping the Evolution of Research Contributions using KnoVo

MS-TVNet:A Long-Term Time Series Prediction Method Based on Multi-Scale Dynamic Convolution

No Free Lunch: Rethinking Internal Feedback for LLM Reasoning

TabArena: A Living Benchmark for Machine Learning on Tabular Data

VRAIL: Vectorized Reward-based Attribution for Interpretable Learning

CLAIM: Clinically-Guided LGE Augmentation for Realistic and Diverse Myocardial Scar Synthesis and Segmentation

Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments

IKDiffuser: A Generative Inverse Kinematics Solver for Multi-arm Robots via Diffusion Model

Fine-Grained Perturbation Guidance via Attention Head Selection

Graph-Assisted Stitching for Offline Hierarchical Reinforcement Learning

C3S3: Complementary Competition and Contrastive Selection for Semi-Supervised Medical Image Segmentation

SMAR: Soft Modality-Aware Routing Strategy for MoE-based Multimodal Large Language Models Preserving Language Capabilities

Recycling the Web: A Method to Enhance Pre-training Data Quality and Quantity for Language Models

Supervised Quantum Machine Learning: A Future Outlook from Qubits to Enterprise Applications

Aurora: Are Android Malware Classifiers Reliable and Stable under Distribution Shift?

CogniBench: A Legal-inspired Framework and Dataset for Assessing Cognitive Faithfulness of Large Language Models

AIDRIN 2.0: A Framework to Assess Data Readiness for AI

TSPulse: Dual Space Tiny Pre-Trained Models for Rapid Time-Series Analysis

Teacher Motion Priors: Enhancing Robot Locomotion over Challenging Terrain

WoundAmbit: Bridging State-of-the-Art Semantic Segmentation and Real-World Wound Care

Computation Mechanism Behind LLM Position Generalization

Training Plug-n-Play Knowledge Modules with Deep Context Distillation

MaizeField3D: A Curated 3D Point Cloud and Procedural Model Dataset of Field-Grown Maize from a Diversity Panel

From $\mathcal{O}(n^{2})$ to $\mathcal{O}(n)$ Parameters: Quantum Self-Attention in Vision Transformers for Biomedical Image Classification

Robust Multimodal Learning for Ophthalmic Disease Grading via Disentangled Representation

FGS-SLAM: Fourier-based Gaussian Splatting for Real-time SLAM with Sparse and Dense Map Fusion

Rewarding Graph Reasoning Process makes LLMs more Generalized Reasoners

Protein Structure Tokenization: Benchmarking and New Recipe

Chemical knowledge-informed framework for privacy-aware retrosynthesis learning

Balancing Truthfulness and Informativeness with Uncertainty-Aware Instruction Fine-Tuning

Diffusion Models Through a Global Lens: Are They Culturally Inclusive?

WyckoffDiff -- A Generative Diffusion Model for Crystal Symmetry

Solving Linear-Gaussian Bayesian Inverse Problems with Decoupled Diffusion Sequential Monte Carlo

Adversarial Reasoning at Jailbreaking Time

AgentBreeder: Mitigating the AI Safety Impact of Multi-Agent Scaffolds via Self-Improvement

Rethinking Early Stopping: Refine, Then Calibrate

Unlocking In-Context Learning for Natural Datasets Beyond Language Modeling

Towards Backdoor Stealthiness in Model Parameter Space

Distributed satellite information networks: Architecture, enabling technologies, and trends

Integrating Various Software Artifacts for Better LLM-based Bug Localization and Program Repair

Proximal Control of UAVs with Federated Learning for Human-Robot Collaborative Domains

Understanding World or Predicting Future? A Comprehensive Survey of World Models

USP-Gaussian: Unifying Spike-based Image Reconstruction, Pose Correction and Gaussian Splatting

Separating Tongue from Thought: Activation Patching Reveals Language-Agnostic Concept Representations in Transformers

Toddlers' Active Gaze Behavior Supports Self-Supervised Object Learning

ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model

Fuzz-Testing Meets LLM-Based Agents: An Automated and Efficient Framework for Jailbreaking Text-To-Image Generation Models

Evaluating Long Range Dependency Handling in Code Generation LLMs

Physics-informed Imitative Reinforcement Learning for Real-world Driving

COBRA-PPM: A Causal Bayesian Reasoning Architecture Using Probabilistic Programming for Robot Manipulation Under Uncertainty

FluoroSAM: A Language-promptable Foundation Model for Flexible X-ray Image Segmentation

Do Concept Bottleneck Models Respect Localities?

When Large Language Models contradict humans? Large Language Models' Sycophantic Behavior

Low-light Pedestrian Detection in Visible and Infrared Image Feeds: Issues and Challenges

A Survey on Explainable Reinforcement Learning: Concepts, Algorithms, Challenges

PhysUniBench: An Undergraduate-Level Physics Reasoning Benchmark for Multimodal Models

Evaluating Generalization and Representation Stability in Small LMs via Prompting, Fine-Tuning and Out-of-Distribution Prompts

Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning

The Alignment Trap: Complexity Barriers

The State of Large Language Models for African Languages: Progress and Challenges

Hybrid AI for Responsive Multi-Turn Online Conversations with Novel Dynamic Routing and Feedback Adaptation

Turing Test 2.0: The General Intelligence Threshold

$C^3$-Bench: The Things Real Disturbing LLM based Agent in Multi-Tasking

RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models

Fine, I'll Merge It Myself: A Multi-Fidelity Framework for Automated Model Merging

Towards Better Benchmark Datasets for Inductive Knowledge Graph Completion

Inside you are many wolves: Using cognitive models to interpret value trade-offs in LLMs

Disentangled representations of microscopy images

Define-ML: An Approach to Ideate Machine Learning-Enabled Systems

Weighted Mean Frequencies: a handcraft Fourier feature for 4D Flow MRI segmentation

Deciphering GunType Hierarchy through Acoustic Analysis of Gunshot Recordings

AI in the Writing Process: How Purposeful AI Supports Fosters Student Writing

Dense Video Captioning using Graph-based Sentence Summarization

Causal Representation Learning with Observation Grouping for CXR Classification

Vulnerability Disclosure through Adaptive Black-Box Adversarial Attacks on NIDS

Show, Tell and Summarize: Dense Video Captioning Using Visual Cue Aided Sentence Summarization

DeepQuark: deep-neural-network approach to multiquark bound states

Large Language Model-Driven Code Compliance Checking in Building Information Modeling

Pay Less Attention to Deceptive Artifacts: Robust Detection of Compressed Deepfakes on Online Social Networks

When Life Gives You Samples: The Benefits of Scaling up Inference Compute for Multilingual LLMs

WattsOnAI: Measuring, Analyzing, and Visualizing Energy and Carbon Footprint of AI Workloads

Industrial Energy Disaggregation with Digital Twin-generated Dataset and Efficient Data Augmentation

OctoThinker: Mid-training Incentivizes Reinforcement Learning Scaling

ReCode: Updating Code API Knowledge with Reinforcement Learning

Counterfactual Influence as a Distributional Quantity

Automatic Demonstration Selection for LLM-based Tabular Data Classification

An Agentic System for Rare Disease Diagnosis with Traceable Reasoning

Off-Policy Evaluation and Learning for the Future under Non-Stationarity

SV-LLM: An Agentic Approach for SoC Security Verification using Large Language Models

Client Clustering Meets Knowledge Sharing: Enhancing Privacy and Robustness in Personalized Peer-to-Peer Learning

CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition

Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups

Created by

Haebom

Author

Weiqiu You, Helen Qu, Marco Gatti, Bhuvnesh Jain, Eric Wong

Outline

Self-attributing neural networks (SANNs) offer a potential path to interpretable models for high-dimensional problems, but they often pay for interpretability with significantly worse performance. This paper formally proves a lower bound on the error of feature-wise SANNs and shows that group-based SANNs can achieve zero error and therefore high performance. Building on this insight, the authors propose Sum-of-Parts (SOP), a framework that transforms any differentiable model into a group-based SANN whose feature groups are learned end-to-end without group supervision. SOP achieves state-of-the-art performance among SANNs on vision and language tasks, and the learned groups are validated as interpretable across a range of quantitative and semantic metrics. The authors also demonstrate the utility of SOP explanations for model debugging and for scientific discovery in cosmology. The code is available at https://github.com/BrachioLab/sop.
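To make the idea of a group-based SANN concrete, here is a minimal sketch of one way such a prediction could be assembled: each feature group is passed through the model on its own, and the final output is a weighted sum of the per-group outputs. This is an illustration under stated assumptions, not the authors' implementation. The function name `sum_of_parts_predict`, the linear `backbone`, and the hand-picked `masks` and `scores` are all hypothetical; in SOP the groups are learned end-to-end, whereas here they are fixed by hand.

```python
import numpy as np


def sum_of_parts_predict(x, masks, scores, backbone):
    """Group-based self-attributing prediction (illustrative only).

    x:        (d,) input features
    masks:    (m, d) binary group masks, one row per feature group
    scores:   (m,) weight given to each group's prediction
    backbone: callable mapping a (d,) masked input to a (k,) output
    """
    # Each group sees only its own features; everything else is zeroed out.
    group_outputs = np.stack([backbone(mask * x) for mask in masks])  # (m, k)
    # The prediction is the score-weighted sum of per-group outputs,
    # so each row of `contributions` is exactly one group's share.
    contributions = scores[:, None] * group_outputs  # (m, k)
    return contributions.sum(axis=0), contributions


# Hypothetical linear backbone with fixed weights (an assumption for the demo).
W = np.array([[1.0, 0.0, 2.0, 0.0],
              [0.0, 1.0, 0.0, -1.0]])


def backbone(z):
    return W @ z


x = np.array([1.0, 2.0, 3.0, 4.0])
masks = np.array([[1, 1, 0, 0],
                  [0, 0, 1, 1]], dtype=float)  # two hand-picked groups
scores = np.array([0.5, 0.5])

prediction, contributions = sum_of_parts_predict(x, masks, scores, backbone)
print(prediction)     # final output
print(contributions)  # per-group attributions that sum to the output
```

Because the output is by construction the sum of the rows of `contributions`, the attribution for each group is faithful to the prediction in this sketch.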

Takeaways, Limitations

• Takeaways:
◦ Proves mathematically that feature-wise SANNs have a lower bound on error, whereas group-based SANNs can achieve zero error.
◦ Presents the SOP framework, which can transform any differentiable model into a group-based SANN.
◦ SOP achieves state-of-the-art performance among SANNs on vision and language tasks.
◦ Validates the interpretability of the learned groups using quantitative and semantic metrics.
◦ Demonstrates the usefulness of SOP explanations for model debugging and scientific discovery (cosmology).

• Limitations:
◦ Further research is needed on the general applicability of the SOP framework.
◦ Further research is needed to determine the optimal group size and group composition for specific problems.
◦ The quantitative and semantic metrics used to assess interpretability have their own limitations, which should be taken into account.
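The contrast between feature-wise and group-based models in the takeaways above can be illustrated with a toy example. The sketch below is not the paper's proof and its setting is an assumption: it treats a feature-wise model as an additive one (a bias plus one term per feature) and uses the target y = x1 * x2 on inputs in {-1, +1}^2. The best additive fit still has nonzero error, while a single group containing both features reproduces the target exactly.

```python
import itertools

import numpy as np

# All four inputs in {-1, +1}^2 and the target y = x1 * x2.
X = np.array(list(itertools.product([-1.0, 1.0], repeat=2)))
y = X[:, 0] * X[:, 1]

# Feature-wise additive model: y_hat = b + h1(x1) + h2(x2).
# On binary inputs every h_j is affine in x_j, so a least-squares fit on
# [1, x1, x2] gives the best possible additive approximation.
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
additive_mse = np.mean((A @ coef - y) ** 2)

# Group-based model: one group {x1, x2} whose predictor sees both features.
group_pred = X[:, 0] * X[:, 1]
group_mse = np.mean((group_pred - y) ** 2)

print(f"additive MSE: {additive_mse:.3f}")
print(f"group MSE:    {group_mse:.3f}")
```

The additive fit fails here because the product x1 * x2 is orthogonal to the constant and to each individual feature, so no per-feature terms can capture it.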
