This paper revisits the claim, raised in Apple's paper "The Illusion of Thinking", that Large Reasoning Models (LRMs) lack genuine reasoning ability. Apple's paper argues that LRMs are merely stochastic parrots, presenting the Towers of Hanoi and River Crossing problems as evidence. We reproduce and refine the experiments on these two problems, introducing step-by-step prompts and interactive dialogues, and show that the earlier conclusions are overstated. We find that LRM failures on Towers of Hanoi stem from output constraints as well as cognitive limitations, while the failures on River Crossing arise from problem instances that are unsolvable in the first place. When restricted to solvable instances, LRMs readily handle large cases with more than 100 agent pairs. We therefore characterize LRMs as probabilistic, reinforcement-learning-tuned explorers of a discrete state space, and argue that more detailed analysis is needed to advance symbolic and long-horizon reasoning.