Daily Arxiv

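This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.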

Effects of structure on reasoning in instance-level Self-Discover

Created by

Haebom

Authors

Sachith Gunasekara, Yasiru Ratnayake

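Overview

This paper observes that the need for predictable LLM reasoning when integrating with compound systems has popularized structured outputs, yet concerns remain about performance degradation compared with unstructured natural language. Training on unstructured Chain of Thought (CoT) traces has produced a new class of strong reasoning models, but it also raises computational cost and reliability issues. The paper introduces iSelf-Discover, an instance-level adaptation of the Self-Discover framework, and uses it to compare dynamically generated structured JSON reasoning with its unstructured counterpart. Experiments across diverse benchmarks show that unstructured reasoning consistently performs better. Notably, on the complex MATH benchmark, unstructured plans achieve a relative performance improvement of up to 18.90% over structured approaches. The zero-shot unstructured iSelf-Discover variants outperform their five-shot structured counterparts, underscoring that the gap matters even when the reasoning is dynamically generated ahead of the final answer. The authors also show that the optimal granularity of plan generation (instance level versus task level) depends on context. These findings suggest re-evaluating both the reliance on structured formats for complex problem solving and the way compound systems are organized.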
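The 18.90% figure above is a relative improvement, which differs from a gap in percentage points. As a minimal sketch of that arithmetic, the snippet below computes a relative improvement from two accuracies. The function name `relative_improvement` and both accuracy values are assumptions chosen for illustration; they are not results reported in the paper.

```python
def relative_improvement(baseline_acc: float, new_acc: float) -> float:
    """Return the improvement of new_acc over baseline_acc, in percent of the baseline."""
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (new_acc - baseline_acc) / baseline_acc * 100.0


# Hypothetical accuracies for illustration only (NOT numbers from the paper).
structured_acc = 0.50    # accuracy with structured JSON plans
unstructured_acc = 0.60  # accuracy with unstructured natural-language plans

gain = relative_improvement(structured_acc, unstructured_acc)
print(f"relative improvement: {gain:.2f}%")  # prints "relative improvement: 20.00%"
```

With these made-up values the absolute gap is 10 percentage points while the relative improvement is 20%, which is why the two measures should not be read interchangeably.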

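Implications and Limitations

• Implications:
  ◦ Shows empirically that unstructured reasoning can outperform structured reasoning on complex problem solving.
  ◦ On the MATH benchmark, unstructured plans achieve a relative improvement of up to 18.90% over structured plans.
  ◦ The zero-shot unstructured variants outperform the five-shot structured variants.
  ◦ Indicates that the optimal granularity of plan generation depends on the nature of the task.
  ◦ Suggests reconsidering the reliance on structured formats when designing compound systems.

• Limitations:
  ◦ The results may be specific to the benchmarks and models evaluated.
  ◦ Further study across a wider range of problem types and models is needed.
  ◦ Computational cost and reliability issues require additional analysis.
  ◦ The work offers no clear guidance on how to choose the optimal plan-generation granularity.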
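To make the granularity distinction (instance level versus task level) concrete, here is a minimal sketch under a simple assumption: task-level planning writes one plan per task and reuses it, while instance-level planning writes a new plan for every input. `CountingPlanner` is a hypothetical stand-in for an LLM call, and the function names are illustrative; none of this is the paper's implementation.

```python
from typing import List


class CountingPlanner:
    """Hypothetical stand-in for an LLM that writes a reasoning plan."""

    def __init__(self) -> None:
        self.calls = 0  # number of plan-generation calls made so far

    def __call__(self, text: str) -> str:
        self.calls += 1
        return f"plan for: {text}"


def task_level_plans(planner: CountingPlanner, task: str, instances: List[str]) -> List[str]:
    """Generate one plan from the task description and reuse it for every instance."""
    shared_plan = planner(task)
    return [shared_plan for _ in instances]


def instance_level_plans(planner: CountingPlanner, instances: List[str]) -> List[str]:
    """Generate a separate plan for each individual instance."""
    return [planner(instance) for instance in instances]


instances = ["Solve x + 2 = 5", "Solve 3y = 12", "Solve z / 4 = 2"]

task_planner = CountingPlanner()
task_plans = task_level_plans(task_planner, "Solve linear equations", instances)

instance_planner = CountingPlanner()
instance_plans = instance_level_plans(instance_planner, instances)

print(task_planner.calls, instance_planner.calls)  # prints "1 3"
```

Under this assumption, the instance-level variant issues one planning call per input, so its planning cost grows with the number of instances, whereas the task-level variant pays that cost once. This is one plausible reason the better choice depends on context.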
