haebom
Daily Arxiv
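This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.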
Enhancing Password Security Through a High-Accuracy Scoring Framework Using Random Forests
Potent but Stealthy: Rethink Profile Pollution against Sequential Recommendation via Bi-level Constrained Reinforcement Paradigm
Leveraging Large Language Models for Use Case Model Generation from Software Requirements
Enhancing PIBT via Multi-Action Operations
PressTrack-HMR: Pressure-Based Top-Down Multi-Person Global Human Mesh Recovery
Thinking Forward and Backward: Multi-Objective Reinforcement Learning for Retrieval-Augmented Reasoning
PAN: A World Model for General, Interactable, and Long-Horizon World Simulation
Learning the Basis: A Kolmogorov-Arnold Network Approach Embedding Green's Function Priors
Automatic Grid Updates for Kolmogorov-Arnold Networks using Layer Histograms
Remodeling Semantic Relationships in Vision-Language Fine-Tuning
Biologically-Informed Hybrid Membership Inference Attacks on Generative Genomic Models
MVU-Eval: Towards Multi-Video Understanding Evaluation for Multimodal LLMs
Explainable Cross-Disease Reasoning for Cardiovascular Risk Assessment from LDCT
Retrieval-Augmented Generation in Medicine: A Scoping Review of Technical Implementations, Clinical Applications, and Ethical Considerations
Personalized Chain-of-Thought Summarization of Financial News for Investor Decision Support
Search Is Not Retrieval: Decoupling Semantic Matching from Contextual Assembly in RAG
Does AI-Assisted Coding Deliver? A Difference-in-Differences Study of Cursor's Impact on Software Projects
CORE - A Cell-Level Coarse-to-Fine Image Registration Engine for Multi-stain Image Alignment
Text to Robotic Assembly of Multi Component Objects using 3D Generative AI and Vision Language Models
WOD-E2E: Waymo Open Dataset for End-to-End Driving in Challenging Long-tail Scenarios
Xiaoice: Training-Free Video Understanding via Self-Supervised Spatio-Temporal Clustering of Semantic Features
A Critical Review of the Need for Knowledge-Centric Evaluation of Quranic Recitation
The Markovian Thinker: Architecture-Agnostic Linear Scaling of Reasoning
Artificial-Intelligence Grading Assistance for Handwritten Components of a Calculus Exam
Computing Wasserstein Barycenters through Gradient Flows
Enhancing the development of Cherenkov Telescope Array control software with Large Language Models
GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models
One-Shot Multi-Label Causal Discovery in High-Dimensional Event Sequences
FHIR-AgentBench: Benchmarking LLM Agents for Realistic Interoperable EHR Question Answering
Towards Practical Multi-label Causal Discovery in High-Dimensional Event Sequences via One-Shot Graph Aggregation
Inference Offloading for Cost-Sensitive Binary Classification at the Edge
Collapse of Irrelevant Representations (CIR) Ensures Robust and Non-Disruptive LLM Unlearning
Zero-Shot Referring Expression Comprehension via Vison-Language True/False Verification
Retrieval-Augmented Generation for Reliable Interpretation of Radio Regulations
Dual-Mode Deep Anomaly Detection for Medical Manufacturing: Structural Similarity and Feature Distance
ManipDreamer3D : Synthesizing Plausible Robotic Manipulation Video with Occupancy-aware 3D Trajectory
ChronoGraph: A Real-World Graph-Based Multivariate Time Series Dataset
Bine Trees: Enhancing Collective Operations by Optimizing Communication Locality
EcomMMMU: Strategic Utilization of Visuals for Robust Multimodal E-commerce Models
Improving Pre-Trained Vision-Language-Action Policies with Model-Based Search
Beyond Frequency: Seeing Subtle Cues Through the Lens of Spatial Decomposition for Fine-Grained Visual Classification
Test-Time Reinforcement Learning for GUI Grounding via Region Consistency
Towards Embodied Agentic AI: Review and Classification of LLM- and VLM-Driven Robot Autonomy and Interaction
Application-Specific Component-Aware Structured Pruning of Deep Neural Networks in Control via Soft Coefficient Optimization
Cameras as Relative Positional Encoding
The Prompt War: How AI Decides on a Military Intervention
xLSTMAD: A Powerful xLSTM-based Method for Anomaly Detection
Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study
Interpretable and Granular Video-Based Quantification of Motor Characteristics from the Finger Tapping Test in Parkinson Disease
Feedback-MPPI: Fast Sampling-Based MPC via Rollout Differentiation -- Adios low-level controllers
VFEFL: Privacy-Preserving Federated Learning against Malicious Clients via Verifiable Functional Encryption
Understanding Human-AI Trust in Education
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning
Scaling Textual Gradients via Sampling-Based Momentum
Caption This, Reason That: VLMs Caught in the Middle
BroadGen: A Framework for Generating Effective and Efficient Advertiser Broad Match Keyphrase Recommendations
Revisiting Residual Connections: Orthogonal Updates for Stable and Efficient Deep Networks
Scalable Quantum State Preparation via Large-Language-Model-Driven Discovery
OODTE: A Differential Testing Engine for the ONNX Optimizer
Constructing an Optimal Behavior Basis for the Option Keyboard
Bridging LMS and generative AI: dynamic course content integration (DCCI) for enhancing student satisfaction and engagement via the ask ME assistant
ForAug: Recombining Foregrounds and Backgrounds to Improve Vision Transformer Training with Bias Mitigation
Feature-EndoGaussian: Feature Distilled Gaussian Splatting in Surgical Deformable Scene Reconstruction
Abn-BLIP: Abnormality-aligned Bootstrapping Language-Image Pre-training for Pulmonary Embolism Diagnosis and Report Generation from CTPA
FactReasoner: A Probabilistic Approach to Long-Form Factuality Assessment for Large Language Models
MMTEB: Massive Multilingual Text Embedding Benchmark
Enhanced Structured Lasso Pruning with Class-wise Information
CoAT: Chain-of-Associated-Thoughts Framework for Enhancing Large Language Models Reasoning
Graph of Attacks with Pruning: Optimizing Stealthy Jailbreak Prompt Generation for Enhanced LLM Content Moderation
Siren: A Learning-Based Multi-Turn Attack Framework for Simulating Real-World Human Jailbreak Behaviors
Enhanced Suicidal Ideation Detection from Social Media Using a CNN-BiLSTM Hybrid Model
Interpretable Neural ODEs for Gene Regulatory Network Discovery under Perturbations
Reducing the Scope of Language Models
Matryoshka Pilot: Learning to Drive Black-Box LLMs with LLMs
Captions Speak Louder than Images: Generalizing Foundation Models for E-commerce from High-quality Multimodal Instruction Data
Multi-Turn Interactions for Text-to-SQL with Large Language Models
Differentiating between human-written and AI-generated texts using linguistic features automatically extracted from an online computational tool
Spikingformer: A Key Foundation Model for Spiking Neural Networks
SOM Directions are Better than One: Multi-Directional Refusal Suppression in Language Models
An Efficient Training Pipeline for Reasoning Graphical User Interface Agents
National Institute on Aging PREPARE Challenge: Early Detection of Cognitive Impairment Using Speech -- The SpeechCARE Solution
Information Capacity: Evaluating the Efficiency of Large Language Models via Text Compression
Enhancing Logical Expressiveness in Graph Neural Networks via Path-Neighbor Aggregation
WaterMod: Modular Token-Rank Partitioning for Probability-Balanced LLM Watermarking
Green AI: A systematic review and meta-analysis of its definitions, lifecycle models, hardware and measurement attempts
GHOST: Solving the Traveling Salesman Problem on Graphs of Convex Sets
A Brain Cell Type Resource Created by Large Language Models and a Multi-Agent AI System for Collaborative Community Annotation
LiveResearchBench: A Live Benchmark for User-Centric Deep Research in the Wild
Ax-Prover: A Deep Reasoning Agentic Framework for Theorem Proving in Mathematics and Quantum Physics
PsychCounsel-Bench: Evaluating the Psychology Intelligence of Large Language Models
DOoM: Difficult Olympiads of Math
From Capabilities to Performance: Evaluating Key Functional Properties of LLM Architectures in Penetration Testing
Planning Agents on an Ego-Trip: Leveraging Hybrid Ego-Graph Ensembles for Improved Tool Retrieval in Enterprise Task Planning
PITA: Preference-Guided Inference-Time Alignment for LLM Post-Training
Small Models Struggle to Learn from Strong Reasoners
Unlocking Efficient Vehicle Dynamics Modeling via Analytic World Models
Enhancing Conflict Resolution in Language Models via Abstract Argumentation
Discussion Graph Semantics of First-Order Logic with Equality for Reasoning about Discussion and Argumentation
A Comprehensive Survey on Multi-modal Conversational Emotion Recognition with Deep Learning
Black-Box On-Policy Distillation of Large Language Models
InfoMosaic-Bench: Evaluating Multi-Source Information Seeking in Tool-Augmented Agents
Created by
Haebom
Authors
Yaxin Du, Yuanshuo Zhang, Xiyuan Yang, Yifan Zhou, Cheng Wang, Gongyi Zou, Xianghe Pang, Wenhao Wang, Menglan Chen, Shuo Tang, Zhiyu Li, Feiyu Xiong, Siheng Chen
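Overview
This paper points out the limitations of LLM agents in information seeking and introduces InfoMosaic-Bench, a new benchmark for evaluating their ability to integrate specialized tools with general-purpose search. The benchmark comprises tasks across a range of domains that require combining general search with domain-specific tools, and the experiments show that LLM agents struggle with this integration.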
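Takeaways and Limitations
• Takeaways:
  ◦ Web information alone is not sufficient; using domain-specific tools is essential.
  ◦ Domain tools provide selective benefits but lack consistency.
  ◦ LLM agents struggle with tool use and tool selection.
• Limitations:
  ◦ Current LLM agents lack sufficient tool-use capability.
  ◦ Tool integration and complex information-seeking tasks remain difficult.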
View PDF