Daily Arxiv

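This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply credit the source.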

Towards Autonomous UAV Visual Object Search in City Space: Benchmark and Agentic Methodology

Hakim: Farsi Text Embedding Model

Fast Text-to-Audio Generation with Adversarial Post-Training

Fusing Bidirectional Chains of Thought and Reward Mechanisms A Method for Enhancing Question-Answering Capabilities of Large Language Models for Chinese Intangible Cultural Heritage

Intelligent Product 3.0: Decentralised AI Agents and Web3 Intelligence Standards

Neural Brain: A Neuroscience-inspired Framework for Embodied Agents

Decoding Futures Price Dynamics: A Regularized Sparse Autoencoder for Interpretable Multi-Horizon Forecasting and Factor Discovery

Steerable Chatbots: Personalizing LLMs with Preference-Based Activation Steering

SafeNav: Safe Path Navigation using Landmark Based Localization in a GPS-denied Environment

Don't be lazy: CompleteP enables compute-efficient deep transformers

Llama-Nemotron: Efficient Reasoning Models

Rethinking Time Encoding via Learnable Transformation Functions

CrashFixer: A crash resolution agent for the Linux kernel

Harden and Catch for Just-in-Time Assured LLM-Based Software Testing: Open Research Challenges

Multi-Agent Reinforcement Learning Simulation for Environmental Policy Synthesis

Energy Matching: Unifying Flow Matching and Energy-Based Models for Generative Modeling

Beyond Single-Turn: A Survey on Multi-Turn Interactions with Large Language Models

Evaluating Clinical Competencies of Large Language Models with a General Practice Benchmark

Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering

The Problem of the Priors, or Posteriors?

Simulating and Analysing Human Survey Responses with Large Language Models: A Case Study in Energy Stated Preference

Learning Autonomy: Off-Road Navigation Enhanced by Human Input

InductionBench: LLMs Fail in the Simplest Complexity Class

Activation Steering in Neural Theorem Provers

PropNet: a White-Box and Human-Like Network for Sentence Representation

Principled Data Selection for Alignment: The Hidden Risks of Difficult Examples

FAS: Fast ANN-SNN Conversion for Spiking Large Language Models

Learning Traffic Anomalies from Generative Models on Real-Time Observations

High-temperature superconductivity in Li$_2$AuH$_6$ mediated by strong electron-phonon coupling under ambient pressure

A Bio-Inspired Research Paradigm of Collision Perception Neurons Enabling Neuro-Robotic Integration: The LGMD Case

ThreatModeling-LLM: Automating Threat Modeling using Large Language Models for Banking System

Is Linear Feedback on Smoothed Dynamics Sufficient for Stabilizing Contact-Rich Plans?

What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks

FAMMA: A Benchmark for Financial Domain Multilingual Multimodal Question Answering

Deep Signature: Characterization of Large-Scale Molecular Dynamics

State-of-the-Art Periorbital Distance Prediction and Disease Classification Using Periorbital Features

A Deep Learning Approach for Pixel-level Material Classification via Hyperspectral Imaging

Fragment-Masked Diffusion for Molecular Optimization

Public Constitutional AI

Cognitive Insights and Stable Coalition Matching for Fostering Multi-Agent Cooperation

DACAD: Domain Adaptation Contrastive Learning for Anomaly Detection in Multivariate Time Series

Construction and Application of Materials Knowledge Graph in Multidisciplinary Materials Science via Large Language Model

Detecting Multimedia Generated by Large AI Models: A Survey

Efficient approximation of Earth Mover's Distance Based on Nearest Neighbor Search

EiHi Net: Out-of-Distribution Generalization Paradigm

Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving

RefPentester: A Knowledge-Informed Self-Reflective Penetration Testing Framework Based on Large Language Models

CAT Merging: A Training-Free Approach for Resolving Conflicts in Model Merging

SafeMate: A Modular RAG-Based Agent for Context-Aware Emergency Guidance

UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning

AdaWorld: Learning Adaptable World Models with Latent Actions

Reinforcement Learning-based Heuristics to Guide Domain-Independent Dynamic Programming

An Analytical Emotion Framework of Rumour Threads on Social Media

Deontic Temporal Logic for Formal Verification of AI Ethics

PSPO*: An Effective Process-supervised Policy Optimization for Reasoning Alignment

Think Smart, Act SMARL! Analyzing Probabilistic Logic Shields for Multi-Agent Reinforcement Learning

Exploiting Uncertainty for Querying Inconsistent Description Logics Knowledge Bases

Learning to Be Cautious

Customizing a Large Language Model for VHDL Design of High-Performance Microprocessors

How Hungry is AI? Benchmarking Energy, Water, and Carbon Footprint of LLM Inference

WorldView-Bench: A Benchmark for Evaluating Global Cultural Perspectives in Large Language Models

Variational Visual Question Answering

Ethics and Persuasion in Reinforcement Learning from Human Feedback: A Procedural Rhetorical Approach

BLIP3-o: A Family of Fully Open Unified Multimodal Models-Architecture, Training and Dataset

Meta-learning Slice-to-Volume Reconstruction in Fetal Brain MRI using Implicit Neural Representations

Learning Long-Context Diffusion Policies via Past-Token Prediction

WavReward: Spoken Dialogue Models With Generalist Reward Evaluators

Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput

Preserving Plasticity in Continual Learning with Adaptive Linearity Injection

Deploying Foundation Model-Enabled Air and Ground Robots in the Field: Challenges and Opportunities

A 2D Semantic-Aware Position Encoding for Vision Transformers

Quantum state-agnostic work extraction (almost) without dissipation

Evaluating GPT- and Reasoning-based Large Language Models on Physics Olympiad Problems: Surpassing Human Performance and Implications for Educational Assessment

CXMArena: Unified Dataset to benchmark performance in realistic CXM Scenarios

Endo-CLIP: Progressive Self-Supervised Pre-training on Raw Colonoscopy Records

Multilingual Machine Translation with Quantum Encoder Decoder Attention-based Convolutional Variational Circuits

Quantum-Enhanced Parameter-Efficient Learning for Typhoon Trajectory Forecasting

UMotion: Uncertainty-driven Human Motion Estimation from Inertial and Ultra-wideband Units

FedSaaS: Class-Consistency Federated Semantic Segmentation via Global Prototype Supervision and Local Adversarial Harmonization

The Voice Timbre Attribute Detection 2025 Challenge Evaluation Plan

Examining Deployment and Refinement of the VIOLA-AI Intracranial Hemorrhage Model Using an Interactive NeoMedSys Platform

TensorRL-QAS: Reinforcement learning with tensor networks for scalable quantum architecture search

GreenFactory: Ensembling Zero-Cost Proxies to Estimate Performance of Neural Networks

Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures

Evaluating the Robustness of Adversarial Defenses in Malware Detection Systems

BioVFM-21M: Benchmarking and Scaling Self-Supervised Vision Foundation Models for Biomedical Image Analysis

Neural Video Compression using 2D Gaussian Splatting

Toward Fair Federated Learning under Demographic Disparities and Data Imbalance

MetaUAS: Universal Anomaly Segmentation with One-Prompt Meta-Learning

Learning to Detect Multi-class Anomalies with Just One Normal Image Prompt

Few-Shot Anomaly-Driven Generation for Anomaly Classification and Segmentation

EDBench: Large-Scale Electron Density Data for Molecular Modeling

Focus, Merge, Rank: Improved Question Answering Based on Semi-structured Knowledge Bases

Educational impacts of generative artificial intelligence on learning and performance of engineering students in China

InvDesFlow-AL: Active Learning-based Workflow for Inverse Design of Functional Materials

DRRNet: Macro-Micro Feature Fusion and Dual Reverse Refinement for Camouflaged Object Detection

An Initial Exploration of Default Images in Text-to-Image Generation

A Multi-Task Foundation Model for Wireless Channel Representation Using Contrastive and Masked Autoencoder Learning

ELIS: Efficient LLM Iterative Scheduling System with Response Length Predictor

Fair Clustering via Alignment

InductionBench: LLMs Fail in the Simplest Complexity Class

Created by

Haebom

Authors

Wenyue Hua, Tyler Wong, Sun Fei, Liangming Pan, Adam Jardine, William Yang Wang

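Overview

This paper points out that while the deductive reasoning abilities of large language models (LLMs) have advanced, their inductive reasoning abilities remain comparatively understudied. To evaluate inductive reasoning in LLMs, the authors present a new benchmark, InductionBench. Experiments show that even state-of-the-art models struggle with the simplest complexity class in InductionBench, highlighting how weak current LLMs are at inductive reasoning. The code and data are publicly available on GitHub.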

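Implications and Limitations

• Implications: The paper presents a new benchmark (InductionBench) for systematically evaluating and analyzing the inductive reasoning abilities of LLMs. By clearly exposing the limits of current LLMs in inductive reasoning, it points to directions for future research.

• Limitations: The range of tasks in InductionBench may be limited. The benchmark would be more comprehensive if it included more varied types of inductive reasoning tasks. The benchmark's current complexity classes may not fully cover the capabilities of LLMs.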
