Daily Arxiv

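This page collects and organizes artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.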

Geo-Registration of Terrestrial LiDAR Point Clouds with Satellite Images without GNSS

Empowering Bridge Digital Twins by Bridging the Data Gap with a Unified Synthesis Framework

AI Agent Smart Contract Exploit Generation

ModelCitizens: Representing Community Voices in Online Safety

Sequential Attention-based Sampling for Histopathological Analysis

Classification of autoimmune diseases from Peripheral blood TCR repertoires by multimodal multi-instance learning

From Video to EEG: Adapting Joint Embedding Predictive Architecture to Uncover Visual Concepts in Brain Signal Analysis

Reinforcement Learning-based Feature Generation Algorithm for Scientific Data

RefineX: Learning to Refine Pre-training Data at Scale from Expert-Guided Programs

On Jailbreaking Quantized Language Models Through Fault Injection Attacks

PBa-LLM: Privacy- and Bias-aware NLP using Named-Entity Recognition (NER)

DriveMRP: Enhancing Vision-Language Models with Synthetic Motion Data for Motion Risk Prediction

Probing and Steering Evaluation Awareness of Language Models

Generating Heterogeneous Multi-dimensional Data : A Comparative Study

Autonomy by Design: Preserving Human Autonomy in AI Decision-Support

PBCAT: Patch-based composite adversarial training against physically realizable attacks on object detection

Federated Breast Cancer Detection Enhanced by Synthetic Ultrasound Image Augmentation

XY-Tokenizer: Mitigating the Semantic-Acoustic Conflict in Low-Bitrate Speech Codecs

DeepTalk: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE

SWE-SQL: Illuminating LLM Pathways to Solve User SQL Issues in Real-World Applications

LLM Agent for Hyper-Parameter Optimization

MedSyn: Enhancing Diagnostics with Human-AI Collaboration

Evaluating and Improving Robustness in Large Language Models: A Survey and Future Directions

ConTextTab: A Semantics-Aware Tabular In-Context Learner

QUITE: A Query Rewrite System Beyond Rules with LLM Agents

Saffron-1: Safety Inference Scaling

hdl2v: A Code Translation Dataset for Enhanced LLM Verilog Generation

Knockout LLM Assessment: Using Large Language Models for Evaluations through Iterative Pairwise Comparisons

Humanoid World Models: Open World Foundation Models for Humanoid Robotics

EMORL: Ensemble Multi-Objective Reinforcement Learning for Efficient and Flexible LLM Fine-Tuning

Adaptive Elicitation of Latent Information Using Natural Language

Substance over Style: Evaluating Proactive Conversational Coaching Agents

Real AI Agents with Fake Memories: Fatal Context Manipulation Attacks on Web3 Agents

Sparse Autoencoder as a Zero-Shot Classifier for Concept Erasing in Text-to-Image Diffusion Models

Safer or Luckier? LLMs as Safety Evaluators Are Not Robust to Artifacts

UniF$^2$ace: Fine-grained Face Understanding and Generation with Unified Multimodal Models

DynamicID: Zero-Shot Multi-ID Image Personalization with Flexible Facial Editability

GMLM: Bridging Graph Neural Networks and Language Models for Heterophilic Node Classification

Towards Enterprise-Ready Computer Using Generalist Agent

Oscillation-Reduced MXFP4 Training for Vision Transformers

Understanding Fixed Predictions via Confined Regions

Multi-Attribute Steering of Language Models via Targeted Intervention

Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving

Theme-Explanation Structure for Table Summarization using Large Language Models: A Case Study on Korean Tabular Data

Scaling 4D Representations

AutoPrep: Natural Language Question-Aware Data Preparation with a Multi-Agent Framework

CHAI for LLMs: Improving Code-Mixed Translation in Large Language Models through Reinforcement Learning with AI Feedback

Vital Insight: Assisting Experts' Context-Driven Sensemaking of Multi-modal Personal Tracking Data Using Visualization and Human-In-The-Loop LLM

Hespi: A pipeline for automatically detecting information from hebarium specimen sheets

Diversifying Robot Locomotion Behaviors with Extrinsic Behavioral Curiosity

Pullback Flow Matching on Data Manifolds

The Impact of Generative AI on Collaborative Open-Source Software Development: Evidence from GitHub Copilot

Breaking PEFT Limitations: Leveraging Weak-to-Strong Knowledge Transfer for Backdoor Attacks in LLMs

Tokenization for Molecular Foundation Models

DilateQuant: Accurate and Efficient Diffusion Quantization via Weight Dilation

PersonaFlow: Designing LLM-Simulated Expert Perspectives for Enhanced Research Ideation

Disentangling Uncertainty for Safe Social Navigation using Deep Reinforcement Learning

CodeMirage: Hallucinations in Code Generated by Large Language Models

A Policy-Gradient Approach to Solving Imperfect-Information Games with Best-Iterate Convergence

Diffusion-Driven Semantic Communication for Generative Models with Bandwidth Constraints

Semantic Augmentation in Images using Language

Geometric Constraints in Deep Learning Frameworks: A Survey

Enhancing Plasticity for First Session Adaptation Continual Learning

Efficient Transfer Learning via Causal Bounds

FEVO: Financial Knowledge Expansion and Reasoning Evolution for Large Language Models

A Wireless Foundation Model for Multi-Task Prediction

GTA1: GUI Test-time Scaling Agent

Modeling (Deontic) Modal Operators With the s(CASP) Goal-directed Predicate Answer Set Programming System

Fuzzy Classification Aggregation for a Continuum of Agents

MedGellan: LLM-Generated Medical Guidance to Support Physicians

Animation Needs Attention: A Holistic Approach to Slides Animation Comprehension with Visual-Language Models

Establishing Best Practices for Building Rigorous Agentic Benchmarks

SHADE-Arena: Evaluating Sabotage and Monitoring in LLM Agents

TRiSM for Agentic AI: A Review of Trust, Risk, and Security Management in LLM-based Agentic Multi-Agent Systems

The end of radical concept nativism

AI-Driven Scholarly Peer Review via Persistent Workflow Prompting, Meta-Prompting, and Meta-Reasoning

Do Larger Language Models Imply Better Generalization? A Pretraining Scaling Law for Implicit Reasoning

SagaLLM: Context Management, Validation, and Transaction Guarantees for Multi-Agent LLM Planning

Towards Reasoning Era: A Survey of Long Chain-of-Thought for Reasoning Large Language Models

ROCKET-2: Steering Visuomotor Policy via Cross-View Goal Alignment

Hybrid Quantum-Classical Multi-Agent Pathfinding

FinSphere, a Real-Time Stock Analysis Agent Powered by Instruction-Tuned LLMs and Domain Tools

Multi-Agent Pathfinding Under Team-Connected Communication Constraint via Adaptive Path Expansion and Dynamic Leading

Can adversarial attacks by large language models be attributed?

Planning Anything with Rigor: General-Purpose Zero-Shot Planning with LLM-based Formalized Programming

OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework

Stepwise functional refoundation of relational concept analysis

A Survey on Event Prediction Methods from a Systems Perspective: Bringing Together Disparate Research Areas

An AI Approach for Learning the Spectrum of the Laplace-Beltrami Operator

Latent Acoustic Mapping for Direction of Arrival Estimation: A Self-Supervised Approach

DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning

Comparative Analysis of CNN and Transformer Architectures with Heart Cycle Normalization for Automated Phonocardiogram Classification

A Novel Hybrid Deep Learning Technique for Speech Emotion Detection using Feature Engineering

Advances in Intelligent Hearing Aids: Deep Learning Approaches to Selective Noise Cancellation

Modeling Heterogeneity across Varying Spatial Extents: Discovering Linkages between Sea Ice Retreat and Ice Shelve Melt in the Antarctic

Surrogate Model for Heat Transfer Prediction in Impinging Jet Arrays using Dynamic Inlet/Outlet and Flow Rate Control

PLAME: Leveraging Pretrained Language Models to Generate Enhanced Protein Multiple Sequence Alignments

Design and Implementation of an OCR-Powered Pipeline for Table Extraction from Invoices

FlexOlmo: Open Language Models for Flexible Data Use

Generating Multi-Table Time Series EHR from Latent Space with Minimal Preprocessing

SPARQ: Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms

Created by

Haebom

Authors

Alex Havrilla, Edward Hughes, Mikayel Samvelyan, Jacob Abernethy

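Overview

This paper presents SPARQ (Synthetic Problem Generation for Reasoning via Quality-Diversity Algorithms), a new method for improving model reasoning ability through synthetic data generation with large language models (LLMs). Whereas existing methods either distill a large model into a smaller one or rely on problem statements with ground-truth answers, SPARQ uses only a single model and measures each problem's solve rate (a proxy for difficulty) to generate high-quality, diverse synthetic math problem-solution pairs. Starting from a dataset of 7.5K samples, it generates more than 20 million new problem-solution pairs; filtering them by difficulty and then fine-tuning the same model improves model performance by up to 24%. The paper analyzes how the quantity, quality, and diversity of synthetic data affect model generalization, and finds that difficult, high-quality data is effective for improving in-distribution performance. Diverse data, by contrast, has little effect on in-distribution performance but helps out-of-distribution generalization. Finally, the paper confirms model and data scaling laws for synthetic problem generation and finds that they positively affect downstream model generalization.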
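The solve-rate filtering step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `solve_rate` and `filter_by_difficulty`, the `attempt` callable (standing in for sampling one answer from the model), the exact-match check against the paired answer, and the `low`/`high` thresholds are all assumptions introduced here.

```python
from typing import Callable, List, Tuple


def solve_rate(problem: str, reference_answer: str,
               attempt: Callable[[str], str], n_samples: int = 8) -> float:
    """Fraction of sampled attempts whose answer matches the reference.

    `attempt` is a hypothetical stand-in for one sampled model answer.
    """
    hits = sum(attempt(problem) == reference_answer for _ in range(n_samples))
    return hits / n_samples


def filter_by_difficulty(pairs: List[Tuple[str, str]],
                         attempt: Callable[[str], str],
                         low: float = 0.1, high: float = 0.6,
                         n_samples: int = 8) -> List[Tuple[str, str, float]]:
    """Keep (problem, answer, rate) triples whose solve rate is in [low, high].

    The thresholds are illustrative assumptions, not values from the paper.
    A lower solve rate is treated as a harder problem.
    """
    kept = []
    for problem, answer in pairs:
        rate = solve_rate(problem, answer, attempt, n_samples)
        if low <= rate <= high:
            kept.append((problem, answer, rate))
    return kept
```

In practice `attempt` would wrap a call to the same model that generated the problems, and the retained triples would form the fine-tuning set.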

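Implications and Limitations

• Implications:

◦ Presents a method for generating high-quality, diverse synthetic math problem data using only a single model

◦ Improves model performance (by up to 24%) through solve-rate-based difficulty measurement and filtering

◦ Analyzes how the quantity, quality, and diversity of synthetic data affect model generalization, highlighting the importance of high-quality data and the benefit of diversity for OOD generalization

◦ Confirms model and data scaling laws for synthetic data generation

• Limitations:

◦ The approach is currently limited to math problems; its applicability to other problem types still needs to be verified

◦ Solve rate is the only difficulty metric, so it may not fully reflect a problem's intrinsic difficulty

◦ Further research is needed to improve the quality and diversity of the generated data

◦ The computational cost of generating a large-scale dataset needs to be considered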
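The summary does not say which quality-diversity variant the paper uses. As a generic illustration of the idea only, the sketch below follows the well-known MAP-Elites pattern: an archive keeps one best candidate per cell, so the retained set stays both diverse (many cells) and high-quality (best per cell). The `Candidate` fields, the `topic` descriptor, the `quality` score, and the binning of solve rate into `n_bins` are hypothetical choices made for this example.

```python
from typing import Dict, NamedTuple, Tuple


class Candidate(NamedTuple):
    problem: str
    topic: str         # hypothetical diversity descriptor
    solve_rate: float  # in [0, 1]; lower is treated as harder
    quality: float     # hypothetical quality score; higher is better


Cell = Tuple[str, int]


def cell_of(c: Candidate, n_bins: int = 5) -> Cell:
    """Map a candidate to an archive cell: (topic, solve-rate bin)."""
    bin_index = min(int(c.solve_rate * n_bins), n_bins - 1)
    return (c.topic, bin_index)


def add_to_archive(archive: Dict[Cell, Candidate], c: Candidate,
                   n_bins: int = 5) -> bool:
    """Insert c if its cell is empty or it beats the incumbent's quality.

    Returns True when the archive changed.
    """
    key = cell_of(c, n_bins)
    incumbent = archive.get(key)
    if incumbent is None or c.quality > incumbent.quality:
        archive[key] = c
        return True
    return False
```

A generation loop would repeatedly sample a problem from the archive, ask the model for a variation, score it, and call `add_to_archive`; how SPARQ defines its descriptors and quality measure is not covered by this summary.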
