haebom
Daily Arxiv
This page curates AI-related papers published around the world.
Summaries are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply credit the source.
BrowseConf: Confidence-Guided Test-Time Scaling for Web Agents
FRBNet: Revisiting Low-Light Vision through Frequency-Domain Radial Basis Network
Eigen-Value: Efficient Domain-Robust Data Valuation via Eigenvalue-Based Approach
Robust Uncertainty Quantification for Self-Evolving Large Language Models via Continual Domain Pretraining
HyPerNav: Hybrid Perception for Object-Oriented Navigation in Unknown Environment
TraceTrans: Translation and Spatial Tracing for Surgical Prediction
GRAID: Enhancing Spatial Reasoning of VLMs Through High-Fidelity Data Generation
CustomIR: Unsupervised Fine-Tuning of Dense Embeddings for Known Document Corpora
Your Dense Retriever is Secretly an Expeditious Reasoner
FieldGen: From Teleoperated Pre-Manipulation Trajectories to Field-Guided Data Generation
Context-level Language Modeling by Learning Predictive Context Embeddings
Detecting Latin in Historical Books with Large Language Models: A Multimodal Benchmark
MENTOR: A Reinforcement Learning Framework for Enabling Tool Use in Small Models via Teacher-Optimized Rewards
MemoryBench: A Benchmark for Memory and Continual Learning in LLM Systems
SimpleVSF: VLM-Scoring Fusion for Trajectory Prediction of End-to-End Autonomous Driving
The Formalism-Implementation Gap in Reinforcement Learning Research
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
TokenTiming: A Dynamic Alignment Method for Universal Speculative Decoding Model Pairs
Cross-Scenario Unified Modeling of User Interests at Billion Scale
DPRF: A Generalizable Dynamic Persona Refinement Framework for Optimizing Behavior Alignment Between Personalized LLM Role-Playing Agents and Humans
Think Just Enough: Sequence-Level Entropy as a Confidence Signal for LLM Reasoning
SEER: The Span-based Emotion Evidence Retrieval Benchmark
Distilled Protein Backbone Generation
Untargeted Jailbreak Attack
AdaDetectGPT: Adaptive Detection of LLM-Generated Text with Statistical Guarantees
Prosperity before Collapse: How Far Can Off-Policy RL Reach with Stale Data on LLMs?
On Robustness of Vision-Language-Action Model against Multi-Modal Perturbations
PEARL: Peer-Enhanced Adaptive Radio via On-Device LLM
Seeing Symbols, Missing Cultures: Probing Vision-Language Models' Reasoning on Fire Imagery and Cultural Meaning
ImageNet-trained CNNs are not biased towards texture: Revisiting feature reliance through controlled suppression
PTQTP: Post-Training Quantization to Trit-Planes for Large Language Models
The human-machine paradox: how collaboration creates or destroys value, and why augmentation is key to resolving it
Reproducible workflow for online AI in digital health
Pre-trained knowledge elevates large language models beyond traditional chemical reaction optimizers
MolErr2Fix: Benchmarking LLM Trustworthiness in Chemistry via Modular Error Detection, Localization, Explanation, and Revision
Robustness is Important: Limitations of LLMs for Data Fitting
DP-LLM: Runtime Model Adaptation with Dynamic Layer-wise Precision Assignment
FoGE: Fock Space inspired encoding for graph prompting
PULSE: Practical Evaluation Scenarios for Large Multimodal Model Unlearning
GEMeX-RMCoT: An Enhanced Med-VQA Dataset for Region-Aware Multimodal Chain-of-Thought Reasoning
Thermometry of simulated Bose-Einstein condensates using machine learning
LittleBit: Ultra Low-Bit Quantization via Latent Factorization
BNMusic: Blending Environmental Noises into Personalized Music
Evaluating AI-Powered Learning Assistants in Engineering Higher Education: Student Engagement, Ethical Challenges, and Policy Implications
Mixture-of-Experts Meets In-Context Reinforcement Learning
Improving Data Efficiency for LLM Reinforcement Fine-tuning Through Difficulty-targeted Online Data Selection and Rollout Replay
NOBLE -- Neural Operator with Biologically-informed Latent Embeddings to Capture Experimental Variability in Biological Neuron Models
Data Leakage and Deceptive Performance: A Critical Examination of Credit Card Fraud Detection Methodologies
REASONING COMPILER: LLM-Guided Optimizations for Efficient Model Serving
PVP: An Image Dataset for Personalized Visual Persuasion with Persuasion Strategies, Viewer Characteristics, and Persuasiveness Ratings
FALCON: An ML Framework for Fully Automated Layout-Constrained Analog Circuit Design
OmniResponse: Online Multimodal Conversational Response Generation in Dyadic Interactions
GraSS: Scalable Data Attribution with Gradient Sparsification and Sparse Projection
MixAT: Combining Continuous and Discrete Adversarial Training for LLMs
STree: Speculative Tree Decoding for Hybrid State-Space Models
Do Language Models Use Their Depth Efficiently?
A Generalized Label Shift Perspective for Cross-Domain Gaze Estimation
The Logical Expressiveness of Temporal GNNs via Two-Dimensional Product Logics
Group-in-Group Policy Optimization for LLM Agent Training
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
Offline Learning and Forgetting for Reasoning with Large Language Models
Multimodal 3D Genome Pre-training
Boosting Omnidirectional Stereo Matching with a Pre-trained Depth Foundation Model
Mirror Descent and Novel Exponentiated Gradient Algorithms Using Trace-Form Entropies and Deformed Logarithms
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Unified Approach for Elevating Benchmark Quality
Generalized Exponentiated Gradient Algorithms Using the Euler Two-Parameter Logarithm
FragFM: Hierarchical Framework for Efficient Molecule Generation via Fragment-Level Discrete Flow Matching
ADMN: A Layer-Wise Adaptive Multimodal Network for Dynamic Input Noise and Compute Resources
A High-Dimensional Statistical Method for Optimizing Transfer Quantities in Multi-Source Transfer Learning
$\beta$-DQN: Improving Deep Q-Learning By Evolving the Behavior
Provable Scaling Laws for the Test-Time Compute of Large Language Models
Learned, Lagged, LLM-splained: LLM Responses to End User Security Questions
One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models
TrajAgent: An LLM-Agent Framework for Trajectory Modeling via Large-and-Small Model Collaboration
GRS: Generating Robotic Simulation Tasks from Real-World Images
Navigation with VLM framework: Towards Going to Any Language
Retrieval-Augmented Generation-based Relation Extraction
Diffusion Models Meet Contextual Bandits
Querying Inconsistent Prioritized Data with ORBITS: Algorithms, Implementation, and Experiments
Multi-Agent Evolve: LLM Self-Improve through Co-evolution
ReCode: Unify Plan and Action for Universal Granularity Control
Human-Like Goalkeeping in a Realistic Football Simulation: a Sample-Efficient Reinforcement Learning Approach
From Prompt Optimization to Multi-Dimensional Credibility Evaluation: Enhancing Trustworthiness of Chinese LLM-Generated Liver MRI Reports
Huxley-Gödel Machine: Human-Level Coding Agent Development by an Approximation of the Optimal Self-Improving Machine
Understanding AI Trustworthiness: A Scoping Review of AIES & FAccT Articles
PanicToCalm: A Proactive Counseling Agent for Panic Attacks
A Comprehensive Survey on Reinforcement Learning-based Agentic Search: Foundations, Roles, Optimizations, Evaluations, and Applications
Co-TAP: Three-Layer Agent Interaction Protocol Technical Report
Evaluating the Use of Large Language Models as Synthetic Social Agents in Social Science Research
MathBode: Understanding LLM Reasoning with Dynamical Systems
Is It Certainly a Deepfake? Reliability Analysis in Detection & Generation Ecosystem
Accelerate Scaling of LLM Finetuning via Quantifying the Coverage and Depth of Instruction Set
Freeze and Conquer: Reusable Ansatz for Solving the Traveling Salesman Problem
A Neuroscience-Inspired Dual-Process Model of Compositional Generalization
Memory Mosaics at scale
The Confidence Paradox: Can LLM Know When It's Wrong
VIRAL: Vision-grounded Integration for Reward design And Learning
Partner Modelling Emerges in Recurrent Agents (But Only When It Matters)
Multimodal Dreaming: A Global Workspace Approach to World Model-Based Reinforcement Learning
TableTime: Reformulating Time Series Classification as Training-Free Table Understanding with Large Language Models
Understanding AI Trustworthiness: A Scoping Review of AIES & FAccT Articles
Created by
Haebom
Authors
Siddharth Mehrotra, Jin Huang, Xuelong Fu, Roel Dobbe, Clara I. Sanchez, Maarten de Rijke
Overview
This scoping review examines how the AIES and FAccT communities conceptualize, measure, and validate AI trustworthiness. It identifies a key gap: current research concentrates on technical attributes while overlooking sociotechnical aspects.
Implications and Limitations
• Research tends to emphasize technical precision while neglecting social and ethical considerations.
• The sociotechnical nature of AI systems remains underexplored.
• AI trustworthiness emerges as a contested concept, shaped by those who hold the authority to define it.
• Advancing AI trustworthiness requires an interdisciplinary approach that combines technical rigor with social, cultural, and institutional considerations.
View PDF