Daily Arxiv
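This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.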
Benchmarking LLM Causal Reasoning with Scientifically Validated Relationships
TIGeR: Tool-Integrated Geometric Reasoning in Vision-Language Models for Robotics
Mining the Mind: What 100M Beliefs Reveal About Frontier LLM Knowledge
Open ASR Leaderboard: Towards Reproducible and Transparent Multilingual and Long-Form Speech Recognition Evaluation
Emotionally Vulnerable Subtype of Internet Gaming Disorder: Measuring and Exploring the Pathology of Problematic Generative AI Use
Explaining raw data complexity to improve satellite onboard processing
Foundations of LLM Knowledge Materialization: Termination, Reproducibility, Robustness
Incremental Summarization for Customer Support via Progressive Note-Taking and Agent Feedback
SDAR: A Synergistic Diffusion-AutoRegression Paradigm for Scalable Sequence Generation
A Multimodal GUI Architecture for Interfacing with LLM-Based Conversational Assistants
Distributional Semantics Tracing: A Framework for Explaining Hallucinations in Large Language Models
DACP: Domain-Adaptive Continual Pre-Training of Large Language Models for Phone Conversation Summarization
High-Fidelity Synthetic ECG Generation via Mel-Spectrogram Informed Diffusion Training
Provable Speech Attributes Conversion via Latent Independence
Chronological Thinking in Full-Duplex Spoken Dialogue Language Models
Paper2Video: Automatic Video Generation from Scientific Papers
MacroBench: A Novel Testbed for Web Automation Scripts via Large Language Models
Spatiotemporal Forecasting as Planning: A Model-Based Reinforcement Learning Approach with Generative World Models
Generalized Orders of Magnitude for Scalable, Parallel, High-Dynamic-Range Computation
LogAction: Consistent Cross-system Anomaly Detection through Logs via Active Domain Adaptation
Equilibrium Matching: Generative Modeling with Implicit Energy-Based Models
More Than One Teacher: Adaptive Multi-Guidance Policy Optimization for Diverse Exploration
Learning to Reason for Hallucination Span Detection
Panorama: Fast-Track Nearest Neighbors
Feature Identification via the Empirical NTK
OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction
Efficient and Transferable Agentic Knowledge Graph RAG via Reinforcement Learning
Real-time Noise Detection and Classification in Single-Channel EEG: A Lightweight Machine Learning Approach for EMG, White Noise, and EOG Artifacts
The Sandbox Configurator: A Framework to Support Technical Assessment in AI Regulatory Sandboxes
CORE-3D: Context-aware Open-vocabulary Retrieval by Embeddings in 3D
PARL-MT: Learning to Call Functions in Multi-Turn Conversation with Progress Awareness
Defending MoE LLMs against Harmful Fine-Tuning via Safety Routing Alignment
Learn the Ropes, Then Trust the Wins: Self-imitation with Progressive Exploration for Agentic Reinforcement Learning
InfiR2: A Comprehensive FP8 Training Recipe for Reasoning-Enhanced Language Models
MORPH: Shape-agnostic PDE Foundation Models
Spiffy: Multiplying Diffusion LLM Acceleration via Lossless Speculative Decoding
Evaluating LLM-Generated Versus Human-Authored Responses in Role-Play Dialogues
Audio-Conditioned Diffusion LLMs for ASR and Deliberation Processing
Hierarchical Reinforcement Learning with Low-Level MPC for Multi-Agent Control
ProtoMedX: Towards Explainable Multi-Modal Prototype Learning for Bone Health Classification
From Correction to Mastery: Reinforced Distillation of Large Language Model Agents
Reproducible workflow for online AI in digital health
HiChunk: Evaluating and Enhancing Retrieval-Augmented Generation with Hierarchical Chunking
FireGNN: Neuro-Symbolic Graph Neural Networks with Trainable Fuzzy Rules for Interpretable Medical Image Classification
TalkPlayData 2: An Agentic Synthetic Data Pipeline for Multimodal Conversational Music Recommendation
A Survey of Reinforcement Learning for Large Reasoning Models
X-Teaming Evolutionary M2S: Automated Discovery of Multi-turn to Single-turn Jailbreak Templates
Barycentric Neural Networks and Length-Weighted Persistent Entropy Loss: A Green Geometric and Topological Framework for Function Approximation
Scaling Performance of Large Language Model Pretraining
Towards Methane Detection Onboard Satellites
AEGIS: Automated Co-Evolutionary Framework for Guarding Prompt Injections Schema
Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning
Safe-Control: A Safety Patch for Mitigating Unsafe Content in Text-to-Image Generation Models
Condition Weaving Meets Expert Modulation: Towards Universal and Controllable Image Generation
Long Chain-of-Thought Reasoning Across Languages
MAHL: Multi-Agent LLM-Guided Hierarchical Chiplet Design with Adaptive Debugging
MCPSecBench: A Systematic Security Benchmark and Playground for Testing Model Context Protocols
MInDI-3D: Iterative Deep Learning in 3D for Sparse-view Cone Beam Computed Tomography
MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling
CoCoA: Collaborative Chain-of-Agents for Parametric-Retrieved Knowledge Synergy
Can Small-Scale Data Poisoning Exacerbate Dialect-Linked Biases in Large Language Models?
Distilling a Small Utility-Based Passage Selector to Enhance Retrieval-Augmented Generation
From Feedback to Checklists: Grounded Evaluation of AI-Generated Clinical Notes
Controllable Hybrid Captioner for Improved Long-form Video Understanding
Leveraging Personalized PageRank and Higher-Order Topological Structures for Heterophily Mitigation in Graph Neural Networks
Understanding Teen Overreliance on AI Companion Chatbots Through Self-Reported Reddit Narratives
ERR@HRI 2.0 Challenge: Multimodal Detection of Errors and Failures in Human-Robot Conversations
Efficiency-Effectiveness Reranking FLOPs for LLM-based Rerankers
Truth, Trust, and Trouble: Medical AI on the Edge
LLMs on a Budget? Say HOLA
The Role of Model Confidence on Bias Effects in Measured Uncertainties for Vision-Language Models
A Survey of Foundation Models for IoT: Taxonomy and Criteria-Based Analysis
Breaking the Reviewer: Assessing the Vulnerability of Large Language Models in Automated Peer Review Under Textual Adversarial Attacks
Not All Clients Are Equal: Collaborative Model Personalization on Heterogeneous Multi-Modal Clients
Rethinking Losses for Diffusion Bridge Samplers
Think With Videos For Agentic Long-Video Understanding
ReasonMed: A 370K Multi-Agent Generated Dataset for Advancing Medical Reasoning
Intention-Conditioned Flow Occupancy Models
Product of Experts for Visual Generation
Modality-Balancing Preference Optimization of Large Multimodal Models by Adversarial Negative Mining
Dissecting Logical Reasoning in LLMs: A Fine-Grained Evaluation and Supervision Study
Tug-of-war between idioms' figurative and literal interpretations in LLMs
MAGREF: Masked Guidance for Any-Reference Video Generation with Subject Disentanglement
GL-PGENet: A Parameterized Generation Framework for Robust Document Image Enhancement
CAST: Contrastive Adaptation and Distillation for Semi-Supervised Instance Segmentation
Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties
The Shape of Adversarial Influence: Characterizing LLM Latent Spaces with Persistent Homology
BiomedSQL: Text-to-SQL for Scientific Reasoning on Biomedical Knowledge Bases
Inference-time Alignment in Continuous Space
STOPA: A Database of Systematic VariaTion Of DeePfake Audio for Open-Set Source Tracing and Attribution
Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty
Watch your steps: Dormant Adversarial Behaviors that Activate upon LLM Finetuning
LLINBO: Trustworthy LLM-in-the-Loop Bayesian Optimization
Logic Jailbreak: Efficiently Unlocking LLM Safety Restrictions Through Formal Logical Expression
FairSHAP: Preprocessing for Fairness Through Attribution-Based Data Augmentation
Hakim: Farsi Text Embedding Model
Understanding In-context Learning of Addition via Activation Subspaces
Evaluating Evaluation Metrics - The Mirage of Hallucination Detection
T-VEC: A Telecom-Specific Vectorization Model with Enhanced Semantic Understanding via Deep Triplet Loss Fine-Tuning
Hallucination Detection in LLMs with Topological Divergence on Attention Graphs
Trans-EnV: A Framework for Evaluating the Linguistic Robustness of LLMs Against English Varieties
Created by
Haebom
Authors
Jiyoung Lee, Seungho Kim, Jieun Han, Jun-Min Lee, Kitaek Kim, Alice Oh, Edward Choi
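Summary
This paper points out that large language models (LLMs) are evaluated predominantly on Standard American English (SAE), overlooking the diversity of English varieties used around the world. Because this narrow focus can lead to degraded performance on non-standard varieties and to unequal benefits for users worldwide, the authors stress the importance of broadly evaluating the linguistic robustness of LLMs across diverse non-standard English varieties. To this end, they present Trans-EnV, a framework that automatically transforms SAE datasets into multiple English varieties. Trans-EnV combines linguistic expert knowledge with LLM-based transformations to ensure linguistic validity and scalability. The authors use it to transform six benchmark datasets into 38 English varieties and to evaluate seven state-of-the-art LLMs. The results show accuracy drops of up to 46.3% on non-standard varieties, underscoring the importance of comprehensive linguistic robustness evaluation across English varieties. Each component of Trans-EnV was validated through rigorous statistical testing and consultation with researchers in second language acquisition.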
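The evaluation described above (score a model on the SAE benchmark and on each transformed variety, then compare) can be sketched as follows. This is only an illustrative sketch, not the authors' code: the function names, the variety labels, and the toy records are hypothetical. The summary does not say whether the reported drop is absolute or relative, so the sketch assumes an absolute difference in accuracy.

```python
from collections import defaultdict


def accuracy_by_variety(records):
    """Compute accuracy per variety from (variety, is_correct) pairs."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for variety, is_correct in records:
        totals[variety] += 1
        correct[variety] += int(is_correct)
    return {v: correct[v] / totals[v] for v in totals}


def accuracy_drops(acc, baseline="SAE"):
    """Absolute accuracy drop of each variety relative to the baseline (assumed metric)."""
    base = acc[baseline]
    return {v: base - a for v, a in acc.items() if v != baseline}


# Toy, made-up records: (variety label, whether the model answered correctly).
records = [
    ("SAE", True), ("SAE", True), ("SAE", True), ("SAE", False),
    ("variety_a", True), ("variety_a", True), ("variety_a", False), ("variety_a", False),
    ("variety_b", True), ("variety_b", False), ("variety_b", False), ("variety_b", False),
]

acc = accuracy_by_variety(records)
drops = accuracy_drops(acc)
worst = max(drops, key=drops.get)  # variety with the largest drop
print(acc, drops, worst)
```

Reporting the maximum over varieties, as `worst` does here, mirrors the "up to" phrasing in the summary; a per-variety table would show where the degradation concentrates.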
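Takeaways, Limitations
• Takeaways:
◦ Emphasizes that evaluations of LLM linguistic robustness should include diverse English varieties.
◦ Suggests that the Trans-EnV framework makes it possible to evaluate many English varieties in an automated way.
◦ Experiments demonstrate that LLM performance degrades on non-standard English varieties, showing the seriousness of the problem.
◦ Provides a foundation for follow-up research and development through the released code and datasets.
• Limitations:
◦ The paper does not explicitly mention limitations.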