Daily Arxiv
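This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright of each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.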
MoSEs: Uncertainty-Aware AI-Generated Text Detection via Mixture of Stylistics Experts with Conditional Thresholds
Avoidance Decoding for Diverse Multi-Branch Story Generation
HydroVision: Predicting Optically Active Parameters in Surface Water Using Computer Vision
HodgeFormer: Transformers for Learnable Operators on Triangular Meshes through Data-Driven Hodge Matrices
MSA2-Net: Utilizing Self-Adaptive Convolution Module to Extract Multi-Scale Information in Medical Image Segmentation
Q-Learning-Driven Adaptive Rewiring for Cooperative Control in Heterogeneous Networks
Spotlighter: Revisiting Prompt Tuning from a Representative Mining View
Multimodal Iterative RAG for Knowledge Visual Question Answering
Embodied AI: Emerging Risks and Opportunities for Policy Action
Meta-learning ecological priors from large language models explains human learning and decision making
Scaffold Diffusion: Sparse Multi-Category Voxel Structure Generation with Discrete Diffusion
Locus: Agentic Predicate Synthesis for Directed Fuzzing
Network-Level Prompt and Trait Leakage in Local Research Agents
The Information Dynamics of Generative Diffusion
Arbitrary Precision Printed Ternary Neural Networks with Holistic Evolutionary Approximation
Murakkab: Resource-Efficient Agentic Workflow Orchestration in Cloud Platforms
LinkAnchor: An Autonomous LLM-Based Agent for Issue-to-Commit Link Recovery
MoNaCo: More Natural and Complex Questions for Reasoning Across Dozens of Documents
STREAM (ChemBio): A Standard for Transparently Reporting Evaluations in AI Model Reports
BadPromptFL: A Novel Backdoor Threat to Prompt-based Federated Learning in Multimodal Models
Learning to Select MCP Algorithms: From Traditional ML to Dual-Channel GAT-MLP
MagicGUI: A Foundational Mobile GUI Agent with Scalable Data Pipeline and Reinforcement Fine-tuning
A DbC Inspired Neurosymbolic Layer for Trustworthy Agent Design
RoboMemory: A Brain-inspired Multi-memory Agentic Framework for Lifelong Learning in Physical Embodied Systems
LanternNet: A Hub-and-Spoke System to Seek and Suppress Spotted Lanternfly Populations
When and Where do Data Poisons Attack Textual Inversion?
Covering a Few Submodular Constraints and Applications
Rethinking Data Protection in the (Generative) Artificial Intelligence Era
LD-RPS: Zero-Shot Unified Image Restoration via Latent Diffusion Recurrent Posterior Sampling
GroundingDINO-US-SAM: Text-Prompted Multi-Organ Segmentation in Ultrasound with LoRA-Tuned Vision-Language Models
IndexTTS2: A Breakthrough in Emotionally Expressive and Duration-Controlled Auto-Regressive Zero-Shot Text-to-Speech
HERCULES: Hierarchical Embedding-based Recursive Clustering Using LLMs for Efficient Summarization
Multimodal Medical Image Binding via Shared Text Embeddings
Open-Set LiDAR Panoptic Segmentation Guided by Uncertainty-Aware Learning
Revisiting Clustering of Neural Bandits: Selective Reinitialization for Mitigating Loss of Plasticity
LLM Embedding-based Attribution (LEA): Quantifying Source Contributions to Generative Model's Response for Vulnerability Analysis
A theoretical framework for self-supervised contrastive learning for continuous dependent data
Securing AI Agents with Information-Flow Control
FastCache: Fast Caching for Diffusion Transformer Through Learnable Linear Approximation
Cog-TiPRO: Iterative Prompt Refinement with LLMs to Detect Cognitive Decline via Longitudinal Voice Assistant Commands
Unveil Multi-Picture Descriptions for Multilingual Mild Cognitive Impairment Detection via Contrastive Learning
NOVER: Incentive Training for Language Models via Verifier-Free Reinforcement Learning
When a Reinforcement Learning Agent Encounters Unknown Unknowns
Group-in-Group Policy Optimization for LLM Agent Training
Automated Parsing of Engineering Drawings for Structured Information Extraction Using a Fine-tuned Document Understanding Transformer
LawFlow: Collecting and Simulating Lawyers' Thought Processes on Business Formation Case Studies
On Developers' Self-Declaration of AI-Generated Code: An Analysis of Practices
WildFireCan-MMD: A Multimodal Dataset for Classification of User-Generated Content During Wildfires in Canada
Towards Cardiac MRI Foundation Models: Comprehensive Visual-Tabular Representations for Whole-Heart Assessment and Beyond
HDVIO2.0: Wind and Disturbance Estimation with Hybrid Dynamics VIO
TruthLens: Visual Grounding for Universal DeepFake Reasoning
Impoola: The Power of Average Pooling for Image-Based Deep Reinforcement Learning
Efficiently Editing Mixture-of-Experts Models with Compressed Experts
Problem Solved? Information Extraction Design Space for Layout-Rich Documents using LLMs
Investigating a Model-Agnostic and Imputation-Free Approach for Irregularly-Sampled Multivariate Time-Series Modeling
Rapid Word Learning Through Meta In-Context Learning
FedP$^2$EFT: Federated Learning to Personalize PEFT for Multilingual LLMs
Predict, Cluster, Refine: A Joint Embedding Predictive Self-Supervised Framework for Graph Representation Learning
Survey on Hand Gesture Recognition from Visual Input
Attention-guided Self-reflection for Zero-shot Hallucination Detection in Large Language Models
RouteNet-Gauss: Hardware-Enhanced Network Modeling with Machine Learning
GalaxAlign: Mimicking Citizen Scientists' Multimodal Guidance for Galaxy Morphology Analysis
Soft-TransFormers for Continual Learning
Exploring Response Uncertainty in MLLMs: An Empirical Evaluation under Misleading Scenarios
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
Domain Consistency Representation Learning for Lifelong Person Re-Identification
Aligning Machine and Human Visual Representations across Abstraction Levels
Towards Agentic AI on Particle Accelerators
Enhancing Natural Language Inference Performance with Knowledge Graph for COVID-19 Automated Fact-Checking in Indonesian Language
Mooncake: A KVCache-centric Disaggregated Architecture for LLM Serving
Banishing LLM Hallucinations Requires Rethinking Generalization
SampleAttention: Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
MF-OML: Online Mean-Field Reinforcement Learning with Occupation Measures for Large Population Games
Explainable Machine Learning-Based Security and Privacy Protection Framework for Internet of Medical Things Systems
From Metrics to Meaning: Time to Rethink Evaluation in Human-AI Collaborative Design
P2DT: Mitigating Forgetting in task-incremental Learning with progressive prompt Decision Transformer
Towards Agentic OS: An LLM Agent Framework for Linux Schedulers
CoreThink: A Symbolic Reasoning Layer to reason over Long Horizon Tasks with LLMs
ChatCLIDS: Simulating Persuasive AI Dialogues to Promote Closed-Loop Insulin Adoption in Type 1 Diabetes Care
L-MARS: Legal Multi-Agent Workflow with Orchestrated Reasoning and Agentic Search
AHELM: A Holistic Evaluation of Audio-Language Models
The Ramon Llull's Thinking Machine for Automated Ideation
Search-Based Credit Assignment for Offline Preference-Based Reinforcement Learning
KIRETT: Knowledge-Graph-Based Smart Treatment Assistant for Intelligent Rescue Operations
CoT-Self-Instruct: Building high-quality synthetic prompts for reasoning and non-reasoning tasks
Integrating Activity Predictions in Knowledge Graphs
Symbiotic Agents: A Novel Paradigm for Trustworthy AGI-driven Networks
ChordPrompt: Orchestrating Cross-Modal Prompt Synergy for Multi-Domain Incremental Learning in CLIP
Deep Research Agents: A Systematic Examination And Roadmap
Gradients: When Markets Meet Fine-tuning - A Distributed Approach to Model Optimisation
ORMind: A Cognitive-Inspired End-to-End Reasoning Framework for Operations Research
Shutdownable Agents through POST-Agency
CyberBOT: Towards Reliable Cybersecurity Education via Ontology-Grounded Retrieval Augmented Generation
PadChest-GR: A Bilingual Chest X-ray Dataset for Grounded Radiology Report Generation
Can Large Language Models Act as Ensembler for Multi-GNNs?
MorphAgent: Empowering Agents through Self-Evolving Profiles and Decentralized Collaboration
Frugal inference for control
On Generating Monolithic and Model Reconciling Explanations in Probabilistic Scenarios
A Survey on Human-AI Collaboration with Large Foundation Models
JARVIS: A Neuro-Symbolic Commonsense Reasoning Framework for Conversational Embodied Agents
Large Language Models for Automated Literature Review: An Evaluation of Reference Generation, Abstract Writing, and Review Composition
Created by
Haebom
Authors
Xuemei Tang, Xufeng Duan, Zhenguang G. Cai
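Abstract
This paper explores the potential and limitations of automating literature reviews with large language models (LLMs). LLMs could automate steps of writing a literature review, such as collecting, organizing, and summarizing the literature, but how effective they are at producing comprehensive and reliable reviews remains unclear. The study presents a framework that automatically evaluates LLM performance on three key tasks: reference generation, abstract writing, and literature review composition. It introduces multidimensional evaluation metrics that assess the hallucination rate of generated references and measure the semantic coverage and factual consistency of generated abstracts and reviews against human-written ones. Experiments show that even the latest models, despite recent advances, still generate hallucinated references. The study also finds that model performance on review composition varies across academic disciplines.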
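As a rough illustration of what a reference hallucination rate can look like in practice, the sketch below counts generated references whose titles do not appear in a trusted index. This summary does not describe the paper's actual matching procedure, so the title-only matching rule and the names `normalize_title` and `hallucination_rate` are assumptions made for this example.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation/whitespace so that
    trivial formatting differences do not count as mismatches."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", title.lower()).split())


def hallucination_rate(generated_titles, known_titles) -> float:
    """Fraction of generated references not found in the trusted index.

    Assumption: a reference counts as hallucinated when its normalized
    title has no exact match among the known titles. A real evaluation
    would likely also compare authors, venue, and year.
    """
    if not generated_titles:
        return 0.0
    known = {normalize_title(t) for t in known_titles}
    missing = sum(1 for t in generated_titles if normalize_title(t) not in known)
    return missing / len(generated_titles)


if __name__ == "__main__":
    # Hypothetical data: one real title (formatted differently) and one made up.
    generated = ["Attention Is All You Need", "A Fake Paper on Nothing"]
    index = ["Attention is all you need."]
    print(hallucination_rate(generated, index))
```

Exact title matching is deliberately strict; a fuzzy match or a lookup against a bibliographic database would give fewer false positives at the cost of extra dependencies.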
Takeaways, Limitations
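• Takeaways: The paper presents a framework and evaluation metrics for objectively assessing the potential and limitations of LLM-based literature review automation. By showing that LLM performance varies across academic disciplines, it points to the need for model development that accounts for field-specific characteristics.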
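• Limitations: The study finds that even the latest LLMs generate hallucinated references. This suggests that further research and development is needed to improve the reliability of LLM-automated literature reviews. The generalizability of the proposed framework and evaluation metrics also needs further study.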