Daily Arxiv
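This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.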
VarCoNet: A variability-aware self-supervised framework for functional connectome extraction from resting-state fMRI
KAIROS: Unified Training for Universal Non-Autoregressive Time Series Forecasting
SingMOS-Pro: An Comprehensive Benchmark for Singing Quality Assessment
Pack and Force Your Memory: Long-form and Consistent Video Generation
Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed
GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models
Analyzing Latent Concepts in Code Language Models
Less is More: Lean yet Powerful Vision-Language Model for Autonomous Driving
DM-Bench: Benchmarking LLMs for Personalized Decision Making in Diabetes Management
YOLO-Based Defect Detection for Metal Sheets
Jina-reranker-v3: Last but Not Late Interaction for Listwise Document Reranking
SecInfer: Preventing Prompt Injection via Inference-time Scaling
Putnam-like dataset summary: LLMs as mathematical competition contestants
Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation
Enhancing LLM Steering through Sparse Autoencoder-Based Vector Refinement
Observation-Free Attacks on Online Learning to Rank
MTRec: Learning to Align with User Preferences via Mental Reward Models
MobiLLM: An Agentic AI Framework for Closed-Loop Threat Mitigation in 6G Open RANs
When Long Helps Short: How Context Length in Supervised Fine-tuning Affects Behavior of Large Language Models
Flow-Induced Diagonal Gaussian Processes
Towards Size-invariant Salient Object Detection: A Generic Evaluation and Optimization Approach
Dual-Stage Reweighted MoE for Long-Tailed Egocentric Mistake Detection
Robust Pan-Cancer Mitotic Figure Detection with YOLOv12
Scam2Prompt: A Scalable Framework for Auditing Malicious Scam Endpoints in Production LLMs
Better by Comparison: Retrieval-Augmented Contrastive Reasoning for Automatic Prompt Optimization
STORI: A Benchmark and Taxonomy for Stochastic Environments
A Study on the Framework for Evaluating the Ethics and Trustworthiness of Generative AI
Grounding the Ungrounded: A Spectral-Graph Framework for Quantifying Hallucinations in multimodal LLMs
FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering
RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization
Quantum-RAG and PunGPT2: Advancing Low-Resource Language Generation and Retrieval for the Punjabi Language
Tuning LLM-based Code Optimization via Meta-Prompting: An Industrial Perspective
SBP-YOLO:A Lightweight Real-Time Model for Detecting Speed Bumps and Potholes toward Intelligent Vehicle Suspension Systems
An Architecture for Spatial Networking
A Comprehensive Review on Harnessing Large Language Models to Overcome Recommender System Challenges
First Hallucination Tokens Are Different from Conditional Ones
Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains
Model Parallelism With Subnetwork Data Parallelism
VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting
A Survey of Pun Generation: Datasets, Evaluations and Methodologies
Controlled Generation with Equivariant Variational Flow Matching
CAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree
DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation
SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration
Semantic Preprocessing for LLM-based Malware Analysis
Manipulating 3D Molecules in a Fixed-Dimensional E(3)-Equivariant Latent Space
Permissioned LLMs: Enforcing Access Control in Large Language Models
Efficient Preimage Approximation for Neural Network Certification
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models
NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation
Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model
Pre-training Limited Memory Language Models with Internal and External Knowledge
OT Score: An OT based Confidence Score for Source Free Unsupervised Domain Adaptation
Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Experiments
A Survey of Deep Learning for Complex Speech Spectrograms
Continuous Thought Machines
CostFilter-AD: Enhancing Anomaly Detection through Matching Cost Filtering
XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs
AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation
PropRAG: Guiding Retrieval with Beam Search over Proposition Paths
Activated LoRA: Fine-tuned LLMs for Intrinsics
Not a nuisance but a useful heuristic: Outlier dimensions favor frequent tokens in language models
Verbosity Tradeoffs and the Impact of Scale on the Faithfulness of LLM Self-Explanations
Towards Quantifying Long-Range Interactions in Graph Machine Learning: a Large Graph Dataset and a Measurement
DatawiseAgent: A Notebook-Centric LLM Agent Framework for Adaptive and Robust Data Science Automation
A Multi-Fidelity Control Variate Approach for Policy Gradient Estimation
L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning
Rethinking the Vulnerability of Concept Erasure and a New Method
Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs
Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training
MarketSenseAI 2.0: Enhancing Stock Analysis through LLM Agents
CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification
Graph Neural Networks for Transmission Grid Topology Control: Busbar Information Asymmetry and Heterogeneous Representations
Inferring Pluggable Types with Machine Learning
Optimizing Container Loading and Unloading through Dual-Cycling and Dockyard Rehandle Reduction Using a Hybrid Genetic Algorithm
LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing
Mutual Information Guided Backdoor Mitigation for Pre-trained Encoders
RACCooN: A Versatile Instructional Video Editing Framework with Auto-Generated Narratives
Unified Domain Adaptive Semantic Segmentation
Do AI Models Perform Human-like Abstract Reasoning Across Modalities?
Learning to Decide with Just Enough: Information-Theoretic Context Summarization for CMDPs
Thinkquel: A Model Dedicated to Text-to-dbt Using Synthetic Data and a Span-Aware Objective
OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!
Learning to Interact in World Latent for Team Coordination
Understanding Generative Recommendation with Semantic IDs from a Model-scaling View
GUI-PRA: Process Reward Agent for GUI Tasks
PRIME: Planning and Retrieval-Integrated Memory for Enhanced Reasoning
Efficient & Correct Predictive Equivalence for Decision Trees
THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning
Gala: Global LLM Agents for Text-to-Model Translation
Disentangling Multiplex Spatial-Temporal Transition Graph Representation Learning for Socially Enhanced POI Recommendation
LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers
Bridging Ethical Principles and Algorithmic Methods: An Alternative Approach for Assessing Trustworthiness in AI Systems
V2X-UniPool: Unifying Multimodal Perception and Knowledge Reasoning for Autonomous Driving
MIRROR: Modular Internal Processing for Personalized Safety in LLM Dialogue
SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning
Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning
ViLBias: Detecting and Reasoning about Bias in Multimodal Content
OML: A Primitive for Reconciling Open Access with Owner Control in AI Model Distribution
Improved Monte Carlo Planning via Causal Disentanglement for Structurally-Decomposed Markov Decision Processes
NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation
Created by
Haebom
Authors
Weiming Wu, Jin Ye, Zi-kang Wang, Zhi Zhou, Yu-Feng Li, Lan-Zhe Guo
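Abstract
Acquiring large-scale, high-quality reasoning data is crucial for improving the geometric reasoning ability of multimodal large language models (MLLMs). To overcome the limitations of existing data generation methods, this paper proposes NeSyGeo, a new neuro-symbolic framework. NeSyGeo uses a domain-specific language that comprehensively represents all elements of plane geometry, synthesizes symbolic sequences and maps them to visual and textual representations, and generates reasoning paths through backward search and forward validation. On this basis, the authors construct the NeSyGeo-CoT and NeSyGeo-Caption datasets, containing 100k samples, and release NeSyGeo-Test, a new benchmark for evaluating the geometric reasoning ability of MLLMs. Experiments show that the proposed method substantially improves the performance of multiple MLLMs, with considerable gains achieved using only a small number of samples and few training epochs.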
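To make the "backward search and forward validation" step in the abstract more concrete, here is a minimal sketch of the general idea on a toy right-triangle problem. It is not the NeSyGeo implementation: the `Rule` class, the three rules, and the function names `backward_search` and `forward_validate` are all hypothetical and chosen only for illustration. The backward pass starts from the target quantity and looks for rules whose premises can be traced back to the given values; the forward pass then replays the resulting path from the givens to confirm that every step is computable in order.

```python
from dataclasses import dataclass
from math import isclose, sqrt
from typing import Callable, Dict, FrozenSet, List, Optional, Tuple


@dataclass(frozen=True)
class Rule:
    """A hypothetical derivation rule: computes `output` from `inputs`."""
    name: str
    inputs: Tuple[str, ...]
    output: str
    fn: Callable[..., float]


# Toy rules for a right triangle with legs a, b and hypotenuse c (assumed setup).
RULES: List[Rule] = [
    Rule("pythagoras", ("a", "b"), "c", lambda a, b: sqrt(a * a + b * b)),
    Rule("area", ("a", "b"), "area", lambda a, b: 0.5 * a * b),
    Rule("perimeter", ("a", "b", "c"), "perimeter", lambda a, b, c: a + b + c),
]


def backward_search(
    target: str,
    givens: Dict[str, float],
    rules: List[Rule],
    _seen: FrozenSet[str] = frozenset(),
) -> Optional[List[Rule]]:
    """Search from the target back to the givens.

    Returns an ordered list of rules that derives `target`, or None if no
    derivation exists.
    """
    if target in givens:
        return []  # already known, no step needed
    if target in _seen:
        return None  # avoid circular derivations
    for rule in rules:
        if rule.output != target:
            continue
        path: List[Rule] = []
        derivable = True
        for needed in rule.inputs:
            sub_path = backward_search(needed, givens, rules, _seen | {target})
            if sub_path is None:
                derivable = False
                break
            for step in sub_path:
                if step not in path:  # do not repeat shared sub-derivations
                    path.append(step)
        if derivable:
            return path + [rule]
    return None


def forward_validate(path: List[Rule], givens: Dict[str, float]) -> Dict[str, float]:
    """Replay the path from the givens; a KeyError means a premise was missing."""
    known = dict(givens)
    for rule in path:
        args = [known[key] for key in rule.inputs]
        known[rule.output] = rule.fn(*args)
    return known


givens = {"a": 3.0, "b": 4.0}
path = backward_search("perimeter", givens, RULES)
values = forward_validate(path, givens)
print([rule.name for rule in path])  # ['pythagoras', 'perimeter']
print(values["perimeter"])           # 12.0
```

In this sketch the ordered rule names play the role of a reasoning path, and the forward replay is what guarantees the numeric answer attached to that path is consistent with the givens. A real system would operate over a much richer symbolic language than three hand-written rules.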
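Takeaways, Limitations
• Takeaways:
◦ The new neuro-symbolic framework NeSyGeo addresses the diversity and numerical generalization problems of geometric reasoning data generation.
◦ The experiments show that the NeSyGeo framework is effective at improving the geometric reasoning ability of MLLMs.
◦ MLLM performance can be improved substantially even with a small amount of data and training.
◦ The 4B model outperforms the 8B model.
• Limitations:
◦ The paper does not state any specific limitations.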