Daily Arxiv

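This page collects artificial intelligence papers published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their affiliated institutions; please cite the source when sharing.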

VarCoNet: A variability-aware self-supervised framework for functional connectome extraction from resting-state fMRI

KAIROS: Unified Training for Universal Non-Autoregressive Time Series Forecasting

SingMOS-Pro: An Comprehensive Benchmark for Singing Quality Assessment

Pack and Force Your Memory: Long-form and Consistent Video Generation

Understanding Adversarial Transfer: Why Representation-Space Attacks Fail Where Data-Space Attacks Succeed

GPT and Prejudice: A Sparse Approach to Understanding Learned Representations in Large Language Models

Analyzing Latent Concepts in Code Language Models

Less is More: Lean yet Powerful Vision-Language Model for Autonomous Driving

DM-Bench: Benchmarking LLMs for Personalized Decision Making in Diabetes Management

YOLO-Based Defect Detection for Metal Sheets

Jina-reranker-v3: Last but Not Late Interaction for Listwise Document Reranking

SecInfer: Preventing Prompt Injection via Inference-time Scaling

Putnam-like dataset summary: LLMs as mathematical competition contestants

Causal-Adapter: Taming Text-to-Image Diffusion for Faithful Counterfactual Generation

Enhancing LLM Steering through Sparse Autoencoder-Based Vector Refinement

Observation-Free Attacks on Online Learning to Rank

MTRec: Learning to Align with User Preferences via Mental Reward Models

MobiLLM: An Agentic AI Framework for Closed-Loop Threat Mitigation in 6G Open RANs

When Long Helps Short: How Context Length in Supervised Fine-tuning Affects Behavior of Large Language Models

Flow-Induced Diagonal Gaussian Processes

Towards Size-invariant Salient Object Detection: A Generic Evaluation and Optimization Approach

Dual-Stage Reweighted MoE for Long-Tailed Egocentric Mistake Detection

Robust Pan-Cancer Mitotic Figure Detection with YOLOv12

Scam2Prompt: A Scalable Framework for Auditing Malicious Scam Endpoints in Production LLMs

Better by Comparison: Retrieval-Augmented Contrastive Reasoning for Automatic Prompt Optimization

STORI: A Benchmark and Taxonomy for Stochastic Environments

A Study on the Framework for Evaluating the Ethics and Trustworthiness of Generative AI

Grounding the Ungrounded: A Spectral-Graph Framework for Quantifying Hallucinations in multimodal LLMs

FinAgentBench: A Benchmark Dataset for Agentic Retrieval in Financial Question Answering

RelayFormer: A Unified Local-Global Attention Framework for Scalable Image and Video Manipulation Localization

Quantum-RAG and PunGPT2: Advancing Low-Resource Language Generation and Retrieval for the Punjabi Language

Tuning LLM-based Code Optimization via Meta-Prompting: An Industrial Perspective

SBP-YOLO: A Lightweight Real-Time Model for Detecting Speed Bumps and Potholes toward Intelligent Vehicle Suspension Systems

An Architecture for Spatial Networking

A Comprehensive Review on Harnessing Large Language Models to Overcome Recommender System Challenges

First Hallucination Tokens Are Different from Conditional Ones

Rubrics as Rewards: Reinforcement Learning Beyond Verifiable Domains

Model Parallelism With Subnetwork Data Parallelism

VOTE: Vision-Language-Action Optimization with Trajectory Ensemble Voting

A Survey of Pun Generation: Datasets, Evaluations and Methodologies

Controlled Generation with Equivariant Variational Flow Matching

CAST: Enhancing Code Retrieval-Augmented Generation with Structural Chunking via Abstract Syntax Tree

DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation

SP-VLA: A Joint Model Scheduling and Token Pruning Approach for VLA Model Acceleration

Semantic Preprocessing for LLM-based Malware Analysis

Manipulating 3D Molecules in a Fixed-Dimensional E(3)-Equivariant Latent Space

Permissioned LLMs: Enforcing Access Control in Large Language Models

Efficient Preimage Approximation for Neural Network Certification

JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models

NeSyGeo: A Neuro-Symbolic Framework for Multimodal Geometric Reasoning Data Generation

Leveraging Online Data to Enhance Medical Knowledge in a Small Persian Language Model

Pre-training Limited Memory Language Models with Internal and External Knowledge

OT Score: An OT based Confidence Score for Source Free Unsupervised Domain Adaptation

Comparing Exploration-Exploitation Strategies of LLMs and Humans: Insights from Standard Multi-armed Bandit Experiments

A Survey of Deep Learning for Complex Speech Spectrograms

Continuous Thought Machines

CostFilter-AD: Enhancing Anomaly Detection through Matching Cost Filtering

XBreaking: Explainable Artificial Intelligence for Jailbreaking LLMs

AlignDiT: Multimodal Aligned Diffusion Transformer for Synchronized Speech Generation

PropRAG: Guiding Retrieval with Beam Search over Proposition Paths

Activated LoRA: Fine-tuned LLMs for Intrinsics

Not a nuisance but a useful heuristic: Outlier dimensions favor frequent tokens in language models

Verbosity Tradeoffs and the Impact of Scale on the Faithfulness of LLM Self-Explanations

Towards Quantifying Long-Range Interactions in Graph Machine Learning: a Large Graph Dataset and a Measurement

DatawiseAgent: A Notebook-Centric LLM Agent Framework for Adaptive and Robust Data Science Automation

A Multi-Fidelity Control Variate Approach for Policy Gradient Estimation

L1: Controlling How Long A Reasoning Model Thinks With Reinforcement Learning

Rethinking the Vulnerability of Concept Erasure and a New Method

Towards Economical Inference: Enabling DeepSeek's Multi-Head Latent Attention in Any Transformer-based LLMs

Primus: A Pioneering Collection of Open-Source Datasets for Cybersecurity LLM Training

MarketSenseAI 2.0: Enhancing Stock Analysis through LLM Agents

CBVLM: Training-free Explainable Concept-based Large Vision Language Models for Medical Image Classification

Graph Neural Networks for Transmission Grid Topology Control: Busbar Information Asymmetry and Heterogeneous Representations

Inferring Pluggable Types with Machine Learning

Optimizing Container Loading and Unloading through Dual-Cycling and Dockyard Rehandle Reduction Using a Hybrid Genetic Algorithm

LLAMAFUZZ: Large Language Model Enhanced Greybox Fuzzing

Mutual Information Guided Backdoor Mitigation for Pre-trained Encoders

RACCooN: A Versatile Instructional Video Editing Framework with Auto-Generated Narratives

Unified Domain Adaptive Semantic Segmentation

Do AI Models Perform Human-like Abstract Reasoning Across Modalities?

Learning to Decide with Just Enough: Information-Theoretic Context Summarization for CMDPs

Thinkquel: A Model Dedicated to Text-to-dbt Using Synthetic Data and a Span-Aware Objective

OffTopicEval: When Large Language Models Enter the Wrong Chat, Almost Always!

Learning to Interact in World Latent for Team Coordination

Understanding Generative Recommendation with Semantic IDs from a Model-scaling View

GUI-PRA: Process Reward Agent for GUI Tasks

PRIME: Planning and Retrieval-Integrated Memory for Enhanced Reasoning

Efficient & Correct Predictive Equivalence for Decision Trees

THOR: Tool-Integrated Hierarchical Optimization via RL for Mathematical Reasoning

Gala: Global LLM Agents for Text-to-Model Translation

Disentangling Multiplex Spatial-Temporal Transition Graph Representation Learning for Socially Enhanced POI Recommendation

LayerCake: Token-Aware Contrastive Decoding within Large Language Model Layers

Bridging Ethical Principles and Algorithmic Methods: An Alternative Approach for Assessing Trustworthiness in AI Systems

V2X-UniPool: Unifying Multimodal Perception and Knowledge Reasoning for Autonomous Driving

MIRROR: Modular Internal Processing for Personalized Safety in LLM Dialogue

SelfBudgeter: Adaptive Token Allocation for Efficient LLM Reasoning

Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning

ViLBias: Detecting and Reasoning about Bias in Multimodal Content

OML: A Primitive for Reconciling Open Access with Owner Control in AI Model Distribution

Improved Monte Carlo Planning via Causal Disentanglement for Structurally-Decomposed Markov Decision Processes

Thinkquel: A Model Dedicated to Text-to-dbt Using Synthetic Data and a Span-Aware Objective

Created by

Haebom

Authors

Anni Li, Aria Attar, Paul Dong

Abstract

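Turning natural-language requests into reliable, production-ready data transformations remains difficult. Correctness depends on precise schema linking and on warehouse-specific SQL dialects, yet the strongest supervision available during training (execution success and result match) is provided only at the sequence level. At the same time, assembling a large, execution-validated corpus is expensive, and token-level objectives do not line up with these global signals, which leads to unstable optimization and limited portability. Thinkquel is a model fine-tuned to generate robust, portable, execution-validated database queries. Its methodology combines TS-SQL, a new synthetic data pipeline that uses dbt as a portable intermediate representation, with TS-GRPO (Token-Sequence GRPO), a span-aware reinforcement learning objective designed specifically to close the gap between token-level training signals and sequence-level execution rewards when fine-tuning LLMs. On the 500-example TS-SQL test set, Thinkquel (32B) reaches 93.2% execution success and 61.8% exact-result match with a two-stage SFT curriculum, improving over the base model by 67.2% (execution) and 44.4% (match). In Spider (14B) experiments, TS-GRPO improved training stability and sped up convergence of the execution-match reward compared with GRPO and GSPO.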
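The two metrics quoted in the abstract, execution success and exact-result match, can be illustrated with a minimal sketch. This is not the authors' evaluation harness: the in-memory SQLite database, the `orders` table, the helper names `run_query` and `score`, and the choice to compare result sets as order-insensitive multisets are all assumptions made for illustration.

```python
import sqlite3
from collections import Counter


def run_query(conn, sql):
    """Return the rows produced by `sql`, or None if execution fails."""
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None


def score(conn, pairs):
    """Score (predicted_sql, reference_sql) pairs.

    Returns (execution_success_rate, exact_result_match_rate).
    """
    if not pairs:
        return 0.0, 0.0
    executed = matched = 0
    for predicted, reference in pairs:
        pred_rows = run_query(conn, predicted)
        if pred_rows is None:
            continue  # the prediction did not execute
        executed += 1
        ref_rows = run_query(conn, reference)
        # Assumption: rows are compared as multisets, so row order is ignored.
        if ref_rows is not None and Counter(pred_rows) == Counter(ref_rows):
            matched += 1
    return executed / len(pairs), matched / len(pairs)


# Hypothetical toy database and predictions.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER, amount REAL);
    INSERT INTO orders VALUES (1, 10.0), (2, 25.0), (3, 5.0);
    """
)
pairs = [
    # Executes and returns the same rows as the reference.
    ("SELECT id FROM orders WHERE amount > 8",
     "SELECT id FROM orders WHERE amount >= 10"),
    # Executes but returns an extra row.
    ("SELECT id FROM orders",
     "SELECT id FROM orders WHERE amount > 8"),
    # Fails: unknown column.
    ("SELECT idd FROM orders", "SELECT id FROM orders"),
    # Fails: syntax error.
    ("SELEC id FROM orders", "SELECT id FROM orders"),
]
exec_rate, match_rate = score(conn, pairs)
print(exec_rate, match_rate)  # prints "0.5 0.25"
```

Both numbers are computed per whole query, which is why the abstract describes this supervision as available only at the sequence level.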

Takeaways, Limitations

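• Takeaways:
  ◦ Thinkquel presents a new approach to converting natural-language requests into executable database queries.
  ◦ The TS-SQL data pipeline and the TS-GRPO objective improved the model's accuracy and training stability.
  ◦ In the reported experiments, Thinkquel outperformed the base model on both execution success and exact-result match.
  ◦ On the Spider dataset, TS-GRPO also improved training stability and convergence speed.

• Limitations:
  ◦ Model performance may depend on the database schema and the SQL dialect.
  ◦ Building a large, execution-validated corpus is costly.
  ◦ Further research on the model's portability is needed.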
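The takeaways compare TS-GRPO with GRPO. As background, the sketch below shows the group-relative advantage used in plain GRPO-style training, where one sequence-level reward per sampled completion is normalized within its group and then shared by every token of that completion. It does not implement TS-GRPO, whose span-aware details are not given in this summary. The function names, the use of the population standard deviation, and the `eps` constant are assumptions.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize sequence-level rewards within one group of samples."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # assumption: population std
    return [(r - mean) / (std + eps) for r in rewards]


def broadcast_to_tokens(advantages, token_counts):
    """Give every token of a completion that completion's advantage."""
    return [[a] * n for a, n in zip(advantages, token_counts)]


# Hypothetical group of four sampled queries for one prompt:
# reward 1.0 if the query's result matched the reference, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
token_counts = [3, 5, 2, 4]

advantages = group_relative_advantages(rewards)
per_token = broadcast_to_tokens(advantages, token_counts)
print([round(a, 3) for a in advantages])  # prints [1.0, -1.0, -1.0, 1.0]
```

Because every token in a completion receives the same value, the token-level update carries no information about which part of the query earned the reward. That is the token-versus-sequence gap the abstract says TS-GRPO is designed to close.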
