Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

RotBench: Evaluating Multimodal Large Language Models on Identifying Image Rotation

Input Time Scaling

CRED-SQL: Enhancing Real-world Large Scale Database Text-to-SQL Parsing through Cluster Retrieval and Execution Description

STEM: Efficient Relative Capability Evaluation of LLMs through Structured Transition Samples

AdaRing: Towards Ultra-Light Vision-Language Adaptation via Cross-Layer Tensor Ring Decomposition

Biased AI improves human decision-making but reduces trust

MAViS: A Multi-Agent Framework for Long-Sequence Video Storytelling

ExpVG: Investigating the Design Space of Visual Grounding in Multimodal Large Language Model

MetAdv: A Unified and Interactive Adversarial Testing Platform for Autonomous Driving

ETA: Energy-based Test-time Adaptation for Depth Completion

Extending Foundational Monocular Depth Estimators to Fisheye Cameras with Calibration Tokens

When Good Sounds Go Adversarial: Jailbreaking Audio-Language Models with Benign Inputs

CRINN: Contrastive Reinforcement Learning for Approximate Nearest Neighbor Search

Reinitializing weights vs units for maintaining plasticity in neural networks

Each to Their Own: Exploring the Optimal Embedding in RAG

Feature Distillation is the Better Choice for Model-Heterogeneous Federated Learning

TolerantECG: A Foundation Model for Imperfect Electrocardiogram

DeepRetro: Retrosynthetic Pathway Discovery using Iterative LLM Reasoning

LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization

Structure As Search: Unsupervised Permutation Learning for Combinatorial Optimization

Enhancing Temporal Sensitivity of Large Language Model for Recommendation with Counterfactual Tuning

Multi-agent Auditory Scene Analysis

MinD: Learning A Dual-System World Model for Real-Time Planning and Implicit Risk Analysis

AtmosMJ: Revisiting Gating Mechanism for AI Weather Forecasting Beyond the Year Scale

Benchmarking Pre-Trained Time Series Models for Electricity Price Forecasting

Spore in the Wild: A Case Study of Spore.fun as an Open-Environment Evolution Experiment with Sovereign AI Agents on TEE-Secured Blockchains

Critique-GRPO: Advancing LLM Reasoning with Natural Language and Numerical Feedback

Neural Restoration of Greening Defects in Historical Autochrome Photographs Based on Purely Synthetic Data

Security Concerns for Large Language Models: A Survey

Common Data Format (CDF): A Standardized Format for Match-Data in Football (Soccer)

One-Layer Transformers are Provably Optimal for In-context Reasoning and Distributional Association Learning in Next-Token Prediction Tasks

FMSD-TTS: Few-shot Multi-Speaker Multi-Dialect Text-to-Speech Synthesis for Ü-Tsang, Amdo and Kham Speech Dataset Generation

Uncertainty Quantification for Language Models: A Suite of Black-Box, White-Box, LLM Judge, and Ensemble Scorers

Hallucinations and Key Information Extraction in Medical Texts: A Comprehensive Assessment of Open-Source Large Language Models

A Conceptual Framework for AI-based Decision Systems in Critical Infrastructures

Dominated Actions in Imperfect-Information Games

Hands-On: Segmenting Individual Signs from Continuous Sequences

PathGPT: Reframing Path Recommendation as a Natural Language Generation Task with Retrieval-Augmented Language Models

Boosting Chart-to-Code Generation in MLLM via Dual Preference-Guided Refinement

JudgeLRM: Large Reasoning Models as a Judge

Generative AI in K-12 Education: The CyberScholar Initiative

Natural Language Generation from Visual Events: State-of-the-Art and Key Open Questions

Hybrid Action Based Reinforcement Learning for Multi-Objective Compatible Autonomous Driving

Is Contrastive Distillation Enough for Learning Comprehensive 3D Representations?

Action Engine: Automatic Workflow Generation in FaaS

The importance of visual modeling languages in generative software engineering

Identity Preserving 3D Head Stylization with Multiview Score Distillation

SLED: Self Logits Evolution Decoding for Improving Factuality in Large Language Models

Testing Components of the Attention Schema Theory in Artificial Neural Networks

A Little Human Data Goes A Long Way

Source2Synth: Synthetic Data Generation and Curation Grounded in Real Data Sources

Social Debiasing for Fair Multi-modal LLMs

A Comprehensive Benchmark on Spectral GNNs: The Impact on Efficiency, Memory, and Effectiveness

LoRA-XS: Low-Rank Adaptation with Extremely Small Number of Parameters

Enhancing Depression-Diagnosis-Oriented Chat with Psychological State Tracking

Estimation of Energy-dissipation Lower-bounds for Neuromorphic Learning-in-memory

Don't Push the Button! Exploring Data Leakage Risks in Machine Learning and Transfer Learning

Towards the Use of Saliency Maps for Explaining Low-Quality Electrocardiograms to End Users

Nash Convergence of Mean-Based Learning Algorithms in First-Price Auctions

TASER: Table Agents for Schema-guided Extraction and Recommendation

Modeling Relational Logic Circuits for And-Inverter Graph Convolutional Network

EvoCurr: Self-evolving Curriculum with Behavior Code Generation for Complex Decision-making

KIRETT: Knowledge-Graph-Based Smart Treatment Assistant for Intelligent Rescue Operations

EoH-S: Evolution of Heuristic Set using LLMs for Automated Heuristic Design

SE-Agent: Self-Evolution Trajectory Optimization in Multi-Step Reasoning with LLM-Based Agents

Benchmarking Vector, Graph and Hybrid Retrieval Augmented Generation (RAG) Pipelines for Open Radio Access Networks (ORAN)

The NordDRG AI Benchmark for Large Language Models

Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs

Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving

Unsupervised Learning for Quadratic Assignment

Reference-Aligned Retrieval-Augmented Question Answering over Heterogeneous Proprietary Documents

Benchmarking graph construction by large language models for coherence-driven inference

Quantization Meets dLLMs: A Systematic Study of Post-training Quantization for Diffusion LLMs

Graph Structure Learning with Temporal Graph Information Bottleneck for Inductive Representation Learning

$TIME[t] \subseteq SPACE[O(\sqrt{t})]$ via Tree Height Compression

Long Chain-of-Thought Reasoning Across Languages

From Passive Tool to Socio-cognitive Teammate: A Conceptual Framework for Agentic AI in Human-AI Collaborative Learning

Evaluating Retrieval-Augmented Generation vs. Long-Context Input for Clinical Reasoning over EHRs

TransLight: Image-Guided Customized Lighting Control with Generative Decoupling

DINOv3 with Test-Time Training for Medical Image Registration

MF-LPR$^2$: Multi-Frame License Plate Image Restoration and Recognition using Optical Flow

TransLLM: A Unified Multi-Task Foundation Framework for Urban Transportation via Learnable Prompting

PepThink-R1: LLM for Interpretable Cyclic Peptide Optimization with CoT SFT and Reinforcement Learning

Reliable generation of isomorphic physics problems using ChatGPT with prompt-chaining and tool use

Cross-Modality Controlled Molecule Generation with Diffusion Language Model

Evaluating Multilingual and Code-Switched Alignment in LLMs via Synthetic Natural Language Inference

AFABench: A Generic Framework for Benchmarking Active Feature Acquisition

Emerson-Lei and Manna-Pnueli Games for LTLf+ and PPLTL+ Synthesis

Transplant Then Regenerate: A New Paradigm for Text Data Augmentation

ShizhenGPT: Towards Multimodal LLMs for Traditional Chinese Medicine

Learning in Repeated Multi-Objective Stackelberg Games with Payoff Manipulation

Foe for Fraud: Transferable Adversarial Attacks in Credit Card Fraud Detection

ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signal

ELATE: Evolutionary Language model for Automated Time-series Engineering

OneLoc: Geo-Aware Generative Recommender Systems for Local Life Service

Can LLM Agents Solve Collaborative Tasks? A Study on Urgency-Aware Planning and Coordination

A Study of the Scale Invariant Signal to Distortion Ratio in Speech Separation with Noisy References

UST-SSM: Unified Spatio-Temporal State Space Models for Point Cloud Video Modeling

An Open-Source HW-SW Co-Development Framework Enabling Efficient Multi-Accelerator Systems

Mamba2 Meets Silence: Robust Vocal Source Separation for Sparse Regions

TASER: Table Agents for Schema-guided Extraction and Recommendation

Created by

Haebom

Authors

Nicole Cho, Kirsty Fielding, William Watson, Sumitra Ganesh, Manuela Veloso

Outline

This paper proposes TASER (Table Agents for Schema-guided Extraction and Recommendation), an agent-based system for extracting unstructured, multi-page table data from real-world financial documents. TASER transforms unstructured tables into normalized, schema-compliant output using agents that perform table detection, classification, extraction, and recommendation of schema revisions. Specifically, TASER refines its schema through continuous learning, shows that large-scale batch learning is effective, and achieves a 10.1% performance improvement over existing models such as Table Transformer. The authors also present a novel financial table dataset, TASERTab, which comprises 22,584 pages (28,150,449 tokens), 3,213 tables, and asset data worth a total of $731,685,511,687.
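The schema-guided step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not specify TASER's architecture, so every name here (`Schema`, `ExtractionResult`, `extract`, `recommend`, the `min_count` threshold, and the sample column names) is a hypothetical stand-in. The sketch only shows the general idea of coercing raw table rows into a target schema and treating recurring unmatched column headers as candidate schema revisions.

```python
from dataclasses import dataclass, field


@dataclass
class Schema:
    """Hypothetical target schema: field name -> expected Python type."""
    fields: dict[str, type]


@dataclass
class ExtractionResult:
    rows: list[dict] = field(default_factory=list)           # schema-compliant rows
    unmatched: dict[str, int] = field(default_factory=dict)  # unknown header -> count


def coerce(value: str, target: type):
    """Convert a raw cell string to the target type, stripping currency formatting."""
    if target in (int, float):
        return target(value.replace("$", "").replace(",", "").strip())
    return value.strip()


def extract(raw_rows: list[dict[str, str]], schema: Schema) -> ExtractionResult:
    """Keep rows that fill every schema field; count headers the schema lacks."""
    result = ExtractionResult()
    for raw in raw_rows:
        row = {}
        for header, cell in raw.items():
            if header in schema.fields:
                row[header] = coerce(cell, schema.fields[header])
            else:
                result.unmatched[header] = result.unmatched.get(header, 0) + 1
        if set(row) == set(schema.fields):
            result.rows.append(row)
    return result


def recommend(result: ExtractionResult, min_count: int = 2) -> list[str]:
    """Suggest adding headers that recur at least min_count times (assumed heuristic)."""
    return sorted(h for h, n in result.unmatched.items() if n >= min_count)


# Illustrative data only; not taken from TASERTab.
schema = Schema({"security": str, "market_value": float})
raw = [
    {"security": "Bond A", "market_value": "$1,250.50", "coupon": "4.5%"},
    {"security": "Bond B", "market_value": "980", "coupon": "3.0%"},
    {"security": "Total"},  # incomplete row, dropped by extract()
]
result = extract(raw, schema)
suggestions = recommend(result)
```

In this toy run, the two complete rows are kept with numeric market values, the incomplete `Total` row is dropped, and `coupon` is suggested as a schema addition because it recurs in the input but is absent from the schema.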

Takeaways, Limitations

• Takeaways:
  ◦ Provides an effective solution for extracting complex, unstructured table data from real-world financial documents.
  ◦ Demonstrates the effectiveness of an agent-based, schema-guided extraction system.
  ◦ Emphasizes the importance of performance and schema improvement through continuous learning.
  ◦ Enables further research by releasing TASERTab, a large-scale dataset of real-world financial data.
  ◦ Achieves a 10.1% performance improvement over Table Transformer.
  ◦ Improves schema recommendations and increases asset extraction through large-scale batch learning (9.8%).

• Limitations:
  ◦ The available information is insufficient to describe the specific architecture and algorithms of the TASER system in detail.
  ◦ Further analysis of the quality and potential biases of the TASERTab dataset is needed.
  ◦ Generalization performance should be evaluated across diverse types of financial documents and table structures.
  ◦ Comparative analysis with other agent-based systems is lacking.
