Daily Arxiv

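This page collects artificial intelligence papers published around the world.
Summaries on this page are generated with Google Gemini, and the page is run on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, simply cite the source.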

OmniEVA: Embodied Versatile Planner via Task-Adaptive 3D-Grounded and Embodiment-aware Reasoning

Open-sci-ref-0.01: open and reproducible reference baselines for language model and dataset comparison

Similarity-based Outlier Detection for Noisy Object Re-Identification Using Beta Mixtures

MachineLearningLM: Scaling Many-shot In-context Learning via Continued Pretraining

Musculoskeletal simulation of limb movement biomechanics in Drosophila melanogaster

Between a Rock and a Hard Place: Exploiting Ethical Reasoning to Jailbreak LLMs

Cut Costs, Not Accuracy: LLM-Powered Data Processing with Guarantees

The Architecture of AI Transformation: Four Strategic Patterns and an Emerging Frontier

IS³: Generic Impulsive-Stationary Sound Separation in Acoustic Scenes using Deep Filtering

Beyond Ensembles: Simulating All-Atom Protein Dynamics in a Learned Latent Space

HiddenObject: Modality-Agnostic Fusion for Multimodal Hidden Object Detection

Input-Time Scaling

Counterfactual Probabilistic Diffusion with Expert Models

Feedback-Driven Tool-Use Improvements in Large Language Models via Automated Build Environments

Bridging the Gap: A Framework for Real-World Video Deepfake Detection via Social Network Compression Emulation

Imposing AI: Deceptive design patterns against sustainability

LMAR: Language Model Augmented Retriever for Domain-specific Knowledge Indexing

Data-Driven Discovery of Mobility Periodicity for Understanding Urban Systems

Privacy Risks of LLM-Empowered Recommender Systems: An Inversion Attack Perspective

Web3 x AI Agents: Landscape, Integrations, and Foundational Challenges

Atherosclerosis through Hierarchical Explainable Neural Network Analysis

Survivability of Backdoor Attacks on Unconstrained Face Recognition Systems

Geological Everything Model 3D: A Promptable Foundation Model for Unified and Zero-Shot Subsurface Understanding

HiLight: A Hierarchical Reinforcement Learning Framework with Global Adversarial Guidance for Large-Scale Traffic Signal Control

AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning

OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval

The Precautionary Principle and the Innovation Principle: Incompatible Guides for AI Innovation Governance?

A Framework for Testing and Adapting REST APIs as LLM Tools

Déjà Vu: Multilingual LLM Evaluation through the Lens of Machine Translation Evaluation

JARVIS-VLA: Post-Training Large-Scale Vision Language Models to Play Visual Games with Keyboards and Mouse

Prompt Programming: A Platform for Dialogue-based Computational Problem Solving with Generative AI Models

Auxiliary Discrminator Sequence Generative Adversarial Networks (ADSeqGAN) for Few Sample Molecule Generation

Neural Force Field: Few-shot Learning of Generalized Physical Reasoning

Towards Developing Socially Compliant Automated Vehicles: Advances, Expert Insights, and A Conceptual Framework

A Novel Approach to Balance Convenience and Nutrition in Meals With Long-Term Group Recommendations and Reasoning on Multimodal Recipes and its Implementation in BEACON

Can LLM Prompting Serve as a Proxy for Static Analysis in Vulnerability Detection

Tokens, the oft-overlooked appetizer: Large language models, the distributional hypothesis, and meaning

Polish-English medical knowledge transfer: A new benchmark and results

InterFormer: Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction

AdvI2I: Adversarial Image Attack on Image-to-Image Diffusion models

A Survey on Group Fairness in Federated Learning: Challenges, Taxonomy of Solutions and Directions for Future Research

A Conflicts-free, Speed-lossless KAN-based Reinforcement Learning Decision System for Interactive Driving in Roundabouts

The Overcooked Generalisation Challenge: Evaluating Cooperation with Novel Partners in Unknown Environments Using Unsupervised Environment Design

Modelling the 5G Energy Consumption using Real-world Data: Energy Fingerprint is All You Need

Slaves to the Law of Large Numbers: An Asymptotic Equipartition Property for Perplexity in Generative Language Models

Interpretable Data-driven Anomaly Detection in Industrial Processes with ExIFFI

Analyzing the Impact of Adversarial Examples on Explainable Machine Learning

TSGCNeXt: Dynamic-Static Multi-Graph Convolution for Efficient Skeleton-Based Action Recognition with Long-term Learning Potential

Sufficient Invariant Learning for Distribution Shift

TORSO: Template-Oriented Reasoning Towards General Tasks

ForTIFAI: Fending Off Recursive Training Induced Failure for AI Models

Oyster-I: Beyond Refusal -- Constructive Safety Alignment for Responsible Language Models

Building Self-Evolving Agents via Experience-Driven Lifelong Learning: A Framework and Benchmark

QuantX: A Framework for Hardware-Aware Quantization of Generative AI Workloads

Learning to Plan with Personalized Preferences

Multi-Turn Human-LLM Interaction Through the Lens of a Two-Way Intelligibility Protocol

Spatio-Temporal Graphical Counterfactuals: An Overview

Standards in the Preparation of Biomedical Research Metadata: A Bridge2AI Perspective

Is In-Context Learning Learning?

Multimodal SAM-adapter for Semantic Segmentation

Diversified recommendations of cultural activities with personalized determinantal point processes

Improving Audio Event Recognition with Consistency Regularization

Data distribution impacts the performance and generalisability of contrastive learning-based foundation models of electrocardiograms

Towards Understanding Visual Grounding in Visual Language Models

GLAM: Geometry-Guided Local Alignment for Multi-View VLP in Mammography

I-Segmenter: Integer-Only Vision Transformer for Efficient Semantic Segmentation

Generalizing Beyond Suboptimality: Offline Reinforcement Learning Learns Effective Scheduling through Random Data

We Need a New Ethics for a World of AI Agents

SignClip: Leveraging Mouthing Cues for Sign Language Translation by Multimodal Contrastive Fusion

Openness in AI and downstream governance: A global value chain approach

SI-FACT: Mitigating Knowledge Conflict via Self-Improving Faithfulness-Aware Contrastive Tuning

Benchmark of stylistic variation in LLM-generated texts

BenchECG and xECG: a benchmark and baseline for ECG foundation models

Efficient Learning-Based Control of a Legged Robot in Lunar Gravity

Population-Aligned Persona Generation for LLM-based Social Simulation

Realism Control One-step Diffusion for Real-World Image Super-Resolution

Generating Energy-Efficient Code via Large-Language Models -- Where are we now?

Established Psychometric vs. Ecologically Valid Questionnaires: Rethinking Psychological Assessments in Large Language Models

Predictive Spike Timing Enables Distributed Shortest Path Computation in Spiking Neural Networks

TwinTac: A Wide-Range, Highly Sensitive Tactile Sensor with Real-to-Sim Digital Twin Sensor Model

Multimodal Mathematical Reasoning Embedded in Aerial Vehicle Imagery: Benchmarking, Analysis, and Exploration

Reinforcement learning for spin torque oscillator tasks

Exploring Expert Specialization through Unsupervised Training in Sparse Mixture of Experts

Intrinsic Dimension Estimating Autoencoder (IDEA) Using CancelOut Layer and a Projected Loss

Unsupervised Hallucination Detection by Inspecting Reasoning Processes

Drone-Based Multispectral Imaging and Deep Learning for Timely Detection of Branched Broomrape in Tomato Farms

Securing LLM-Generated Embedded Firmware through AI Agent-Driven Validation and Patching

Large Language Models Meet Legal Artificial Intelligence: A Survey

Limited Reference, Reliable Generation: A Two-Component Framework for Tabular Data Generation in Low-Data Regimes

Zero-Shot Referring Expression Comprehension via Visual-Language True/False Verification

Adaptive Token Merging for Efficient Transformer Semantic Communication at the Edge

SmartCoder-R1: Towards Secure and Explainable Smart Contract Generation with Security-Aware Group Relative Policy Optimization

WALL: A Web Application for Automated Quality Assurance using Large Language Models

An Autoencoder and Vision Transformer-based Interpretability Analysis of the Differences in Automated Staging of Second and Third Molars

Tackling One Health Risks: How Large Language Models are leveraged for Risk Negotiation and Consensus-building

Self-Augmented Robot Trajectory: Efficient Imitation Learning via Safe Self-augmentation with Demonstrator-annotated Precision

Automated Tuning for Diffusion Inverse Problem Solvers without Generative Prior Retraining

From Hugging Face to GitHub: Tracing License Drift in the Open-Source AI Ecosystem

Emulating Public Opinion: A Proof-of-Concept of AI-Generated Synthetic Survey Responses for the Chilean Case

Vibe Check: Understanding the Effects of LLM-Based Conversational Agents' Personality and Alignment on User Perceptions in Goal-Oriented Tasks

Tokens, the oft-overlooked appetizer: Large language models, the distributional hypothesis, and meaning

Created by Haebom

Authors

Julia Witte Zimmerman, Denis Hudon, Kathryn Cramer, Alejandro J. Ruiz, Calla Beauregard, Ashley Fehr, Mikaela Irene Fudolig, Bradford Demarest, Yoshi Meke Bird, Milo Z. Trujillo, Christopher M. Danforth, Peter Sheridan Dodds

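Overview

This paper argues that tokenization is a necessary component of the current architecture of many language models, including the Transformer-based large language models (LLMs) of generative AI, yet its impact on the model's cognition is often overlooked. The authors contend that LLMs demonstrate that the Distributional Hypothesis (DH) is sufficient for reasonably human-like language performance. They further argue that the emergence of human-meaningful linguistic units among tokens, together with current structural constraints, motivates changes to existing, linguistically agnostic tokenization techniques, particularly with respect to their roles as (1) semantic primitives and (2) vehicles for conveying salient distributional patterns of human language to the model. The paper explores tokenizations produced by a BPE tokenizer, existing model vocabularies obtained from Hugging Face and tiktoken, and the information in example token vectors as they pass through the layers of a RoBERTa (large) model. Beyond creating suboptimal semantic building blocks and obscuring the model's access to the distributional patterns it needs, the authors describe how tokens and pretraining can act as a backdoor for bias and other unwanted content, which current alignment practices may not remediate. They also present evidence that the objective function of the tokenization algorithm affects the LLM's cognition, even though it is meaningfully insulated from the main system intelligence.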
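To make the phrase "linguistically agnostic" concrete, the following is a minimal sketch of byte-pair-encoding (BPE) merge learning. It is a toy illustration, not the tokenizers examined in the paper: the function names `merge_pair`, `train_bpe`, and `tokenize`, the tiny corpus, and the tie-breaking rule are all assumptions made for this example. The point it tries to show is that the merge objective is driven only by adjacent-pair frequency, with no notion of morphemes or meaning.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(words, num_merges):
    """Learn up to `num_merges` merges from a list of words (toy version, characters as base symbols)."""
    # Each distinct word is a tuple of symbols, weighted by how often it occurs.
    corpus = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # The objective is purely statistical: pick the most frequent adjacent pair.
        # Ties are broken lexicographically (an assumption, only for determinism).
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            new_corpus[merge_pair(symbols, best)] += freq
        corpus = new_corpus
    return merges


def tokenize(word, merges):
    """Split `word` into tokens by applying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Hypothetical toy corpus; real vocabularies are learned from far larger text.
    toy_corpus = ["low"] * 5 + ["lower"] * 2 + ["newest"] * 6 + ["widest"] * 3
    learned = train_bpe(toy_corpus, num_merges=3)
    print(learned)
    print(tokenize("lowest", learned))
```

Whether a learned token happens to line up with a human-meaningful unit, such as a suffix, depends entirely on corpus statistics; nothing in the procedure rewards that alignment.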

Implications and Limitations

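• Implications:

◦ Highlights the importance of the tokenization algorithm for LLM performance and cognition.

◦ Points out that bias and other unwanted content can enter through tokenization, and calls for work on ways to address this.

◦ Argues that existing, linguistically agnostic tokenization techniques need to be improved.

◦ Emphasizes the role of tokens both as semantic primitives and as vehicles for conveying distributional patterns.

◦ Shows that the objective function of the tokenization algorithm affects LLM cognition.

• Limitations:

◦ The paper offers few concrete proposals for how tokenization should be improved.

◦ The findings may generalize only to a limited extent across different LLM architectures and tokenization methods.

◦ It does not go far in proposing solutions to the problem of bias and unwanted content entering the model.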
