Daily Arxiv

This page collects papers on artificial intelligence published around the world.
Summaries are generated with Google Gemini, and the page is operated on a non-profit basis.
Copyright for each paper belongs to its authors and their institutions; when sharing, please cite the source.

Causal Language Control in Multilingual Transformers via Sparse Feature Steering

Created by
  • Haebom

Author

Cheng-Ting Chou, George Liu, Jessica Sun, Cole Blondin, Kevin Zhu, Vasu Sharma, Sean O'Brien

Outline

This study explores using pretrained sparse autoencoder (SAE) features to control the output language of multilingual large language models (LLMs). Specifically, SAEs applied to the residual streams of the Gemma-2B and Gemma-9B models were used, in a zero-shot setting without explicit language prompts or fine-tuning, to identify features whose activations differ across English, Chinese, Japanese, Spanish, and French. By manipulating a single SAE feature, the authors achieved language switching with up to a 90% success rate (measured by FastText language classification) while preserving semantic fidelity (measured by LaBSE similarity). Their analysis shows that language steering is most effective in the mid-to-late transformer layers and is amplified by specific attention heads associated with language-sensitive SAE features.
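The core intervention can be sketched as adding a scaled SAE decoder direction to the residual stream at a chosen layer. The sketch below is a minimal illustration with assumed shapes and names (`steer_residual`, `alpha`, the toy dimensions); it is not the authors' implementation, which operates inside a real Gemma forward pass.

```python
import numpy as np

def steer_residual(resid, decoder_dir, alpha):
    """Add a scaled SAE feature direction to every position's residual vector.

    resid:       (seq_len, d_model) residual-stream activations at one layer
    decoder_dir: (d_model,) SAE decoder column for the chosen language feature
    alpha:       steering strength (hypothetical knob; positive values push
                 generation toward the target language)
    """
    # Normalize the direction so alpha directly controls the added magnitude.
    direction = decoder_dir / np.linalg.norm(decoder_dir)
    return resid + alpha * direction

# Toy example: 4 token positions, 8-dimensional residual stream.
rng = np.random.default_rng(0)
resid = rng.normal(size=(4, 8))
feature_dir = rng.normal(size=8)     # stands in for a language-sensitive SAE feature

steered = steer_residual(resid, feature_dir, alpha=5.0)
```

In an actual model this addition would be applied via a forward hook at a mid-to-late layer, where the paper reports steering is most effective.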

Takeaways, Limitations

  • Demonstrates that multilingual generation can be controlled in a lightweight, interpretable way through sparse feature steering.
  • Achieves a high language-switching success rate in a zero-shot setting, without explicit prompts or fine-tuning.
  • Improves understanding of model behavior by revealing the association between specific attention heads and language-sensitive SAE features.
  • Experiments were limited to the Gemma-2B and Gemma-9B models; further studies are needed to establish generalizability to other models and languages.
  • Evaluation relied on FastText for language classification and LaBSE similarity for semantic fidelity; analysis with additional metrics is needed.
  • Only single-feature manipulation was studied; the effects of manipulating multiple features simultaneously remain to be investigated.