EigenBench is a novel benchmarking methodology for evaluating value alignment in AI. To address the lack of quantitative metrics for this problem, it proposes a black-box approach that comparatively scores the value alignment of language models. It takes as input an ensemble of models, a constitution describing a value system, and a scenario dataset, and outputs a vector of scores quantifying each model's alignment with that constitution. Each model judges the outputs of the other models across the scenarios, and the EigenTrust algorithm aggregates these judgments into scores reflecting the weighted-average judgment of the whole ensemble. The method requires no ground-truth labels: it is designed to quantify traits on which even reasonable judges may disagree. Experiments using prompted personas to test the sensitivity of EigenBench scores to the model versus the prompt showed that most of the variance is explained by the prompt, while the small residuals quantify biases inherent to the models themselves.
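The aggregation step can be illustrated with a minimal sketch of EigenTrust-style scoring: pairwise judgments are row-normalized into a stochastic trust matrix, and power iteration finds its stationary distribution, so each model's score is the ensemble's judgment of it weighted by the judges' own scores. The matrix values and function name below are hypothetical, not taken from the EigenBench paper.

```python
import numpy as np

def eigentrust_scores(judgments: np.ndarray,
                      tol: float = 1e-10,
                      max_iter: int = 1000) -> np.ndarray:
    """Aggregate pairwise judgments into one score per model.

    judgments[i, j] is the nonnegative credit judge i assigns to
    model j. Rows are normalized into a stochastic matrix C, and the
    returned vector is its stationary distribution, found by power
    iteration: a judge's opinion counts in proportion to that judge's
    own score.
    """
    C = judgments / judgments.sum(axis=1, keepdims=True)  # row-normalize
    t = np.full(C.shape[0], 1.0 / C.shape[0])             # uniform start
    for _ in range(max_iter):
        t_next = C.T @ t        # re-weight judgments by current scores
        if np.linalg.norm(t_next - t, 1) < tol:
            break
        t = t_next
    return t_next

# Hypothetical 3-model ensemble: model 2 is rated highly by its peers.
scores = eigentrust_scores(np.array([[1.0, 2.0, 5.0],
                                     [2.0, 1.0, 6.0],
                                     [3.0, 4.0, 1.0]]))
```

Because the trust matrix is row-stochastic, the iteration preserves the total mass of the score vector, so the result is a proper distribution over models.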