We present the first version of the AI Productivity Index (APEX), a benchmark for assessing whether AI models can perform high-value knowledge tasks. APEX addresses one of the biggest inefficiencies in AI research: benchmarks that fail to test economically relevant skills beyond coding. APEX-v1.0 includes 200 test cases spanning four domains: investment banking, management consulting, law, and primary care. APEX was built in three phases. First, we recruited experts with top-tier experience, such as investment bankers from Goldman Sachs. Second, the experts wrote prompts reflecting high-value tasks from their daily work. Third, the experts developed scoring criteria for evaluating model responses. Using an LM judge, we evaluated 23 state-of-the-art models on APEX-v1.0. GPT 5 (thinking = high) achieved the highest average score (64.2%), followed by Grok 4 (61.3%) and Gemini 2.5 Flash (thinking = on) (60.4%). Qwen 3 235B was the best-performing open-source model, ranking 7th overall. Even the best models show a significant performance gap relative to human experts, highlighting the need for better measures of models' ability to perform economically valuable work.
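To make the grading setup concrete, the sketch below shows one plausible way rubric-based LM-judge scoring of this kind can be implemented: a response is scored by the fraction of expert-written criteria a judge marks as satisfied, and per-case scores are averaged across the benchmark. The `TestCase` structure, the `Judge` callable, and the keyword-matching stand-in judge are illustrative assumptions; the abstract does not specify APEX's actual grading pipeline.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class TestCase:
    prompt: str          # expert-written task prompt
    criteria: list[str]  # expert-authored scoring criteria

# A judge maps (response, criterion) to a pass/fail verdict. In practice this
# would call an LM judge via a model API; here it is any callable of this shape.
Judge = Callable[[str, str], bool]

def score_case(case: TestCase, response: str, judge: Judge) -> float:
    """Fraction of criteria the judge marks as satisfied for one test case."""
    passed = sum(judge(response, criterion) for criterion in case.criteria)
    return passed / len(case.criteria)

def benchmark_score(cases: list[TestCase], responses: list[str], judge: Judge) -> float:
    """Average per-case score across the benchmark (e.g. 0.642 for 64.2%)."""
    return sum(score_case(c, r, judge) for c, r in zip(cases, responses)) / len(cases)

if __name__ == "__main__":
    # Toy usage with a keyword-matching stand-in for the LM judge.
    case = TestCase(
        prompt="Summarize the key terms of the proposed merger.",
        criteria=["states the exchange ratio", "notes the closing conditions"],
    )
    keyword_judge: Judge = lambda resp, crit: crit.split()[-1] in resp.lower()
    response = "The exchange ratio is 1.2x, subject to standard closing conditions."
    print(score_case(case, response, keyword_judge))  # -> 1.0
```

A binary pass/fail per criterion keeps the judge's task simple and the aggregate score interpretable; weighted or partial-credit rubrics would slot in by replacing the boolean verdict with a numeric one.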