This paper quantitatively evaluates the performance of several open-source and proprietary large language models (LLMs) on selected tasks from the European Qualifying Examination (EQE) for patent attorneys. Among the models evaluated, which included the GPT family, Anthropic's Claude, DeepSeek, and Llama 3, OpenAI's GPT-4 achieved the highest accuracy (0.82) and F1 score (0.81), yet still fell short of the expert-level benchmark (0.90). Llama 3.1 8B, whether hosted on AWS or run through a Python implementation, performed no better than random guessing. The models also showed limitations in integrating text with graphics and in formatting, and expert review revealed problems with logical consistency, clarity, and legal basis. Model outputs were sensitive to temperature settings and prompt phrasing, underscoring the need for expert supervision.
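
The abstract does not specify how the accuracy and F1 figures were computed; the following hypothetical Python sketch illustrates one plausible scoring setup for multiple-choice EQE items, using scikit-learn's standard metrics. All data, the macro-averaging choice, and the variable names here are illustrative assumptions, not the paper's actual methodology.

```python
# Hypothetical scoring sketch (not from the paper): compares a model's answers
# against the official answer key for multiple-choice EQE items.
from sklearn.metrics import accuracy_score, f1_score

# Illustrative data: official answers vs. one model's responses per question.
answer_key    = ["A", "C", "B", "D", "A", "B"]   # ground truth (hypothetical)
model_answers = ["A", "C", "D", "D", "A", "C"]   # model output (hypothetical)

accuracy = accuracy_score(answer_key, model_answers)

# Macro-averaged F1 treats each answer option as its own class, which is one
# plausible way to reduce multi-class answers to a single F1 figure per model.
f1 = f1_score(answer_key, model_answers, average="macro")

print(f"accuracy = {accuracy:.2f}, macro-F1 = {f1:.2f}")
```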