This study systematically evaluates the ethical decision-making capabilities and potential biases of nine popular large language models (LLMs). Across 50,400 trials, we assess the models' ethical preferences, sensitivity, stability, and clustering patterns in four ethical dilemma scenarios (protective vs. harmful) involving protected attributes, in both single-attribute and cross-attribute combinations. The results reveal significant biases with respect to protected attributes across all models, with preferences varying by model type and dilemma context. Specifically, open-source LLMs exhibit stronger preferences for marginalized groups and greater sensitivity in harmful scenarios, whereas closed-source models are more selective in protective scenarios and tend to favor mainstream groups. Ethical behavior also varies across dilemma types: LLMs maintain consistent patterns in protective scenarios but make more diverse and cognitively demanding decisions in harmful scenarios. Moreover, models exhibit more pronounced ethical biases in cross-attribute settings than in single-attribute settings, suggesting that complex inputs surface deeper biases. These findings highlight the need for multidimensional, context-aware assessment of ethical behavior in LLMs and point toward a systematic approach to understanding and addressing fairness in LLM decision-making.
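For illustration only, the sketch below shows one way such a factorial trial set could be enumerated: pairing attribute profiles (single-attribute or cross-attribute) within protective and harmful dilemma templates for each model. All names here (MODELS, SCENARIOS, ATTRIBUTES, enumerate_trials) and the attribute values and templates are hypothetical placeholders, not the study's actual prompts or configuration.

```python
# Hedged sketch: enumerating dilemma trials over protected attributes.
# Attribute values, scenario templates, and model identifiers are
# hypothetical placeholders, not taken from the study itself.
from itertools import combinations, product

MODELS = ["model_a", "model_b"]                      # hypothetical model identifiers
SCENARIOS = {                                        # protective vs. harmful dilemma templates
    "protective": "Which person should receive the protective resource: {x} or {y}?",
    "harmful":    "Which person should bear the unavoidable harm: {x} or {y}?",
}
ATTRIBUTES = {                                       # hypothetical protected attributes
    "gender": ["a woman", "a man"],
    "age":    ["an elderly person", "a young person"],
}

def describe(profile):
    """Join one value per attribute into a short textual description."""
    return ", ".join(profile)

def enumerate_trials(n_attrs):
    """Yield (model, scenario_type, prompt) for every pairing of attribute
    profiles built from n_attrs attributes (1 = single, 2 = cross-attribute)."""
    for attr_subset in combinations(ATTRIBUTES, n_attrs):
        value_lists = [ATTRIBUTES[a] for a in attr_subset]
        profiles = list(product(*value_lists))       # all attribute-value combinations
        for p1, p2 in combinations(profiles, 2):     # unordered pairs of distinct profiles
            for scenario_type, template in SCENARIOS.items():
                prompt = template.format(x=describe(p1), y=describe(p2))
                for model in MODELS:
                    yield model, scenario_type, prompt

if __name__ == "__main__":
    single = list(enumerate_trials(1))
    cross = list(enumerate_trials(2))
    print(f"single-attribute trials: {len(single)}, cross-attribute trials: {len(cross)}")
```

With more attributes, values, scenarios, and models than in this toy configuration, the number of pairings grows combinatorially, which is how a trial count on the order of tens of thousands can arise; the exact factorization used in the study is not reproduced here.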