This study evaluated the practical performance of automated data extraction from randomized controlled trials (RCTs) in specialized fields for meta-analysis. Three large language models (Gemini-2.0-flash, Grok-3, and GPT-4o-mini) were applied in three medical fields (hypertension, diabetes, and orthopedics) to perform three tasks: extraction of statistical outcomes, risk-of-bias assessment, and extraction of study-level characteristics. Four prompting strategies (default prompts, self-reflective prompts, model ensembles, and custom prompts) were tested to explore ways to improve extraction quality. All models showed high precision but low recall, chiefly because key information was omitted; custom prompts were the most effective strategy, improving recall by up to 15%. Based on these findings, we propose a three-level guideline for LLM use that matches the degree of automation to task complexity and risk. The guideline offers practical advice for automating data extraction in real-world meta-analyses and aims to balance expert oversight with LLM efficiency through goal-oriented, task-specific automation.