First, I've summarized the logic:
* Method 1. Keep the existing method: if the video already has a script (transcript), retrieve it and summarize it
* If Method 1 succeeds, proceed with the summary; if it fails, fall back to Method 2
* Method 2. Audio-based analysis: Extract audio with yt-dlp → Transcribe with Whisper → Summarize with GPT-4o
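
The fallback between the two methods can be sketched roughly as below. This is only a minimal illustration, not the actual implementation: `summarize_video` and its three callable parameters (`fetch_transcript`, `transcribe_audio`, `summarize`) are hypothetical names standing in for the existing transcript lookup, the yt-dlp + Whisper step, and the GPT-4o call. It also assumes that "failure" of Method 1 means the transcript lookup raised an error or returned empty text.

```python
from typing import Callable, Optional


def summarize_video(
    url: str,
    fetch_transcript: Callable[[str], Optional[str]],
    transcribe_audio: Callable[[str], str],
    summarize: Callable[[str], str],
) -> str:
    """Summarize a video, trying Method 1 first and falling back to Method 2.

    All three callables are hypothetical placeholders supplied by the caller.
    """
    text: Optional[str] = None
    try:
        # Method 1: use the script/transcript the video already has.
        text = fetch_transcript(url)
    except Exception:
        # Assumption: any error in the lookup counts as Method 1 failing.
        text = None

    if not text or not text.strip():
        # Method 2: extract the audio and transcribe it
        # (for example yt-dlp followed by Whisper).
        text = transcribe_audio(url)

    # Both methods end in the same summarization step (for example GPT-4o).
    return summarize(text)
```

Passing the steps in as functions keeps the fallback logic separate from the yt-dlp, Whisper, and GPT-4o calls, so each step can be swapped or stubbed out independently.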