TruthLens is a comprehensive and generalizable deepfake detection framework that goes beyond traditional binary classification (real vs. fake) to provide detailed text-based inference. It utilizes a task-driven representation integration strategy that combines the global semantic context of a multimodal large-scale language model (MLLM) with local features from a visual model. This enables fine-grained, region-based inference for facial manipulation and fully synthetic content, answering granular questions like "Do the eyes, nose, and mouth look real?" Experimental results on diverse datasets demonstrate that TruthLens sets a new standard in both forensic interpretability and detection accuracy, and generalizes well across both known and unknown manipulations.