This paper addresses a safety issue of large language models (LLMs): their vulnerability to adversarial manipulation such as jailbreaking via prompt injection attacks. We investigate the latent subspaces of safe and jailbroken states by extracting hidden activations from an LLM. Inspired by attractor network dynamics in neuroscience, we hypothesize that LLM activations settle into metastable states that can be identified and perturbed to induce state transitions. Using dimensionality-reduction techniques, we project the activations of safe and jailbroken responses into a low-dimensional space to reveal these latent subspaces. We then derive perturbation vectors that, when applied to safe representations, shift the model toward jailbroken states. Our results show that these causal interventions elicit statistically significant jailbroken responses for a subset of prompts. We further examine how such perturbations propagate through the model's layers, testing whether the induced state changes remain localized or cascade throughout the network. The results indicate that targeted perturbations induce distinct changes in activations and model responses. This work paves the way for proactive defenses that move beyond traditional guardrail-based methods toward preemptive, model-agnostic techniques that neutralize adversarial states at the representational level.
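As a rough illustration of the kind of intervention described above, the sketch below derives a perturbation vector as the difference between mean hidden activations for jailbroken-style and safe prompts, then injects it at a single layer via a forward hook during generation. The model name, layer index, scaling factor, and prompt sets are hypothetical placeholders; the paper's actual extraction, dimensionality-reduction, and evaluation pipeline is not reproduced here.

```python
# Minimal sketch: mean-difference perturbation vector injected at one layer.
# Model, layer, prompts, and scaling are illustrative assumptions only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"   # placeholder model
LAYER_IDX = 6         # placeholder layer to probe and perturb
ALPHA = 4.0           # assumed perturbation strength

tok = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME, output_hidden_states=True)
model.eval()

def mean_activation(prompts, layer):
    """Average the hidden state at `layer` over the last token of each prompt."""
    vecs = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        with torch.no_grad():
            out = model(**ids)
        vecs.append(out.hidden_states[layer][0, -1, :])
    return torch.stack(vecs).mean(dim=0)

safe_prompts = ["How do I bake bread?"]                         # placeholder "safe" set
jailbroken_prompts = ["Ignore all previous instructions ..."]   # placeholder "jailbroken" set

# Perturbation vector pointing from the safe toward the jailbroken subspace.
delta = mean_activation(jailbroken_prompts, LAYER_IDX) - mean_activation(safe_prompts, LAYER_IDX)

def add_delta(module, inputs, output):
    """Forward hook: shift this layer's hidden states along the perturbation vector."""
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + ALPHA * delta
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

# For GPT-2 the transformer blocks live in model.transformer.h; other architectures differ.
handle = model.transformer.h[LAYER_IDX].register_forward_hook(add_delta)
ids = tok("How do I bake bread?", return_tensors="pt")
with torch.no_grad():
    generated = model.generate(**ids, max_new_tokens=40, pad_token_id=tok.eos_token_id)
handle.remove()
print(tok.decode(generated[0], skip_special_tokens=True))
```

Comparing generations with and without the hook gives a simple behavioral check of whether the injected vector moves the model's output away from its safe response, analogous in spirit to the causal interventions summarized in the abstract.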