Variational quantum algorithms hold the potential to solve meaningful problems on moderately sized quantum hardware, but designing suitable circuits for them remains a challenge. To address the scalability limitations of reinforcement learning (RL)-based quantum architecture search (QAS), we propose $\textit{TensorRL-QAS}$, a novel framework that combines tensor network methods with RL. The framework initializes QAS with a matrix product state (MPS) approximation of the target solution, which reduces the search space and accelerates convergence to physically meaningful circuits. Applied to quantum chemistry problems with up to 12 qubits, TensorRL-QAS reduces the CNOT count and circuit depth by up to 10x compared to existing methods, while maintaining or exceeding chemical accuracy. It also overcomes key weaknesses of existing methods: it reduces the number of function evaluations required by the classical optimizer by up to 100x, accelerates training episodes by up to 98%, and achieves a 50% success rate on a 10-qubit system. TensorRL-QAS is robust and versatile in both noiseless and noisy scenarios, and scales to systems of up to 20 qubits.
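To make the role of the MPS approximation concrete, the following is a minimal NumPy sketch of how a state vector can be compressed into an MPS with a capped bond dimension. It is an illustration under stated assumptions, not the procedure used in TensorRL-QAS: the helper names `state_to_mps`, `mps_to_state`, and `fidelity` are hypothetical, the decomposition uses plain successive truncated SVDs on a dense state vector (which only works for small systems), and the subsequent step of mapping the MPS to a circuit that seeds the RL agent is not shown.

```python
import numpy as np


def state_to_mps(psi, n_qubits, max_bond):
    """Split a 2**n_qubits state vector into MPS tensors of shape
    (left_bond, 2, right_bond) using successive truncated SVDs.
    Hypothetical helper for illustration only."""
    tensors = []
    rest = np.asarray(psi).reshape(1, -1)  # (left bond, all remaining qubits)
    for _ in range(n_qubits - 1):
        left_bond = rest.shape[0]
        # Group (left bond, current qubit) as rows, remaining qubits as columns.
        rest = rest.reshape(left_bond * 2, -1)
        u, s, vh = np.linalg.svd(rest, full_matrices=False)
        keep = min(max_bond, len(s))  # truncate to the allowed bond dimension
        tensors.append(u[:, :keep].reshape(left_bond, 2, keep))
        rest = s[:keep, None] * vh[:keep, :]  # carry the weights to the right
    tensors.append(rest.reshape(rest.shape[0], 2, 1))
    return tensors


def mps_to_state(tensors):
    """Contract the MPS tensors back into a dense state vector."""
    out = tensors[0]
    for t in tensors[1:]:
        out = np.tensordot(out, t, axes=([-1], [0]))
    return out.reshape(-1)


def fidelity(target, approx):
    """Squared overlap between a normalized target and the renormalized approximation."""
    approx = approx / np.linalg.norm(approx)
    return abs(np.vdot(target, approx)) ** 2


# Example: a 4-qubit GHZ state is represented exactly at bond dimension 2.
n = 4
ghz = np.zeros(2**n)
ghz[0] = ghz[-1] = 1 / np.sqrt(2)
mps = state_to_mps(ghz, n, max_bond=2)
print("tensor shapes:", [t.shape for t in mps])
print("parameters in MPS:", sum(t.size for t in mps), "vs dense:", ghz.size)
print("fidelity:", fidelity(ghz, mps_to_state(mps)))
```

The bond dimension `max_bond` controls the trade-off in this sketch: a small value gives a compact, weakly entangled starting point, and whatever fidelity is lost to truncation is what a subsequent search would need to recover.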