Robust Safety Monitoring of Language Models via Activation Watermarking