This paper presents Photonic Fabric™ and the Photonic Fabric Appliance™ (PFA), optical switch and memory subsystems that deliver low latency, high bandwidth, and low energy consumption. The PFA integrates high-bandwidth HBM3E memory, on-module optical switches, and external DDR5 into a 2.5D electro-optical system-in-package, providing up to 32 TB of shared memory and 115 Tbps of all-to-all digital switching. The Photonic Fabric™ enables distributed AI training and inference to execute parallelism strategies more efficiently. It removes the silicon beachfront constraint that imposes the fixed memory-to-compute ratio observed in traditional XPU accelerator designs. Replacing the local HBM stacks in an XPU with chiplets connected to the Photonic Fabric increases memory capacity and bandwidth, scaling to levels not achievable with on-package HBM alone. We introduce CelestiSim, a lightweight analytical simulator validated on NVIDIA H100 and H200 systems, to evaluate LLM inference and training performance and energy savings on the PFA without significant changes to the GPU core design. Simulation results show that the PFA achieves up to 3.66x throughput improvement and 1.40x latency reduction in 405B-parameter LLM inference, up to 7.04x throughput improvement and 1.41x latency reduction in 1T-parameter LLM inference, and a 60-90% reduction in data movement energy for collective operations across all LLM training scenarios. While these results are presented for NVIDIA GPUs, they apply similarly to other AI accelerator designs (XPUs) that share the same fundamental limitation of a fixed memory-to-compute ratio.
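CelestiSim's actual methodology is described in the body of the paper; purely as a hedged illustration of the kind of first-order estimate a lightweight analytical simulator can make, the Python sketch below computes memory-bandwidth-bound LLM decode throughput in a roofline style. The function name `decode_tokens_per_sec` and every numeric value (bandwidths, batch size, precision) are illustrative assumptions, not CelestiSim's model or the paper's measured results.

```python
# Minimal sketch, NOT CelestiSim: a first-order, roofline-style estimate of
# memory-bandwidth-bound LLM decode throughput. All numeric values below are
# illustrative assumptions, not figures from the paper.

def decode_tokens_per_sec(n_params: float, bytes_per_param: float,
                          mem_bw_tb_per_s: float, batch_size: int) -> float:
    """Estimate throughput assuming each decode step streams the full weight
    set from memory, i.e. the bandwidth-bound regime typical of LLM decode."""
    weight_bytes = n_params * bytes_per_param     # bytes read per decode step
    bw_bytes_per_s = mem_bw_tb_per_s * 1e12       # TB/s -> bytes/s
    steps_per_s = bw_bytes_per_s / weight_bytes   # decode steps per second
    return steps_per_s * batch_size               # tokens/s across the batch

if __name__ == "__main__":
    # Hypothetical comparison: local HBM vs. a higher-bandwidth attached pool.
    for label, bw in [("local HBM", 4.8), ("fabric-attached", 14.4)]:
        tps = decode_tokens_per_sec(n_params=405e9,     # 405B-parameter model
                                    bytes_per_param=2,  # FP16 weights
                                    mem_bw_tb_per_s=bw,
                                    batch_size=32)
        print(f"{label:>16} ({bw} TB/s): ~{tps:,.0f} tokens/s")
```

Under these assumptions, decode throughput scales linearly with deliverable memory bandwidth, which is why lifting the fixed memory-to-compute ratio matters in the bandwidth-bound regime.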