This paper presents Photonic Fabric™ and the Photonic Fabric Appliance™ (PFA), photonic switch-and-memory subsystems that deliver low latency, high bandwidth, and low energy consumption. The PFA integrates high-bandwidth HBM3E memory, an on-module photonic switch, and external DDR5 into a 2.5D electro-optical system-in-package, providing up to 32 TB of shared memory and 115 Tbps of all-to-all digital switching. The Photonic Fabric™ enables distributed AI training and inference to execute parallelism strategies more efficiently, and it relaxes the silicon area constraint that imposes a fixed memory-to-compute ratio on conventional XPU accelerator designs. Replacing an XPU's local HBM stack with a chiplet connected to the Photonic Fabric expands both memory capacity and memory bandwidth. We present CelestiSim, a lightweight analytical simulator validated against NVIDIA H100 and H200 systems, and use it to evaluate LLM inference performance and energy savings on the PFA without changing the GPU core design. Simulation results show up to 3.66x throughput improvement and 1.40x latency reduction for 405B-parameter LLM inference, up to 7.04x throughput improvement and 1.41x latency reduction for 1T-parameter LLM inference, and 60-90% lower data-movement energy consumption across all LLM training scenarios. Although the results are presented for NVIDIA GPUs, they apply similarly to other AI accelerator (XPU) designs that share the same memory-compute constraints.