This paper addresses the multi-label classification problem with a large number of labels. In particular, we consider the case where the output labels must satisfy certain logical constraints. We propose an architecture that feeds the outputs of classifiers for individual labels into an expressive sequential model, yielding a joint distribution over the labels. One advantage of such an expressive model is its ability to capture the correlations that may arise from the constraints. We demonstrate experimentally that the proposed architecture can exploit constraints during learning and enforce them at inference time.
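As a rough illustration of the kind of architecture described above (not the paper's exact model), the sketch below combines independent per-label scores through a simple autoregressive chain, so each label is predicted conditioned on the labels decoded before it, and a hard logical constraint is enforced during greedy decoding. All names, dimensions, the linear scorers, and the example implication constraint are illustrative assumptions.

```
# Minimal sketch, assuming linear per-label classifiers and a simple
# autoregressive combiner; purely illustrative, not the paper's model.
import numpy as np

rng = np.random.default_rng(0)
n_features, n_labels = 8, 4

# Stand-ins for pretrained per-label classifiers: one linear scorer per label.
W_ind = rng.normal(size=(n_labels, n_features))

# Autoregressive combiner: label i's logit depends on its individual score
# and on the labels already decided (a tiny "sequential" joint model).
V_prev = rng.normal(scale=0.5, size=(n_labels, n_labels))

def joint_greedy_decode(x, constraint=None):
    """Greedily decode a label vector; `constraint` can veto assignments."""
    scores = W_ind @ x                                # individual-classifier logits
    y = np.zeros(n_labels, dtype=int)
    log_prob = 0.0
    for i in range(n_labels):
        logit = scores[i] + V_prev[i, :i] @ y[:i]     # condition on earlier labels
        p1 = 1.0 / (1.0 + np.exp(-logit))             # P(y_i = 1 | x, y_<i)
        choice = int(p1 >= 0.5)
        # Enforce a logical constraint at inference time by overriding
        # choices that would violate it (illustrative hard masking).
        if constraint is not None and not constraint(i, choice, y):
            choice = 1 - choice
        y[i] = choice
        log_prob += np.log(p1 if choice == 1 else 1.0 - p1)
    return y, log_prob

# Example constraint (hypothetical): label 1 implies label 0,
# so if y_0 = 0 we forbid y_1 = 1 during decoding.
def implies(i, choice, y_prefix):
    if i == 1 and choice == 1 and y_prefix[0] == 0:
        return False
    return True

x = rng.normal(size=n_features)
print(joint_greedy_decode(x, constraint=implies))
```

Because the joint distribution factorizes autoregressively, label correlations induced by the constraints can be captured by the conditioning on previously decoded labels, and hard constraints can be imposed by masking invalid choices at each decoding step.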