In this paper, we study how to efficiently scale inference-time compute for open-ended generative tasks in multilingual, multitask settings. While prior work has focused on English and a few domains such as mathematics and code, we are interested in techniques that generalize across open-ended tasks, formally verifiable tasks, and multiple languages. We show that both temperature-based sampling and selection strategies must be adapted to diverse domains and language settings. We find that existing selection methods that are effective in English do not generalize to other languages, and we propose novel sampling and selection strategies tailored to multilingual and multitask inference scenarios. The proposed method achieves significant performance improvements across a variety of languages and tasks: in particular, on m-ArenaHard-v2.0 prompts it improves the win rate of an 8B model by +6.8 and that of Command-A, a 111B model, by an average of +9.0, using only five samples compared to single-sample decoding. These results highlight the need for language- and task-aware approaches to inference-time compute to improve performance in low-resource languages.
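To make the sample-then-select paradigm concrete, the sketch below illustrates a generic best-of-n loop with a varied temperature schedule. It is a minimal illustration, not the paper's method: `generate`, `score`, and the temperature values are hypothetical stand-ins for a model call, a selection method (e.g., a reward model or LLM judge), and a tuned schedule.

```python
import random

def generate(prompt: str, temperature: float) -> str:
    """Hypothetical stand-in for an LLM call; returns a dummy completion."""
    random.seed(hash((prompt, temperature)) % (2**32))
    return f"completion@T={temperature:.1f}/{random.randint(0, 999)}"

def score(prompt: str, completion: str) -> float:
    """Hypothetical selector; a real system might use a reward model or judge."""
    return random.random()

def best_of_n(prompt: str, temperatures: list[float]) -> str:
    # Sample one candidate per temperature (in practice, in parallel),
    # then select the highest-scoring candidate as the final output.
    candidates = [generate(prompt, t) for t in temperatures]
    return max(candidates, key=lambda c: score(prompt, c))

if __name__ == "__main__":
    # Five samples, mirroring the five-sample setting reported above.
    print(best_of_n("Translate 'hello' into Swahili.", [0.3, 0.5, 0.7, 0.9, 1.1]))
```

The key design point the paper argues is that both pieces of this loop, the temperature schedule in `generate` and the selection rule in `score`, must be adapted per language and per task rather than fixed to settings tuned on English.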