
Daily Arxiv

This is a page that curates AI-related papers published worldwide.
All content here is summarized using Google Gemini and operated on a non-profit basis.
Copyright for each paper belongs to the authors and their institutions; please make sure to credit the source when sharing.

NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining

Created by
  • Haebom

Authors

Maksim Kuprashevich, Grigorii Alekseenko, Irina Tolstykh, Georgii Fedorov, Bulat Suleimanov, Vladimir Dokholyan, Aleksandr Gordeev

Outline

This paper presents a pipeline that automatically generates a large-scale, high-quality image editing dataset. It addresses a limitation of generative assistants that edit images from natural-language commands: existing approaches struggle to obtain pixel-accurate editing examples. The pipeline produces triplets (original image, command, edited image) with publicly available generative models and uses a Gemini validator to evaluate command compliance and aesthetic factors directly. Inversion and compositional bootstrapping increase the dataset size by a factor of 2.2. The result is the NHR-Edit dataset of 358,000 high-quality triplets, together with Bagel-NHR-Edit, a Bagel model fine-tuned on it. In large-scale cross-dataset evaluations, the proposed dataset and model outperform other publicly available datasets and models.
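The mining loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the `Triplet` type, the `Validator` signature, the 0.8 thresholds, and the exact meaning given to inversion (swap the two images and supply an undoing command) and composition (chain two edits that share an intermediate image) are all assumptions made for the example. The validator is a plain callable, so no real Gemini API is invoked.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Triplet:
    source: str       # identifier of the original image
    instruction: str  # natural-language edit command
    edited: str       # identifier of the edited image


# Hypothetical validator interface: returns (compliance, aesthetics) in [0, 1].
Validator = Callable[[Triplet], Tuple[float, float]]


def filter_triplets(
    candidates: Iterable[Triplet],
    validator: Validator,
    min_compliance: float = 0.8,  # assumed threshold, not from the paper
    min_aesthetics: float = 0.8,  # assumed threshold, not from the paper
) -> List[Triplet]:
    """Keep only candidates whose two scores both clear their thresholds."""
    kept = []
    for triplet in candidates:
        compliance, aesthetics = validator(triplet)
        if compliance >= min_compliance and aesthetics >= min_aesthetics:
            kept.append(triplet)
    return kept


def invert(triplet: Triplet, inverse_instruction: str) -> Triplet:
    """Assumed form of inversion: swap the images and use a command that undoes the edit."""
    return Triplet(triplet.edited, inverse_instruction, triplet.source)


def compose(first: Triplet, second: Triplet) -> Triplet:
    """Assumed form of composition: chain A->B and B->C into a single A->C triplet."""
    if first.edited != second.source:
        raise ValueError("second edit must start from the first edit's output")
    combined = f"{first.instruction}, then {second.instruction}"
    return Triplet(first.source, combined, second.edited)
```

In this sketch, each validated triplet can yield an inverted triplet and, where edits share an intermediate image, a composed one, which is one way a mined set could grow without generating new images.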

Takeaways, Limitations

Takeaways:
Presents a pipeline that automatically generates high-quality image editing datasets, addressing a key obstacle to training large-scale image editing models.
Improves research accessibility by releasing the NHR-Edit dataset of 358,000 high-quality triplets and a fine-tuned Bagel model.
Introduces a novel approach that uses the Gemini validator to directly evaluate command compliance and aesthetic factors.
Presents a technique that increases the dataset size through inversion and compositional bootstrapping.
Validates the performance advantage through large-scale cross-dataset evaluation.
Limitations:
The performance of the Gemini validator requires more detailed analysis and verification.
Generalization needs to be evaluated across various types of image editing commands.
The computational cost and efficiency of the pipeline need further analysis.
Bias in the generated dataset needs to be analyzed and addressed.