SemiNFT: Learning to Transfer Presets from Imitation to Appreciation via Hybrid-Sample Reinforcement Learning
SemiNFT is a photorealistic preset transfer method and can be applied to various color-related tasks.
Photorealistic color retouching plays a vital role in visual content creation, yet manual retouching remains inaccessible to non-experts due to its reliance on specialized expertise. Reference-based methods offer a promising alternative by transferring the preset color of a reference image to a source image. However, these approaches often operate as novice learners, performing global color mappings derived from pixel-level statistics, without a true understanding of semantic context or human aesthetics. To address this issue, we propose SemiNFT, a Diffusion Transformer (DiT)-based retouching framework that mirrors the trajectory of human artistic training: beginning with rigid imitation and evolving into intuitive creation. Specifically, SemiNFT is first taught with paired triplets to acquire basic structural preservation and color mapping skills, and then advanced to reinforcement learning (RL) on unpaired data to cultivate nuanced aesthetic perception. Crucially, during the RL stage, to prevent catastrophic forgetting of old skills, we design a hybrid online-offline reward mechanism that anchors aesthetic exploration with structural review. Extensive experiments show that SemiNFT not only outperforms state-of-the-art methods on standard preset transfer benchmarks but also demonstrates remarkable intelligence in zero-shot tasks, such as black-and-white photo colorization and cross-domain (anime-to-photo) preset transfer. These results confirm that SemiNFT transcends simple statistical matching and achieves a sophisticated level of aesthetic comprehension.
We design a curriculum-style training paradigm inspired by the learning process of a human retouching expert. The process starts with cold-start supervised fine-tuning on paired image triplets to capture the fundamental structural relationships among the source image, the reference image, and the retouched result, and then transitions to reinforcement learning on unpaired data to cultivate higher-level aesthetic perception. To prevent the model from forgetting the structural preservation skills learned in the cold-start stage, we propose a hybrid online-offline reward mechanism that continually reviews these earlier skills.
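The hybrid online-offline reward can be sketched as a weighted mix of an online aesthetic term (scored on the policy's own output for unpaired data) and an offline structural term (fidelity to a paired triplet replayed from the cold-start data). The function names, the linear mixing, and the default weight below are our own illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch of a hybrid online-offline reward. The linear mixing
# scheme and all names here are assumptions for exposition, not the
# authors' actual implementation.

def hybrid_reward(online_aesthetic: float,
                  offline_structural: float,
                  alpha: float = 0.5) -> float:
    """Combine the two reward signals.

    online_aesthetic:   score from an aesthetic model on the policy's own
                        output for an unpaired sample (the new skill).
    offline_structural: fidelity score against a paired ground-truth
                        triplet replayed from the cold-start data (the
                        old skill that anchors exploration).
    alpha:              mixing weight (assumed value, not from the paper).
    """
    return alpha * online_aesthetic + (1.0 - alpha) * offline_structural


def batch_rewards(online_scores, offline_scores, alpha=0.5):
    """Per-sample rewards for a mixed online/offline RL batch."""
    return [hybrid_reward(a, s, alpha)
            for a, s in zip(online_scores, offline_scores)]
```

The structural term acts as a review: even when the aesthetic score encourages bold color exploration, a sample that drifts from the ground-truth structure is penalized, which is one plausible way to curb catastrophic forgetting.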

We present visual comparisons of SemiNFT against SA-LUT, Neural Preset, CAP-VSTNet, GPT-Image-1.5, and Nano Banana. Notably, our method uses no point-based guidance in the generated results, yet achieves accurate color alignment at both the global and local levels.
Comparisons of local details. Notably, our method achieves the best skin-to-skin alignment.
SemiNFT achieves impressive colorization results. Despite the large chromatic gap between the black-and-white source image and the color reference image, our model performs precise color mapping by aligning semantic regions, ensuring skin-to-skin and background-to-background consistency and demonstrating spatially coherent, semantically aware color transfer.

SemiNFT also enables cross-domain preset transfer, such as translating aesthetics between anime images and realistic photographs. SemiNFT strictly preserves the source content while changing only the color and tonal characteristics.

SemiNFT can be seamlessly integrated with existing Vision-Language Models (VLMs) and Text-to-Image (T2I) generative models to restore vintage photos. Specifically, a VLM infers the plausible colors of the source image, and a T2I model then synthesizes a high-fidelity reference image for restoration.
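The restoration pipeline above might be composed as follows. Every function here (`vlm_describe_colors`, `t2i_generate`, `seminft_transfer`) is a hypothetical stub standing in for the real VLM, T2I model, and SemiNFT, respectively; only the composition, not the stubs' contents, reflects the described workflow.

```python
# Hypothetical composition of the vintage-photo restoration pipeline.
# All three helpers are placeholder stubs, not real APIs.

def vlm_describe_colors(source_path: str) -> str:
    # Stand-in for querying a VLM about the photo's plausible colors.
    return "warm skin tones, faded teal background, soft daylight"

def t2i_generate(prompt: str) -> str:
    # Stand-in for a text-to-image model; returns a path to the synthesized
    # reference image.
    return "reference.png"

def seminft_transfer(source: str, reference: str) -> str:
    # Stand-in for SemiNFT preset transfer; returns the result path.
    return "restored.png"

def restore_vintage_photo(source_path: str) -> str:
    """VLM infers colors -> T2I synthesizes a reference -> preset transfer."""
    colors = vlm_describe_colors(source_path)
    reference = t2i_generate(f"a high-fidelity photograph with {colors}")
    return seminft_transfer(source=source_path, reference=reference)
```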

@article{seminft,
title={SemiNFT: Learning to Transfer Presets from Imitation to Appreciation via Hybrid-Sample Reinforcement Learning},
author={Yang, Melany and Yu, Yuhang and Weng, Diwang and Chen, Jinwei and Dong, Wei},
journal={arXiv preprint arXiv:2602.08582},
year={2026}
}