Skip to main content

Command Palette

Search for a command to run...

Notes on FABRIC: Personalizing Diffusion Models with Iterative Feedback

This is a summary of an important research paper that provides a 13:1 time savings. It was crafted by humans working with several AI's. The goal is to save time and curate good ideas.

Published
4 min read
Notes on FABRIC: Personalizing Diffusion Models with Iterative Feedback

Link to paper: https://arxiv.org/abs/2307.10159

Paper published on: 2023-07-19

Paper's authors: Dimitri von Rütte, Elisabetta Fedele, Jonathan Thomm, Lukas Wolf

GPT3 API Cost: $0.02

GPT4 API Cost: $0.08

Total Cost To Write This: $0.10

Time Savings: 13:1

The ELI5 TLDR:

This tutorial explains a new approach called FABRIC for improving text-to-image models using human feedback. FABRIC allows users to give feedback on generated images and uses that feedback to make better images in the future. It does this by using positive and negative feedback images to guide the image generation process. The tutorial also introduces a way to measure how well these models perform with human feedback. FABRIC can be used to create personalized and customized content. However, there are some challenges, such as the models becoming too similar after a few rounds of feedback. In the future, researchers will work on improving diversity and finding a balance between exploring new ideas and using feedback effectively. It's also important to use these models responsibly and ethically.

The Deeper Dive:

Summary

This tutorial delves into the details of a groundbreaking research paper that introduces a novel approach called FABRIC (Feedback And Bayesian Regression In Conditioning) for integrating iterative human feedback into diffusion-based text-to-image models. The key novelty of FABRIC is its ability to optimize the image generation process based on user feedback without needing explicit training. It achieves this by using positive and negative feedback images to manipulate future results through reference image-conditioning. The paper also introduces a comprehensive evaluation methodology to quantify the performance of generative visual models that integrate human feedback.

By understanding FABRIC, you can potentially build applications in personalized content creation and customization, as it allows users to provide natural and intuitive guidance based on prior images or previously generated ones.

The FABRIC Approach

FABRIC leverages the self-attention layer in the U-Net architecture to condition the diffusion process on feedback images. It improves the generated results over multiple rounds of iterative feedback, optimizing according to arbitrary user preferences.

FABRIC uses disliked images concatenated to the conditional and unconditional U-Net pass. It then reweights the attention scores based on the pass and time step in the denoising process. This method explores linear interpolation to emphasize coarse features or fine details from the reference, and the feedback process can be scheduled according to the denoising steps.

For multiple rounds, the algorithm is extended by appending liked and disliked images to the positive and negative feedback.

Evaluation Methodology

The study introduces a comprehensive evaluation methodology for generative visual models incorporating human feedback. Two versions of FABRIC are evaluated: FABRIC and FABRIC+HPS LoRA. These methods are compared to standard Stable Diffusion models.

The researchers use the PickScore as a proxy for general human preference. They compute the CLIP similarity between generated and feedback images, and introduce the In-batch Image Diversity metric.

Two experimental settings are used for feedback selection: Preference Model-Based and Target Image-Based. In the Preference Model-Based setting, FABRIC outperforms the baselines in terms of PickScore and CLIP similarity. In the Target Image-Based setting, FABRIC improves similarity to the target image and in-batch image diversity compared to the baselines.

FABRIC in Action

The FABRIC procedure involves generating images based on a prompt and receiving feedback on those images. The feedback consists of one positive and one negative response. The generated images are reference-conditioned using a diffusion model.

The diffusion model uses initial noise and hidden states to generate the images. The hidden states are computed using a modified U-Net with self-attention and cross-attention. The feedback is used to compute weights for the positive and negative responses. These weights are used in the modified U-Net to generate the next step in the diffusion process.

Future Directions and Ethical Considerations

FABRIC tends to trade exploration for exploitation, often collapsing to a uniform distribution after a handful of feedback rounds. Prompt dropout is a possible approach to combat the collapse in diversity, but it may risk dropping crucial words in the prompt and changing the generations completely. Future work will investigate approaches to increasing diversity and controlling the exploration-exploitation trade-off in a more principled fashion.

FABRIC provides a well-defined action space with different parameters that can affect the generated results, opening up the avenue for performing Bayesian optimization on an arbitrary objective.

However, responsible and ethical usage of text-to-image models is crucial, and clear guidelines regarding their legal and ethical utilization should be established.