How to use ControlNet on Google Colab without the Stable Diffusion web UI (AUTOMATIC1111), using StableDiffusionControlNetPipeline

April 23, 2023 / July 15, 2023

It looks like remote UIs will soon no longer be usable on Google Colab, so I'm trying a different approach.

Diffusers

Diffusers is the go-to library for state-of-the-art pretrained diffusion models for generating images, audio, and even 3D structures of molecules.

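[Three core components]
State-of-the-art diffusion pipelines that run inference with just a few lines of code.
Interchangeable noise schedulers for trading off generation speed against output quality.
Pretrained models that can be used as building blocks and combined with schedulers.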
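
To make the scheduler trade-off a little more concrete, here is a minimal sketch in plain Python of how a sampler might visit only a subset of the training timesteps. The function name `spaced_timesteps` and the even-spacing rule are illustrative assumptions, not the exact rule of any particular Diffusers scheduler. The value 1000 is the training timestep count commonly configured for Stable Diffusion v1.5 schedulers, and 20 matches the `num_inference_steps` used in the sample later in this post.

```python
from typing import List


def spaced_timesteps(num_train_timesteps: int, num_inference_steps: int) -> List[int]:
    """Pick evenly spaced timesteps, ordered from most noisy to least noisy.

    Simplified illustration only: real schedulers differ in how they
    space and round the timesteps.
    """
    stride = num_train_timesteps // num_inference_steps
    return [i * stride for i in range(num_inference_steps)][::-1]


steps = spaced_timesteps(1000, 20)
print(len(steps), steps[0], steps[-1])  # 20 950 0
```

Fewer inference steps mean fewer denoising passes through the model, which is where the speed gain comes from. How much quality survives at a low step count depends on the scheduler.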

https://github.com/takuma104/diffusers

StableDiffusionControlNetPipeline

We use the pipeline for Stable Diffusion with ControlNet.

Let's run the code from the following notebook:
https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/controlnet.ipynb#scrollTo=AocAEIA8n33t

Reference
https://huggingface.co/docs/diffusers/main/en/api/pipelines/stable_diffusion/controlnet

Source, partially modified for Google Colab

This sample applies the Canny pre-processor to "Girl with a Pearl Earring", then generates and displays four images.
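
One step in the listing that follows is easy to miss: `cv2.Canny` returns a single-channel edge map, which the sample then stacks into three channels so it can be stored as an RGB image. The sketch below shows just that conversion with NumPy. The 4x4 array is made-up data standing in for a real edge map.

```python
import numpy as np

# Stand-in for the output of cv2.Canny: a 2-D uint8 array where
# 0 means "no edge" and 255 means "edge". Values are made up for illustration.
edges = np.zeros((4, 4), dtype=np.uint8)
edges[1, :] = 255

# Add a trailing channel axis, then repeat the map three times along it.
stacked = np.concatenate([edges[:, :, None]] * 3, axis=2)

print(edges.shape)    # (4, 4)
print(stacked.shape)  # (4, 4, 3)
```

All three channels hold identical values, so the result is a white-on-black edge image.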

!pip install -q diffusers transformers xformers git+https://github.com/huggingface/accelerate.git
!pip install -q opencv-contrib-python
!pip install -q controlnet_aux

from diffusers import StableDiffusionControlNetPipeline, ControlNetModel, UniPCMultistepScheduler
from diffusers.utils import load_image

import cv2
from PIL import Image
import numpy as np
import torch

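# Download the input image (Vermeer's "Girl with a Pearl Earring")
image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
image  # displays the image when this is the last line of a notebook cell
image = np.array(image)

# Hysteresis thresholds for the Canny edge detector
low_threshold = 100
high_threshold = 200

# Canny returns a single-channel edge map; stack it into 3 channels
# so it can be stored as an RGB image
image = cv2.Canny(image, low_threshold, high_threshold)
image = image[:, :, None]
image = np.concatenate([image, image, image], axis=2)
canny_image = Image.fromarray(image)
canny_image  # displays the edge map when this is the last line of a notebook cell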

controlnet = ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16, safety_checker=None
)

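# Use the UniPC multistep scheduler in place of the pipeline's default
pipe.scheduler = UniPCMultistepScheduler.from_config(pipe.scheduler.config)
# Keep submodels on the CPU and move each one to the GPU only while it runs
pipe.enable_model_cpu_offload()
# Use xformers memory-efficient attention
pipe.enable_xformers_memory_efficient_attention()
#if pipe.safety_checker is not None:
#   pipe.safety_checker = lambda images, **kwargs: (images, False)
# No pipe.to(device) here: enable_model_cpu_offload() already manages device
# placement, and moving the whole pipeline to the GPU would undo the offloading
device = "cuda"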


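# Build one prompt per subject by appending shared quality tags
prompt = ", best quality, extremely detailed"
prompt = [t + prompt for t in ["Sandra Oh", "Kim Kardashian", "rihanna", "taylor swift"]]
# One generator per prompt, all seeded identically, for reproducible results
generator = [torch.Generator(device=device).manual_seed(2) for _ in range(len(prompt))]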

# Generate one image per prompt, conditioned on the Canny edge map
output = pipe(
    prompt,
    canny_image,
    negative_prompt=["monochrome, lowres, bad anatomy, worst quality, low quality"] * len(prompt),
    num_inference_steps=20,
    generator=generator,
)

def image_grid(imgs, rows, cols):
    """Paste equally sized images into a rows x cols grid, filled row by row."""
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    grid = Image.new("RGB", size=(cols * w, rows * h))

    for i, img in enumerate(imgs):
        # Column index is i % cols, row index is i // cols
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid

image_grid(output.images, 2, 2)
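
The index arithmetic inside `image_grid` can be checked without generating any images. The sketch below computes only the paste offsets. `grid_offsets` is a hypothetical helper written for this illustration, and the 512x512 tile size is an assumption, not something taken from the pipeline output.

```python
from typing import List, Tuple


def grid_offsets(n: int, cols: int, w: int, h: int) -> List[Tuple[int, int]]:
    """Top-left (x, y) paste position for each of n tiles, filled row by row."""
    return [(i % cols * w, i // cols * h) for i in range(n)]


# Four tiles in a 2x2 grid (512x512 is an assumed tile size for illustration)
offsets = grid_offsets(4, cols=2, w=512, h=512)
print(offsets)  # [(0, 0), (512, 0), (0, 512), (512, 512)]
```

The images fill the top row from left to right, then the next row, so the order of the prompts determines where each result appears in the grid.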