[SemanticStyleGAN] Image synthesis, Image editing using AI

Saturday, June 25, 2022


In this article, I will show how to use SemanticStyleGAN to align faces, invert them into the latent space, and perform image synthesis and image editing on arbitrary images.

(Eyecatch image)
ref: seasonSH/SemanticStyleGAN

SemanticStyleGAN

Abstract

Recent studies have shown that StyleGANs provide promising prior models for downstream tasks on image synthesis and editing. However, since the latent codes of StyleGANs are designed to control global styles, it is hard to achieve a fine-grained control over synthesized images. We present SemanticStyleGAN, where a generator is trained to model local semantic parts separately and synthesizes images in a compositional way. The structure and texture of different local parts are controlled by corresponding latent codes. Experimental results demonstrate that our model provides a strong disentanglement between different spatial areas. When combined with editing methods designed for StyleGANs, it can achieve a more fine-grained control to edit synthesized or real images. The model can also be extended to other domains via transfer learning. Thus, as a generic prior model with built-in disentanglement, it could facilitate the development of GAN-based applications and enable more potential downstream tasks.

Overview
(Method overview figure)
ref: SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing

Please refer to this paper for details.

In this article, we will use this method to synthesize and edit images from an arbitrary input image.

Demo (Colaboratory)

Now, let's actually run the code and perform image synthesis and image editing.
The source code is included in this article, and you can also get it from GitHub below.
GitHub - Colaboratory demo

You can also open it directly in Google Colaboratory from the following.
Open In Colab

Setup environment

Let's set up the environment. After opening the notebook in Colaboratory, set the runtime type to GPU.
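
To confirm that a GPU runtime is actually attached, you can optionally run the following check first.

# Optional: verify that a GPU is attached to this runtime
!nvidia-smi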

First, get the source code from GitHub.

%cd /content

!git clone https://github.com/seasonSH/SemanticStyleGAN.git
# for align face
!git clone https://github.com/adamian98/pulse.git

Then install the library.

%cd /content/SemanticStyleGAN

# ninja
!wget https://github.com/ninja-build/ninja/releases/download/v1.8.2/ninja-linux.zip > /dev/null
!sudo unzip ninja-linux.zip -d /usr/local/bin/ > /dev/null
!sudo update-alternatives --install /usr/bin/ninja ninja /usr/local/bin/ninja 1 --force > /dev/null

!pip install -r requirements.txt

Finally, import the library.

%cd /content/SemanticStyleGAN

import os
import argparse
import shutil
import numpy as np
import imageio
import time
import torch

from models import make_model
from visualize.utils import generate, cubic_spline_interpolate

from criteria.lpips import lpips

This completes the environment setup.

Trained model setup

Then download the trained model.

%cd /content/SemanticStyleGAN
!mkdir pretrained

if not os.path.exists('./pretrained/CelebAMask-HQ-512x512.pt'):
  !wget -c https://github.com/seasonSH/SemanticStyleGAN/releases/download/1.0.0/CelebAMask-HQ-512x512.pt \
        -O ./pretrained/CelebAMask-HQ-512x512.pt
if not os.path.exists('./pretrained/BitMoji-512x512.pt'):
  !wget -c https://github.com/seasonSH/SemanticStyleGAN/releases/download/1.0.0/BitMoji-512x512.pt \
        -O ./pretrained/BitMoji-512x512.pt
if not os.path.exists('./pretrained/MetFaces-512x512.pt'):
  !wget -c https://github.com/seasonSH/SemanticStyleGAN/releases/download/1.0.0/MetFaces-512x512.pt \
        -O ./pretrained/MetFaces-512x512.pt
if not os.path.exists('./pretrained/Toonify-512x512.pt'):
  !wget -c https://github.com/seasonSH/SemanticStyleGAN/releases/download/1.0.0/Toonify-512x512.pt \
        -O ./pretrained/Toonify-512x512.pt

Each model is downloaded to the output path specified by the -O option.
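
As an optional sanity check, you can list the downloaded checkpoints and their sizes. This is a minimal sketch of my own, using only the os module imported above.

%cd /content/SemanticStyleGAN

# Optional sanity check (not part of the original repo):
# list the downloaded checkpoints and their file sizes.
for f in sorted(os.listdir('./pretrained')):
  size_mb = os.path.getsize(f'./pretrained/{f}') / 1e6
  print(f'{f}: {size_mb:.0f} MB')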

Image Synthesis

First, we will randomly synthesize images from the latent space using CelebAMask-HQ-512x512.pt.

%cd /content/SemanticStyleGAN

args = argparse.Namespace()

args.ckpt = './pretrained/CelebAMask-HQ-512x512.pt'
args.outdir = './results/samples'
args.batch = 8
args.sample = 20
args.truncation = 0.7
args.truncation_mean = 10000
args.save_latent = True
args.device = 'cuda'

if os.path.exists(args.outdir):
  shutil.rmtree(args.outdir)
os.makedirs(args.outdir)

print("Loading model ...")
ckpt = torch.load(args.ckpt)
model = make_model(ckpt['args'])
model.to(args.device)
model.eval()
model.load_state_dict(ckpt['g_ema'])
mean_latent = model.style(torch.randn(args.truncation_mean, model.style_dim, device=args.device)).mean(0)

print("Generating images ...")
start_time = time.time()
with torch.no_grad():
  styles = model.style(torch.randn(args.sample, model.style_dim, device=args.device))
  styles = args.truncation * styles + (1-args.truncation) * mean_latent.unsqueeze(0)
  images, segs = generate(model, styles, mean_latent=mean_latent, batch_size=args.batch)
  for i in range(len(images)):
    imageio.imwrite(f"{args.outdir}/{str(i).zfill(6)}_img.jpg", images[i])
    imageio.imwrite(f"{args.outdir}/{str(i).zfill(6)}_seg.jpg", segs[i])
    if args.save_latent:
      np.save(f'{args.outdir}/{str(i).zfill(6)}_latent.npy', styles[i:i+1].cpu().numpy())
print(f"Average speed: {(time.time() - start_time)/(args.sample)}s")

The following composite image and segmentation image are output.

(Randomly generated sample images and segmentation maps)
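
If you prefer to preview the outputs inline rather than opening the saved files, a minimal sketch using matplotlib (preinstalled on Colab) might look like the following. It assumes the synthesis cell above has just been run; the 2x3 grid layout is my own choice.

import matplotlib.pyplot as plt

# Preview the first three image/segmentation pairs inline
# (hypothetical helper cell, not part of the original repo).
fig, axes = plt.subplots(2, 3, figsize=(9, 6))
for i in range(3):
  axes[0, i].imshow(imageio.imread(f'{args.outdir}/{str(i).zfill(6)}_img.jpg'))
  axes[1, i].imshow(imageio.imread(f'{args.outdir}/{str(i).zfill(6)}_seg.jpg'))
  axes[0, i].axis('off')
  axes[1, i].axis('off')
plt.tight_layout()
plt.show()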

Image Editing

Next, we edit the image using the latent code saved for the first sample generated above.


latent_dict_celeba = {
  2:  "bcg_1",
  3:  "bcg_2",
  4:  "face_shape",
  5:  "face_texture",
  6:  "eye_shape",
  7:  "eye_texture",
  8:  "eyebrow_shape",
  9:  "eyebrow_texture",
  10: "mouth_shape",
  11: "mouth_texture",
  12: "nose_shape",
  13: "nose_texture",
  14: "ear_shape",
  15: "ear_texture",
  16: "hair_shape",
  17: "hair_texture",
  18: "neck_shape",
  19: "neck_texture",
  20: "cloth_shape",
  21: "cloth_texture",
  22: "glass",
  24: "hat",
  26: "earing",
  0:  "coarse_1",
  1:  "coarse_2",
}

%cd /content/SemanticStyleGAN

args = argparse.Namespace()

args.ckpt = './pretrained/CelebAMask-HQ-512x512.pt'
args.latent = './results/samples/000000_latent.npy'
args.outdir = './results/videos'
args.batch = 8
args.sample = 10
args.steps = 160
args.truncation = 0.7
args.truncation_mean = 10000
args.dataset_name = 'celeba'
args.device = 'cuda'


if os.path.exists(args.outdir):
  shutil.rmtree(args.outdir)
os.makedirs(args.outdir)

print("Loading model ...")
ckpt = torch.load(args.ckpt)
model = make_model(ckpt['args'])
model.to(args.device)
model.eval()
model.load_state_dict(ckpt['g_ema'])
mean_latent = model.style(torch.randn(args.truncation_mean, model.style_dim, device=args.device)).mean(0)

print("Generating original image ...")
with torch.no_grad():
  if args.latent is None:
    styles = model.style(torch.randn(1, model.style_dim, device=args.device))
    styles = args.truncation * styles + (1-args.truncation) * mean_latent.unsqueeze(0)
  else:
    styles = torch.tensor(np.load(args.latent), device=args.device)
  if styles.ndim == 2:
    assert styles.size(1) == model.style_dim
    styles = styles.unsqueeze(1).repeat(1, model.n_latent, 1)
  images, segs = generate(model, styles, mean_latent=mean_latent, randomize_noise=False)
  imageio.imwrite(f'{args.outdir}/image.jpeg', images[0])
  imageio.imwrite(f'{args.outdir}/seg.jpeg', segs[0])
print("Generating videos ...")
if args.dataset_name == "celeba":
  latent_dict = latent_dict_celeba
else:
  raise ValueError(f"Unknown dataset name: {args.dataset_name}")

with torch.no_grad():
  for latent_index, latent_name in latent_dict.items():
    styles_new = styles.repeat(args.sample, 1, 1)
    mix_styles = model.style(torch.randn(args.sample, 512, device=args.device))
    mix_styles[-1] = mix_styles[0]
    mix_styles = args.truncation * mix_styles + (1-args.truncation) * mean_latent.unsqueeze(0)
    mix_styles = mix_styles.unsqueeze(1).repeat(1,model.n_latent,1)
    styles_new[:,latent_index] = mix_styles[:,latent_index]
    styles_new = cubic_spline_interpolate(styles_new, step=args.steps)
    images, segs = generate(model, styles_new, mean_latent=mean_latent, randomize_noise=False, batch_size=args.batch)
    frames = [np.concatenate((img,seg),1) for (img,seg) in zip(images,segs)]
    imageio.mimwrite(f'{args.outdir}/{latent_index:02d}_{latent_name}.mp4', frames, fps=20)
    print(f"{args.outdir}/{latent_index:02d}_{latent_name}.mp4")

The results of editing various facial parts are shown below. Fine-grained control is achieved, down to details such as the eyes.

(Result videos: editing each semantic part)
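
To play one of the generated videos directly in the notebook, you can embed it as a base64 data URL. The sketch below assumes the editing cell above has been run; picking the hair_shape video is just an example.

import base64
from IPython.display import HTML

# Play one of the generated edit videos inline
# (hypothetical helper cell, not part of the original repo).
video_path = f'{args.outdir}/16_hair_shape.mp4'
mp4 = open(video_path, 'rb').read()
data_url = 'data:video/mp4;base64,' + base64.b64encode(mp4).decode()
HTML(f'<video width=512 controls src="{data_url}"></video>')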


Image synthesis from any image

Finally, an arbitrary image is inverted into the latent space of the model, and the resulting latent is used to synthesize images with a model from another domain. This article uses the image downloaded below.

First, download the image and crop and align the face region.

%cd /content/SemanticStyleGAN
!rm -rf test_img
!mkdir -p test_img/src test_img/align

!wget -c https://www.pakutaso.com/shared/img/thumb/kuchikomi725_TP_V.jpg \
      -O ./test_img/src/test1.jpg

%cd /content/pulse
!python align_face.py \
  -input_dir /content/SemanticStyleGAN/test_img/src \
  -output_dir /content/SemanticStyleGAN/test_img/align \
  -output_size 512 \
  -seed 12

Next, invert to the latent space.

%cd /content/SemanticStyleGAN
!cp visualize/invert.py invert.py

!python invert.py \
  --ckpt pretrained/CelebAMask-HQ-512x512.pt \
  --imgdir test_img/align \
  --outdir results/inversion \
  --size 512
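
Before switching domains, you can optionally check the inversion quality by decoding the saved latent with the original CelebAMask-HQ generator. The sketch below simply reuses the pattern of the synthesis cells above; the recon.jpeg output name is my own choice.

%cd /content/SemanticStyleGAN

# Optional reconstruction check (not part of the original repo):
# decode the inverted latent with the CelebAMask-HQ generator.
ckpt = torch.load('./pretrained/CelebAMask-HQ-512x512.pt')
model = make_model(ckpt['args'])
model.to('cuda')
model.eval()
model.load_state_dict(ckpt['g_ema'])
mean_latent = model.style(torch.randn(10000, model.style_dim, device='cuda')).mean(0)

with torch.no_grad():
  styles = torch.tensor(np.load('./results/inversion/latent/test1_0.npy'), device='cuda')
  if styles.ndim == 2:
    styles = styles.unsqueeze(1).repeat(1, model.n_latent, 1)
  images, segs = generate(model, styles, mean_latent=mean_latent, randomize_noise=False)
  imageio.imwrite('./results/inversion/recon.jpeg', images[0])
print('Saved ./results/inversion/recon.jpeg')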

Image synthesis is then performed from the inverted latent using the BitMoji model.

%cd /content/SemanticStyleGAN

args = argparse.Namespace()

args.ckpt = './pretrained/BitMoji-512x512.pt'
args.latent = './results/inversion/latent/test1_0.npy'
args.outdir = './results/style_BitMoji'
args.truncation = 0.7
args.truncation_mean = 10000
args.device = 'cuda'

if os.path.exists(args.outdir):
  shutil.rmtree(args.outdir)
os.makedirs(args.outdir)

print("Loading model ...")
ckpt = torch.load(args.ckpt)
model = make_model(ckpt['args'])
model.to(args.device)
model.eval()
model.load_state_dict(ckpt['g_ema'])
mean_latent = model.style(torch.randn(args.truncation_mean, model.style_dim, device=args.device)).mean(0)

print("Generating original image ...")
with torch.no_grad():
  if args.latent is None:
    styles = model.style(torch.randn(1, model.style_dim, device=args.device))
    styles = args.truncation * styles + (1-args.truncation) * mean_latent.unsqueeze(0)
  else:
    styles = torch.tensor(np.load(args.latent), device=args.device)
  if styles.ndim == 2:
    assert styles.size(1) == model.style_dim
    styles = styles.unsqueeze(1).repeat(1, model.n_latent, 1)
  images, segs = generate(model, styles, mean_latent=mean_latent, randomize_noise=False)
  imageio.imwrite(f'{args.outdir}/image.jpeg', images[0])
  imageio.imwrite(f'{args.outdir}/seg.jpeg', segs[0])

The output result is as follows.

(BitMoji-style synthesis result)

Although the inverted result deviates slightly from the original image, the BitMoji-style synthesis appears to be generated well.
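
Since the MetFaces and Toonify checkpoints were also downloaded earlier, the same inverted latent can be decoded with those models as well. Here is a minimal sketch following the same pattern as the BitMoji cell; the output directory names are my own.

%cd /content/SemanticStyleGAN

# Optional domain transfer (not part of the original article's code):
# decode the same inverted latent with the other downloaded checkpoints.
for name in ['MetFaces', 'Toonify']:
  ckpt = torch.load(f'./pretrained/{name}-512x512.pt')
  model = make_model(ckpt['args'])
  model.to('cuda')
  model.eval()
  model.load_state_dict(ckpt['g_ema'])
  mean_latent = model.style(torch.randn(10000, model.style_dim, device='cuda')).mean(0)
  with torch.no_grad():
    styles = torch.tensor(np.load('./results/inversion/latent/test1_0.npy'), device='cuda')
    if styles.ndim == 2:
      styles = styles.unsqueeze(1).repeat(1, model.n_latent, 1)
    images, segs = generate(model, styles, mean_latent=mean_latent, randomize_noise=False)
    os.makedirs(f'./results/style_{name}', exist_ok=True)
    imageio.imwrite(f'./results/style_{name}/image.jpeg', images[0])
    print(f'Saved ./results/style_{name}/image.jpeg')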

Summary

In this article, image synthesis and image editing were performed using SemanticStyleGAN. Editing individual semantic regions makes it possible to modify fine details such as the eyes and neck, producing results that are hard to distinguish from the original image.

References

1. Paper - SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing

2. GitHub - seasonSH/SemanticStyleGAN
