BbeumbungE/Backend-AI

아이캔버스 [Backend-AI]

💡 A service in which children draw a sketch and a generative AI converts it into a picture. Game-like elements are added so that children create their own content, building confidence and creativity while experiencing fun and a sense of achievement.



Service example screens

  • Gamification & child-friendly UI
Screenshots: picture conversion, drawing by topic

  • Game-like elements to spark children's interest
Screenshots: step-by-step drawing, scoring

  • Drawings can be posted, collect likes, and enter the ranking
Screenshots: ranked drawings, drawings by topic, notifications

  • Drawings can also be shared via KakaoTalk
Screenshot: sharing on KakaoTalk

Goals of the AI part

🚩 Ultimate goal: train a generator that converts a drawn sketch into the corresponding picture, which is one of the service's core features

  • Tasks needed to achieve this
    • Data collection
    • Data preprocessing
    • Model training
    • Model comparison & checkpoint selection
    • Serving the model on a FastAPI server

Data preprocessing

  • In many cases, image data does not come with the corresponding edges.
  • Because there are so many images, drawing the edges by hand is not realistic. We therefore followed the pix2pix authors' implementation: extract edges with HED (Holistically-Nested Edge Detection), then apply post-processing.
    • See the Extracting Edges section of the pix2pix GitHub repository

  • In the end, each edge map and its image are concatenated into one combined image, which is loaded at training time
  • Example image: DVM Car Dataset, bmw series 5
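The loading step described above can be sketched as follows. This is only an illustration in plain Python on nested lists: it assumes the edge map is the left half and the picture the right half of the combined image, and that pixels are 8-bit values, neither of which this README states. The helper names `split_combined` and `scale_rows` are hypothetical. The target ranges ([0, 1] for the edge condition, [-1, 1] for the picture) are the ones listed in the Generator section.

```python
def split_combined(combined):
    """Split a side-by-side image (a list of pixel rows) into (left, right) halves."""
    width = len(combined[0])
    if width % 2 != 0:
        raise ValueError("combined image width must be even")
    half = width // 2
    left = [row[:half] for row in combined]
    right = [row[half:] for row in combined]
    return left, right


def scale_rows(rows, low, high):
    """Linearly map 8-bit values 0..255 onto the interval [low, high]."""
    return [[low + (high - low) * value / 255.0 for value in row] for row in rows]


# A tiny 2x4 stand-in for a combined image (assumption: edge on the left, picture on the right).
combined = [
    [0, 255, 0, 255],
    [255, 0, 51, 204],
]
edge_raw, target_raw = split_combined(combined)
edge = scale_rows(edge_raw, 0.0, 1.0)       # condition fed to the generator
target = scale_rows(target_raw, -1.0, 1.0)  # picture the generator should reproduce
print(edge)
print(target)
```

In a real pipeline the same split would be done on image tensors rather than lists; the point is only that one file yields both the condition and the target.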

※ Distribution Mismatch

  • The training data consists of edges extracted automatically by HED, but real users cannot realistically draw edges that look like these.
  • As a result, there is a distribution mismatch between the training data and the data the model is ultimately applied to. It should be noted that this generally can degrade performance in actual use.

  • Even so, what we ultimately want is a generator that converts user-drawn edges well. We therefore drew edges ourselves, at the level a user could plausibly draw, and selected the model by comparing results on those edges.
Comparison images: edge automatically extracted by HED, hand-drawn edge
  • The authors kept this in mind as well: the appendix of the paper includes results on human-drawn data in addition to results on the training data.

Model introduction

  • pix2pix approaches the (paired) Image-to-Image Translation problem with a conditional GAN (Generative Adversarial Network)

Background

  • (paired) Image-to-Image Translation means, literally, converting a given image into one corresponding image
  • Here, paired refers to a setting with (input, output) pairs that share some structure, so that the input explains the output to some degree
  • Example applications: the pix2pix examples

  • GAN stands for Generative Adversarial Network. It consists of two neural networks, a generator and a discriminator. Generative means that it produces something; Adversarial means that the generator and the discriminator compete with (or, seen another way, help) each other
    • The goal of the generator is to produce realistic data (image, audio, etc.)
      • Realistic here means hard (for the discriminator) to tell apart from real data
    • The goal of the discriminator is to tell whether a given piece of data (image, audio, etc.) is fake data produced by the generator or real data

Introduction

  • pix2pix is a model for the paired Image-to-Image Translation task: given an input image, it converts it into the corresponding target output image
  • Many earlier GANs take noise as input and return an output. pix2pix instead takes a condition (the image to be converted) as input, with no separate noise
    • So, as long as the condition is the same, the generator is deterministic
      • That is, the same condition produces the same result image
  • The authors say they initially considered feeding noise as well, but dropped it because it was not very effective
    • The generator tended to learn to ignore the noise

Past conditional GANs have acknowledged this and provided Gaussian noise z as an input to the generator, in addition to x (e.g., [55]). In initial experiments, we did not find this strategy effective – the generator simply learned to ignore the noise – which is consistent with Mathieu et al. (Source: pix2pix paper)


Generator

  • input : (H, W, 1) image tensor (grayscale), range : [0, 1]
  • output : (H, W, 3) image tensor (color), range : [-1, 1]
  • The authors used a U-net architecture when they published the paper, but we used the Resnet-based generator later used in CycleGAN
  • conv + 2 Contracting Blocks + 9 Residual Blocks + 2 Expanding Blocks + conv
    • c7s1-64, d128, d256, R256 * 9, u128, u64, c7s1-3 (see the CycleGAN authors' notation)
    • Contracting Block : conv + instance_norm ⇒ halves the width & height
    • Residual Block : conv + instance_norm + relu + conv + instance_norm + skip_connection with the input ⇒ keeps the width & height unchanged
    • Expanding Block : transposed_conv + instance_norm ⇒ doubles the width & height
    • padding : reflection_pad
    • As a result, the input and output images have the same size
  • Number of parameters : 11,377,155 (when the condition is a single grayscale edge channel)
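The layer list above can be traced with a few lines of arithmetic. The sketch below follows the CycleGAN notation (c7s1-k is a 7x7 stride-1 conv; dk, Rk and uk use 3x3 kernels) and tracks both the spatial size and the parameter count. Two things are assumptions rather than facts from this README: the 256x256 input size, and the per-layer accounting of the non-weight parameters (two per-channel parameters on every normalized conv, and only a bias on the final conv). That accounting was chosen because it reproduces the stated total; the actual split between biases and normalization parameters may differ.

```python
# (name, kind, kernel, out_channels); "down" halves H and W, "up" doubles them,
# "conv" and "res" keep them unchanged (padding is assumed to preserve size).
LAYERS = (
    [("c7s1-64", "conv", 7, 64), ("d128", "down", 3, 128), ("d256", "down", 3, 256)]
    + [("R256", "res", 3, 256)] * 9
    + [("u128", "up", 3, 128), ("u64", "up", 3, 64), ("c7s1-3", "conv", 7, 3)]
)


def trace(height, width, in_channels=1):
    """Return ((H, W, C) of the output, parameter count) for LAYERS."""
    channels, params = in_channels, 0
    last = len(LAYERS) - 1
    for index, (_name, kind, kernel, out_channels) in enumerate(LAYERS):
        convs = 2 if kind == "res" else 1  # a residual block holds two convs
        for _ in range(convs):
            params += kernel * kernel * channels * out_channels  # conv weights
            # Assumed accounting: 2 per-channel parameters on normalized convs,
            # 1 per-channel bias on the final (un-normalized) conv.
            params += out_channels if index == last else 2 * out_channels
            channels = out_channels
        if kind == "down":
            height, width = height // 2, width // 2
        elif kind == "up":
            height, width = height * 2, width * 2
    return (height, width, channels), params


shape, params = trace(256, 256)  # 256x256 is an assumed input size
print(shape, params)
```

The spatial size goes 256 → 128 → 64 through the contracting blocks, stays at 64 through the residual blocks, and returns to 256, which is why input and output sizes match.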

Discriminator

  • input : (H, W, 4) image tensor (real or fake image + condition (the edge, in our case)), range : [-1, 1]

    • Not only the target image is given: the condition used to produce that target is concatenated to the target image along the channel axis
    • As in earlier conditional GANs, this is reportedly because judging whether the result image is real or fake given that it came from this condition performed better than judging the result image alone
  • output : patchGAN output tensor, range : unbounded, but the closer a value is to 0 the more the discriminator judges it fake, and the closer to 1 the more it judges it real

  • A PatchGAN discriminator is used as the discriminator

  • Many earlier GANs look at the whole image at once and decide whether it is real or fake. The PatchGAN discriminator does not look at the whole image at once; for each image patch it judges whether that part is realistic (the discriminator cannot tell it apart from the real distribution) or not (fake)

    • These per-patch judgments are then aggregated in the loss described later
    • Note that the image is not cut into patches that are fed in one by one; everything is processed in a single pass by exploiting the properties of convolution
      • The patches seen by the individual output cells may overlap
  • Because PatchGAN assumes that pixels farther apart than the receptive field size are independent of each other, it is also called a Markovian Discriminator or Local-patch Discriminator

  • The authors tested several receptive field sizes (the size, in pixels, of the input image patch seen by one cell of the discriminator output); the paper shows example results for each

  • Following the authors' choice, we used 70

  • C64 - C128 - C256 - C512 - output layer (see the CycleGAN authors' notation)

  • Number of parameters : 2,765,633 (when the condition is a single grayscale edge channel)
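The receptive field size of 70 can be checked from the layer list. The sketch below assumes the usual 70x70 PatchGAN layout, which this README does not spell out: 4x4 kernels throughout, stride 2 for C64, C128 and C256, and stride 1 for C512 and the output layer. The padding of 1 used for the output-grid size is likewise an assumption; a framework's "same" padding would give a different grid.

```python
# (kernel, stride) for C64, C128, C256, C512, output layer (assumed layout).
PATCHGAN_LAYERS = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]


def receptive_field(layers):
    """Input pixels (per side) seen by one output cell, computed back to front."""
    size = 1
    for kernel, stride in reversed(layers):
        size = (size - 1) * stride + kernel
    return size


def output_cells(input_size, layers, padding=1):
    """Number of output cells per side for a square input."""
    size = input_size
    for kernel, stride in layers:
        size = (size + 2 * padding - kernel) // stride + 1
    return size


rf = receptive_field(PATCHGAN_LAYERS)
cells = output_cells(256, PATCHGAN_LAYERS)  # 256 is an assumed input size
print(rf, cells)
```

With these assumptions, a 256x256 input yields a 30x30 grid of cells that each see a 70x70 patch, so neighbouring patches necessarily overlap, as noted above.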


Loss

  • pix2pix is trained on the following objective

$$G^* = \arg\min_G \max_D \; \mathcal{L}_{cGAN}(G, D) + \lambda \, \mathcal{L}_{L1}(G)$$

  • Here G is the generator and D is the discriminator
  • The objective splits into two parts, L_cGAN and L_L1
    • L_cGAN

      • This is the same loss used in conditional GANs
      • However, instead of the cross entropy loss, we used the least square adversarial loss from LSGAN
        • That is, the generator learns so that G(x), the result generated from the input condition x (a sketch, in our case), looks like a realistic image to the discriminator
        • and the discriminator learns to output 1 for y (the real target) and 0 for the fake (G(x))
    • L_L1 loss

      • Computes the pixel-level L1 distance between the image generated from the input and the desired target

      $$\mathcal{L}_{L1}(G) = \mathbb{E}_{x,y}\left[ \lVert y - G(x) \rVert_1 \right]$$

      • Put simply: take the absolute difference between every pixel of the model output and the target (the ideal result), over all RGB channels, and average them
      • The authors used the L1 distance instead of L2 (squared differences instead of absolute values), which tends to produce blurrier results overall
      • Since what we ultimately want is a generator that converts the input into the target well, this is a natural choice
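With the least-squares substitution described above, the two losses that are actually minimized can be written out as below. This is a reconstruction from the description and the code in this README, not a formula copied from the paper. The names L_D and L_G are introduced here for illustration, the factor 1/2 mirrors the 0.5 in the discriminator loss code, and the L1 term is averaged over pixels as in the code.

```latex
\begin{aligned}
\mathcal{L}_{D} &= \tfrac{1}{2}\Big( \mathbb{E}_{x,y}\big[(D(x, y) - 1)^2\big]
                 + \mathbb{E}_{x}\big[D(x, G(x))^2\big] \Big) \\
\mathcal{L}_{G} &= \mathbb{E}_{x}\big[(D(x, G(x)) - 1)^2\big]
                 + \lambda \, \mathbb{E}_{x,y}\big[\lVert y - G(x) \rVert_1\big]
\end{aligned}
```

The discriminator is pushed toward 1 on real pairs and 0 on generated pairs, while the generator is pushed toward making D output 1 on its results and toward staying close to the target.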
import tensorflow as tf

LAMBDA = 10  # weight on the L1 term of the generator loss


def discriminator_loss_function(real_D_out, fake_D_out):
    '''
    LSGAN loss for the discriminator

    <params>
        real_D_out : discriminator output when given a real image
        fake_D_out : discriminator output when given a fake (generated) image
    '''

    # Following the authors, divide by 2 to slow down how fast D learns
    # (relative to how fast G learns from the generator adversarial loss)
    return 0.5 * (tf.math.reduce_mean(tf.math.squared_difference(real_D_out, tf.ones_like(real_D_out))) +
                  tf.math.reduce_mean(tf.math.squared_difference(fake_D_out, tf.zeros_like(fake_D_out))))


def generator_adversarial_loss_function(fake_D_out):
    '''
    LSGAN loss for the generator

    <params>
        fake_D_out : discriminator output when given a fake (generated) image
    '''

    return tf.math.reduce_mean(tf.math.squared_difference(fake_D_out, tf.ones_like(fake_D_out)))


def generator_L1_loss_function(real_images, fake_images):
    '''
    L1 loss

    <params>
        real_images : real (target) images
        fake_images : images produced by the generator
    '''

    return tf.math.reduce_mean(tf.math.abs(real_images - fake_images))


# Inside the training step, where real_D_out, fake_D_out, real_image and fake_image are already computed:

# compute the discriminator loss
discriminator_loss = discriminator_loss_function(real_D_out, fake_D_out)

# compute the generator loss
generator_adversarial_loss = generator_adversarial_loss_function(fake_D_out)
generator_L1_loss = generator_L1_loss_function(real_image, fake_image)

generator_loss = generator_adversarial_loss + LAMBDA * generator_L1_loss
  • Since computing the L1 loss does not involve the discriminator,
    • discriminator_loss = LSGAN_discriminator_loss
    • generator_loss = LSGAN_generator_loss + lambda * L1_loss
      • We used lambda = 10
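To make the two formulas concrete, here is a small worked example in plain Python (no TensorFlow) that mirrors the same arithmetic on hand-picked numbers. The discriminator outputs and pixel values are made up for illustration; only lambda = 10 comes from this README.

```python
def mean(values):
    return sum(values) / len(values)


def discriminator_loss(real_d_out, fake_d_out):
    # Real outputs are pulled toward 1, fake outputs toward 0; halved as in the README code.
    real_term = mean([(d - 1.0) ** 2 for d in real_d_out])
    fake_term = mean([d ** 2 for d in fake_d_out])
    return 0.5 * (real_term + fake_term)


def generator_adversarial_loss(fake_d_out):
    # The generator wants the discriminator to output 1 on its fakes.
    return mean([(d - 1.0) ** 2 for d in fake_d_out])


def l1_loss(real_pixels, fake_pixels):
    return mean([abs(r - f) for r, f in zip(real_pixels, fake_pixels)])


LAMBDA = 10
real_d_out = [0.9, 0.8]       # made-up patch scores for a real pair
fake_d_out = [0.2, 0.1]       # made-up patch scores for a generated pair
real_pixels = [0.5, -0.5]     # made-up target pixel values in [-1, 1]
fake_pixels = [0.25, -0.25]   # made-up generated pixel values

d_loss = discriminator_loss(real_d_out, fake_d_out)
g_adv = generator_adversarial_loss(fake_d_out)
g_l1 = l1_loss(real_pixels, fake_pixels)
g_loss = g_adv + LAMBDA * g_l1
print(d_loss, g_adv, g_l1, g_loss)
```

Here the discriminator is already doing well (loss about 0.025), so the adversarial term for the generator is large (about 0.725), and with lambda = 10 the L1 term (0.25) still contributes most of the generator loss (about 3.225).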

  • Training repeats the following: update the discriminator's parameters once in the direction that lowers discriminator_loss, then update the generator's parameters once in the direction that lowers generator_loss
  • It is important not to train the parameters of the two models at the same time (they must be updated separately, taking turns)
    • The reason is that the roles of the generator (produce well) and the discriminator (distinguish well) are in a sense opposed, so if both sets of parameters were trained simultaneously they could learn to settle on a compromise
    • Lowering discriminator_loss means lowering LSGAN_discriminator_loss: predicting 0 for fake images and 1 for real images, that is, learning to tell the images apart
    • Lowering generator_loss afterwards means lowering LSGAN_generator_loss + lambda * L1_loss
      • Looking at LSGAN_generator_loss again, G is trained so that G(x), the fake image, gets a discriminator output close to 1 (so that it looks real)

        $$\mathcal{L}_{LSGAN}(G) = \mathbb{E}_{x}\left[ (D(x, G(x)) - 1)^2 \right]$$

      • The ultimate aim is a good G, and D is not used afterwards, so why is D needed at all? We think this is where that question is answered

      • Because D was just trained to distinguish fake (G(x)) from real (y), it comes to know, to some extent, in what respects the fake and the real differ (a single step teaches little, but the effect accumulates)

      • The information about how D could tell them apart is then passed to G's parameters (D is not trained at this point), which is how G gets better at generating

        • This passing of information happens, of course, through back propagation
    • In this way D learns once to distinguish better, the information about how it distinguished is passed to G so that G generates better, D again tries to tell the new results apart from real data, again passes on how it did so, and so on. This is how a GAN learns
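The alternating schedule can be illustrated with a deliberately tiny toy, written in plain Python with hand-derived gradients. Everything in it is an assumption made for illustration and is far simpler than the real networks: the "generator" is G(x) = theta_g * x, the "discriminator" is D(v) = theta_d * v and ignores the condition, there is a single training pair (x, y) = (1, 2), and the learning rate is arbitrary. Only the structure (one D step with G frozen, then one G step with D frozen, LSGAN losses, lambda = 10) follows the text above.

```python
def sign(value):
    return (value > 0) - (value < 0)


def train(steps=300, lr=0.01, lam=10.0, x=1.0, y=2.0):
    theta_g, theta_d = 0.0, 0.5
    for _ in range(steps):
        # 1) Discriminator step: theta_g is held fixed.
        #    L_D = 0.5 * ((D(y) - 1)^2 + D(G(x))^2)
        fake = theta_g * x
        grad_d = (theta_d * y - 1.0) * y + (theta_d * fake) * fake
        theta_d -= lr * grad_d

        # 2) Generator step: theta_d is held fixed; the gradient flows
        #    back through D into G.
        #    L_G = (D(G(x)) - 1)^2 + lam * |y - G(x)|
        fake = theta_g * x
        grad_adv = 2.0 * (theta_d * fake - 1.0) * theta_d * x
        grad_l1 = sign(fake - y) * x
        theta_g -= lr * (grad_adv + lam * grad_l1)
    return theta_g, theta_d


theta_g, theta_d = train()
print(theta_g, theta_d)
```

In this toy the L1 term dominates, so theta_g ends up hovering around 2 (the value that maps x to y), while the adversarial gradient reaches G only through the frozen D, which is the point made about back propagation above.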

Data used

Cartoon set

  • Source : Google's cartoon set
  • Number of images : 9996 (a subset sampled from the original 100,000)
  • Batch size : 4
  • Epochs trained : 28
  • Notes
    • To improve performance we tried adding color information as a condition and also tried increasing the amount of data (100,000 images, the entire original dataset), but saw no notable improvement in converting user-drawn edges (the training data itself is converted very well)
  • Example results: cartoon set example result image

Panda

  • Source : Kaggle, Panda or Bear Image Classification
  • Number of images : 300 (bear images excluded, only panda images used)
  • Batch size : 1
  • Epochs trained : 180
  • Example results: panda example result image

Car

  • Source : DVM car dataset
  • Number of images : 11476 (only sedan-type bmw series 5 & 7 extracted from the DVM car dataset)
  • Batch size : 4
  • Epochs trained : 19
  • Example results: car example result image

Handbags

Shoes

Maplestory Characters

  • Source : Kaggle, maplestory_characters_hd
  • Number of images : 69372
  • Batch size : 4
  • Epochs trained : 14
  • Example results: maplestory character example result image

Gemstone

  • Source : Kaggle, Gemstones Images
  • Number of images : 3219
  • Batch size : 4
  • Epochs trained : about 36
  • Example results: gemstone example result image

Space

  • Source : Kaggle, Cosmos Images
  • Number of images : 4649
  • Batch size : 4
  • Epochs trained : 40
  • Example results: space example result image
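The dataset sizes, batch sizes, and epoch counts listed above can be turned into a rough number of training iterations per dataset. This is a back-of-the-envelope sketch, assuming that a final partial batch is kept (hence the ceiling) and that every iteration performs one discriminator update followed by one generator update. The Gemstone epoch count is only approximate in the source, and Handbags and Shoes are omitted because no figures are given for them.

```python
import math

# (dataset, number of images, batch size, epochs), taken from the lists above.
RUNS = [
    ("Cartoon set", 9996, 4, 28),
    ("Panda", 300, 1, 180),
    ("Car", 11476, 4, 19),
    ("Maplestory Characters", 69372, 4, 14),
    ("Gemstone", 3219, 4, 36),  # "about 36" epochs in the source
    ("Space", 4649, 4, 40),
]


def total_iterations(images, batch_size, epochs):
    """Iterations over the whole run, keeping the last partial batch of each epoch."""
    steps_per_epoch = math.ceil(images / batch_size)
    return steps_per_epoch * epochs


ITERATIONS = {name: total_iterations(n, b, e) for name, n, b, e in RUNS}
for name, count in ITERATIONS.items():
    print(f"{name}: {count}")
```

Under these assumptions most runs land in the range of roughly 29,000 to 70,000 iterations, with Maplestory Characters the clear outlier at about 243,000.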