Computer Vision | Jiyoon Park

Transformers

16-824 Visual Learning and Recognition: Homework 3 · Spring 2023 · GITHUB

About The Project

Implemented and trained different components of a Transformer decoder for image captioning using a subset of the COCO dataset. Additionally, a Vision Transformer (ViT) was implemented for classification on CIFAR10.

Built With

Python, NumPy, PyTorch

Results

For the entire report, please refer to the Documentation.
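As a rough illustration of one decoder component of the kind described above, here is a minimal NumPy sketch of single-head self-attention with a causal mask, so that each caption token attends only to itself and earlier tokens. It is not the homework code: the function name, the argument names, and the toy dimensions are all assumptions chosen for illustration.

```python
import numpy as np


def causal_self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention with a causal mask.

    x:             (T, d_model) token embeddings for one caption.
    w_q, w_k, w_v: (d_model, d_head) projection matrices (hypothetical names).
    Returns the attended values (T, d_head) and the attention weights (T, T).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_head = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_head)  # (T, T) similarity scores

    # Hide future positions: entries strictly above the diagonal.
    num_tokens = x.shape[0]
    future = np.triu(np.ones((num_tokens, num_tokens), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Row-wise softmax, shifted by the row maximum for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Toy usage with assumed sizes: 5 tokens, d_model = 8, d_head = 4.
rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
out, weights = causal_self_attention(x, w_q, w_k, w_v)
```

In this sketch the first token can only attend to itself, so the first row of `weights` puts all of its mass on position 0. A full decoder would add multiple heads, cross-attention over image features, and a feed-forward block, none of which are shown here.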

Generative Modeling

16-824 Visual Learning and Recognition: Homework 2 · Spring 2023 · GITHUB

About The Project

Implemented and trained Generative Adversarial Networks (GANs) on the CUB 2011 dataset. The primary goal was to implement the GANs following the provided instructions, with an emphasis on achieving specific final FID (Fréchet Inception Distance) scores for three GAN variants: Vanilla GAN, LS-GAN, and WGAN-GP.

Built With

Python, NumPy, PyTorch

Results

Vanilla GAN: Final FID 71.95487635262793...