16-824 Visual Learning and Recognition: Homework 3 · Spring 2023 · GITHUB


About The Project

Implemented and trained different components of a Transformer decoder for image captioning using a subset of the COCO dataset. Additionally, a Vision Transformer (ViT) was implemented for classification on CIFAR10.


Built With

  • Python
  • NumPy
  • Pytorch

Results

For the entire report, please refer to the Documentation

image image image image image