Multimodal Medical Image Classification using CLIP and ResNet

One-line summary: Compared multimodal (CLIP) and image-only (ResNet-50) approaches to diabetic retinopathy classification, weighing semantic alignment against fine-grained visual discrimination.

Key Results

What I Built

Technical Approach

Key Insight

CLIP improved semantic structure in the embedding space, but image-only models performed better when localized visual detail was critical.
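To make "semantic structure" concrete, here is a minimal sketch of how a CLIP-style model can assign a label: the image embedding is compared with one text embedding per class, and the closest text wins. Everything specific below is an assumption for illustration, not taken from the project: the class prompts, the 3-dimensional toy vectors, and the helper names `cosine` and `zero_shot_predict` are hypothetical, and plain Python lists stand in for the tensors a real CLIP encoder would produce.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(image_emb, text_embs):
    """Return the label whose text embedding is closest to the image embedding.

    text_embs maps a class prompt (hypothetical wording) to its embedding.
    """
    scores = {label: cosine(image_emb, emb) for label, emb in text_embs.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Toy embeddings, made up for illustration; real CLIP embeddings are
# high-dimensional and come from the image and text encoders.
text_embs = {
    "no diabetic retinopathy": [1.0, 0.0, 0.0],
    "mild diabetic retinopathy": [0.0, 1.0, 0.0],
    "severe diabetic retinopathy": [0.0, 0.0, 1.0],
}
image_emb = [0.1, 0.2, 0.9]

label, scores = zero_shot_predict(image_emb, text_embs)
print(label)
```

Because the decision depends on a single global embedding per image, this kind of matching rewards coarse semantic agreement. That is consistent with the observation above that an image-only classifier can do better when the distinguishing evidence is small and localized.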

Tools / Models Used

Python, PyTorch, CLIP, ResNet-50, Grad-CAM, t-SNE.
