Posts in the 'DeepLearning' category

Basel Face Model (BFM)

https://faces.dmi.unibas.ch/bfm/index.php?nav=1-1-0&id=details (details page of the original 2009 Basel Face Model): "The geometry of the BFM consists of 53,490 3D vertices connected by 160,470 triangles. Faces of diff.." It compresses 3D face data using PCA, and identity,..

DeepLearning/Computer Vision 2024.12.26

[ECCV 2024 Review] Multi-Class Anomaly Detection in ECCV'24

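For my first post in a while, I'd like to summarize the multi-class anomaly detection papers presented at ECCV 2024. Anomaly detection now produces too many papers to cover them all! Stats: First, a brief summary of the stats for the AD papers presented at this ECCV. A total of 14 papers were accepted. Year after year, CV conferences publish more anomaly detection papers. I suspect the PatchCore paper from CVPR 2022 was the starting point (it already has more than 800 citations..). As you probably know, the MVTec-AD benchmark is already largely conquered, and so is VisA, which came out at ECCV 2022. SoTA ..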

DeepLearning/Computer Vision 2024.10.31

[Pytorch] torch.no_grad() versus requires_grad=False

While reading prompt learning code, I suddenly ran into something I didn't understand. The question started from the issue below. https://github.com/KaiyangZhou/CoOp/issues/7 (question about gradients on text encoder · Issue #7 · KaiyangZhou/CoOp): "Hi, may I ask if the gradients of the original CLIP text encoder are frozen or not? The paper mentioned that the gradients of text encoder is frozen, but I couldn't find that part in the code... Th..." Looking at this, C..

DeepLearning/Pytorch 2023.11.13

[Paper Review] BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation (ICML'22)

Today's paper is BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation, presented at ICML 2022. It explains the limitations of prior VLP research from two perspectives, model and data, and proposes a unified VLP framework that addresses them. The experimental results are very strong, but beyond the gains from dataset bootstrapping, the architecture itself also seems to have contributed. A discussion of training cost (time) would have made the case for bootstrapping somewhat stronger. In the follow-up work, BLIP2, end-to-end pre..

DeepLearning/Multi-Modal 2023.05.03

[Paper Review] Masked Autoencoders Are Scalable Vision Learners (CVPR'22)

Today's paper is Masked Autoencoders Are Scalable Vision Learners (a.k.a. MAE). It was selected for an oral presentation at CVPR 2022 and successfully applies masked modeling to self-supervised pre-training in vision in a very simple way. The method can be summed up as simple, effective, and scalable! If you find errors in the content below or have questions, please leave a comment anytime! Abstract: We mask random patches of the input image and reconstruct the missing pixels based on two ..

DeepLearning/Computer Vision 2023.04.29

[Paper Review] DINO: Emerging Properties in Self-Supervised Vision Transformers (ICCV'21)

The paper I review this time is Emerging Properties in Self-Supervised Vision Transformers (from Facebook AI Research), presented at ICCV 2021. It proposes DINO, a self-supervised learning method with a self-distillation structure, and also analyzes the properties that emerge when self-supervised learning is combined with ViT, reporting interesting experimental results. I found it especially interesting that a self-supervised ViT carries information about segmentation masks, and the existing supervised..

DeepLearning/Computer Vision 2023.04.21

[Paper Review] UNIFIED-IO: A Unified Model For Vision, Language, And Multi-Modal Tasks (ICLR'23)

Today's paper is Unified-IO: A Unified Model For Vision, Language, And Multi-Modal Tasks, selected as a notable top 25% paper at ICLR'23. The paper proposes a unified architecture in which a single model handles a wider range of tasks than prior work. The idea is simple: an encoder-decoder structure unifies the architecture, and to handle diverse inputs and outputs, everything is fed to the architecture through discrete tokenization. The XL model with roughly 3 billion parameters p..

DeepLearning/Multi-Modal 2023.04.11

[DL] Webly-supervised learning

https://blog.salesforceairesearch.com/mopro-webly-supervised-learning-with-momentum-prototypes/ (MoPro: Webly Supervised Learning with Momentum Prototypes): "TL;DR: We propose a new webly-supervised learning method which achieves state-of-the-art representation learning performance by training on large amounts of freely available noisy web images. Deep neural networks are known to be hungry for l.."

DeepLearning/Basic 2023.04.05

[Paper Review] MetaFormer is Actually What You Need for Vision (CVPR'22)

Today's paper is 'MetaFormer is Actually What You Need for Vision', selected as a CVPR 2022 oral. It argues that what vision tasks really need is not a well-designed token mixer but MetaFormer, the abstracted structure shared by transformer-like models. To show this, it introduces PoolFormer, which mixes tokens with a very simple pooling operation, demonstrating that the real contribution to performance comes from the MetaFormer structure itself rather than from a well-designed token mixer. Prior research focused on which token mixer to use, and..

DeepLearning/Computer Vision 2023.04.01

[Paper Review] Patch-level Representation Learning for Self-supervised Vision Transformers (CVPR'22)

Today I'll review Patch-level Representation Learning for Self-supervised Vision Transformers (a.k.a. SelfPatch), selected for an oral presentation at CVPR 2022. The paper points out that existing SSL ViT architectures all use only the global representation in the loss, and argues that this leads to attention collapse and lowers representation quality. Starting from the observation that patch representations are easy to obtain from a ViT architecture yet go completely unused, ViT..

DeepLearning/Computer Vision 2023.01.26

유진's 공부로그
