This paper introduces an automated Streetside Building Identification System (SBIS) that combines YOLO for building detection with a Vision Transformer (ViT) for building identification, using transfer learning to overcome data limitations. The system leverages Google Street View imagery to pinpoint building coordinates in urban environments, offering scalability and an inclusive representation of diverse architectural styles. Experimental results on a private dataset demonstrate 94.23% accuracy, suggesting a strong foundation for automated creation of building-recognition databases.
Achieve 94% accuracy in streetside building identification by fusing YOLO and Vision Transformer models, automating a traditionally laborious data collection process.
In an era of widespread digital imagery and advancing machine vision, automated methods for precisely locating photographed buildings are valuable across many sectors. This research initiates the development of an automated Streetside Building Identification System (SBIS). Leveraging the broad coverage of Google Street View across major cities worldwide, the system integrates a YOLO model for building detection with a Vision Transformer (ViT) model for building identification, using transfer learning to compensate for limited supporting data. The approach aims to pinpoint exact building coordinates in urban environments. Because Street View datasets span entire urban landscapes, the method is efficient and scalable: it simplifies data acquisition, avoids the logistical complexity of manual, targeted collection, and yields a more inclusive representation of diverse urban environments, recognizing buildings of every shape and architectural style. The system can scan any area covered by Google Street View, even where no commercial or business data exist, handling limited-information scenarios and marking a significant advance over previous studies. This initial version of the system provides insight into its implementation and discusses potential improvements. Tests against privately collected images show a current accuracy of 94.23%, a promising foundation for further refinement. The primary objective is an automated solution capable of creating a comprehensive database for building-recognition tasks, eliminating the laborious manual search process required for extensive datasets.
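The two-stage design described in the abstract (detection, then identification of each detected building) can be sketched as follows. This is a minimal illustration of the control flow only: the `detect` and `classify` callables, the box format, and the dummy stand-ins are assumptions for demonstration, not the paper's actual YOLO or ViT implementation, which would plug trained models into these slots.

```python
# Sketch of a two-stage streetside building identification pipeline:
# a detector (standing in for YOLO) proposes bounding boxes for buildings,
# and a classifier (standing in for a fine-tuned ViT) labels each crop.
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2) in pixel coordinates (assumed format)

def identify_buildings(
    image,                                    # e.g. an H x W x 3 image array
    detect: Callable[[object], List[Box]],    # YOLO-style detector: image -> boxes
    classify: Callable[[object, Box], str],   # ViT-style classifier: (image, box) -> building ID
) -> List[Tuple[Box, str]]:
    """Run detection first, then identify each detected building region."""
    results = []
    for box in detect(image):
        results.append((box, classify(image, box)))
    return results

# Toy stand-ins (hypothetical) that show the data flow without trained models.
def dummy_detect(image) -> List[Box]:
    return [(0, 0, 100, 100), (120, 0, 220, 100)]

def dummy_classify(image, box: Box) -> str:
    x1, _, _, _ = box
    return "building_A" if x1 < 110 else "building_B"

print(identify_buildings(None, dummy_detect, dummy_classify))
```

In a full system, `dummy_detect` would be replaced by a YOLO inference call and `dummy_classify` by a transfer-learned ViT applied to the cropped region, with detected boxes then mapped back to geographic coordinates via the Street View image's metadata.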