Bird's-Eye View Transformation System

Bachelor's thesis: developing a robust BEV dataset for autonomous driving by combining drone footage, object detection, and semantic segmentation.

August 1, 2023
Computer Vision · TensorFlow · Object Detection · Deep Learning

Overview

My bachelor's thesis at Universiteit van Amsterdam, completed for Saivvy, a company building a phone app that provides cyclists with a bird's-eye view of their surroundings by transforming ground-view camera footage into a top-down perspective. The project focused on creating a robust BEV dataset using drone and side-view cameras, then training detection and segmentation models on this real-world data.
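The ground-view-to-BEV transformation described above is typically modeled as a planar homography on the road plane. As a minimal sketch (not the app's actual implementation), the 3×3 homography can be estimated from four or more point correspondences with the standard DLT method; the point coordinates below are hypothetical:

```python
import numpy as np

def find_homography(src, dst):
    """Estimate the 3x3 homography mapping src -> dst points via DLT.

    Needs at least four non-degenerate point pairs; solves the
    homogeneous system with SVD and takes the null-space vector.
    """
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, Vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = Vt[-1].reshape(3, 3)
    return H / H[2, 2]  # normalize scale

def to_bev(H, pt):
    """Project a ground-view image point into BEV coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w
```

In practice one would use calibrated correspondences between the ground camera and the drone's top-down frame, and a library routine such as OpenCV's `findHomography` with RANSAC to handle noisy matches.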

Drone Bird's-Eye View Footage at 60m

Data Collection

  • Captured drone footage at 60 meters altitude using a DJI Mini 3 Pro with auto-lock tracking
  • Synchronized ground-view cameras with drone footage using NTP server time calibration (10–20 ms accuracy)
  • Collected footage across different urban environments: intersections, traffic lights, occluded roads
  • 70/20/10 train/validation/test split with geographically separated test locations
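With clocks NTP-calibrated to the quoted 10–20 ms, pairing drone frames with ground-camera frames reduces to nearest-timestamp matching within a tolerance. A minimal sketch of that pairing step, with hypothetical names and a tolerance chosen to match the calibration accuracy (this is not the thesis' sync script):

```python
from bisect import bisect_left

def match_frames(drone_ts, ground_ts, tolerance=0.02):
    """Pair each drone frame with the nearest ground-camera frame.

    drone_ts, ground_ts: sorted lists of NTP-calibrated timestamps (seconds).
    Returns (drone_index, ground_index) pairs within `tolerance` seconds;
    drone frames with no close ground frame are dropped.
    """
    pairs = []
    for i, t in enumerate(drone_ts):
        j = bisect_left(ground_ts, t)
        # Candidates: the ground frames immediately before and after t.
        best = min(
            (k for k in (j - 1, j) if 0 <= k < len(ground_ts)),
            key=lambda k: abs(ground_ts[k] - t),
            default=None,
        )
        if best is not None and abs(ground_ts[best] - t) <= tolerance:
            pairs.append((i, best))
    return pairs
```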

Models and Results

Three segmentation architectures were evaluated on seven semantic classes, including road, bike lane, sidewalk, pedestrian crossing, and continuous/non-continuous lines:

  • SegFormer (MiT-B1): Best overall performance, with 59.6% recall, 41.8% precision, and 30.5% mIoU
  • FCN (ResNet): Strong on pedestrian crossings (94.4% recall) but lower overall precision
  • PointRend (ResNet): Competitive on sidewalks and continuous lines
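The recall, precision, and mIoU figures above all fall out of a pixel-level confusion matrix. A minimal NumPy sketch of that computation (not the thesis' evaluation code, which presumably uses the OpenMMLab toolchain):

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes):
    """Pixel-level confusion matrix; rows = ground truth, cols = prediction."""
    idx = gt * num_classes + pred
    return np.bincount(idx.ravel(), minlength=num_classes ** 2) \
             .reshape(num_classes, num_classes)

def miou(cm):
    """Per-class IoU = TP / (TP + FP + FN); mIoU is their mean."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as class, but wrong
    fn = cm.sum(axis=1) - tp  # class pixels missed
    iou = tp / np.maximum(tp + fp + fn, 1)  # guard empty classes
    return iou, iou.mean()
```

Per-class recall and precision come from the same matrix as TP/(TP+FN) and TP/(TP+FP) respectively.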

For detection, models were tested after pre-training on the Stanford Drone Dataset. A tracking script maintained bounding-box identity across frames, feeding into a post-processing pipeline for Saivvy's mapping model.
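The tracking script itself is not shown here; a common minimal approach to maintaining box identity across frames is greedy IoU matching, sketched below with hypothetical class and parameter names:

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

class IoUTracker:
    """Greedy IoU matcher: each track's ID is carried forward to the
    best-overlapping detection in the next frame; unmatched detections
    start fresh tracks, and unmatched tracks are dropped."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.next_id = 0
        self.tracks = {}  # track id -> last seen box

    def update(self, detections):
        assigned, used = {}, set()
        for tid, prev in list(self.tracks.items()):
            best, best_iou = None, self.threshold
            for i, det in enumerate(detections):
                if i not in used and iou(prev, det) >= best_iou:
                    best, best_iou = i, iou(prev, det)
            if best is None:
                del self.tracks[tid]  # track lost
            else:
                used.add(best)
                self.tracks[tid] = detections[best]
                assigned[best] = tid
        for i, det in enumerate(detections):
            if i not in used:  # new object enters the scene
                self.tracks[self.next_id] = det
                assigned[i] = self.next_id
                self.next_id += 1
        return assigned  # detection index -> track id
```

Production trackers typically add Hungarian assignment and motion prediction (e.g. SORT-style Kalman filtering) on top of this matching step.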

Segmentation Annotation in CVAT — road, sidewalk, bike lane classes

Technologies

Python, OpenMMLab, Segformer, FCN, PointRend, CVAT, DJI Mini 3 Pro