
Scientific Data, Journal Year: 2025, Volume and Issue: 12(1)
Published: March 12, 2025
Rapidly acquiring three-dimensional (3D) building data, including geometric attributes like rooftop, height and orientations, as well as indicative attributes like function, quality, and age, is essential for accurate urban analysis, simulations, and policy updates. Current building datasets suffer from incomplete coverage of building multi-attributes. This paper presents the first national-scale Multi-Attribute Building dataset (CMAB) with artificial intelligence, covering 3,667 spatial cities, 31 million buildings, and 23.6 billion m² of rooftops with an F1-Score of 89.93% in OCRNet-based extraction, totaling 363 billion m³ of building stock. We trained bootstrap aggregated XGBoost models with city administrative classifications, incorporating morphology, location, and function features. Using multi-source data, including billions of remote sensing images and 60 million street view images (SVIs), we generated rooftop, height, structure, function, style, age, and quality attributes for each building with machine learning and large multimodal models. Accuracy was validated through model benchmarks, existing similar products, and manual SVI validation, mostly above 80%. Our dataset and results are crucial for global SDGs and urban planning.
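For readers unfamiliar with the F1-Score quoted for rooftop extraction, the following is a minimal sketch of how such a score can be computed from a predicted and a reference binary mask. It assumes a pixel-wise comparison on flattened masks; the abstract does not state how the authors computed their figure, and the function name `f1_score` and the toy masks are illustrative only.

```python
from typing import Sequence


def f1_score(pred: Sequence[int], truth: Sequence[int]) -> float:
    """F1 for binary labels (1 = rooftop pixel, 0 = background)."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    if tp == 0:
        # No correct positive predictions: F1 is defined as 0 here.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy flattened masks (hypothetical values, not taken from the dataset).
pred = [1, 1, 1, 0, 0, 1, 0, 0]
truth = [1, 1, 0, 0, 1, 1, 0, 0]
score = f1_score(pred, truth)
```

Because F1 is the harmonic mean of precision and recall, a high value requires both few missed rooftop pixels and few false detections.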
Language: English