Compares a simple CNN and a ResNet for building extraction with TensorFlow. Out of date, but an interesting trial.
Massachusetts building dataset
Each big tile (e.g. 1500*1500) is overlap-cropped into 64*64 patches with stride 16. The center 16*16 sub-patch is the label of the patch.
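The cropping scheme above can be sketched with NumPy. This is a minimal illustration and not the code used in this repo: the function name `crop_patches`, its parameters, and the choice to skip positions where a full patch does not fit (no padding) are assumptions.

```python
import numpy as np

def crop_patches(image, mask, patch=64, stride=16, center=16):
    """Overlap-crop a tile into patches and take the center sub-patch as label.

    image: (H, W, C) array, mask: (H, W) array of 0/1 building labels.
    Hypothetical helper; border positions where a full patch does not fit
    are skipped (assumption: no padding).
    """
    h, w = mask.shape
    off = (patch - center) // 2  # 24 pixels from the patch edge to the center block
    patches, labels = [], []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            patches.append(image[top:top + patch, left:left + patch])
            sub = mask[top + off:top + off + center, left + off:left + off + center]
            labels.append(sub.reshape(-1))  # flatten 16*16 -> 256 pixel labels
    return np.stack(patches), np.stack(labels)

# Small synthetic tile for demonstration (not real data).
rng = np.random.default_rng(0)
demo_image = rng.random((96, 96, 3)).astype(np.float32)
demo_mask = (rng.random((96, 96)) > 0.5).astype(np.int64)
demo_x, demo_y = crop_patches(demo_image, demo_mask)
print(demo_x.shape, demo_y.shape)
```

Because the stride equals the center sub-patch size, the labeled 16*16 blocks of neighboring patches sit side by side without overlapping. Under the no-padding assumption, a 1500*1500 tile gives 90*90 = 8100 patches.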
At this time, no fully convolutional networks were used. The model (CNN or ResNet) ends in a fully connected layer with output shape (B,256,2), where 256 = 16*16 is the number of pixels in the center sub-patch and 2 is the number of classes. In other words, the center sub-patch is flattened and each pixel is classified.
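To make the output shape concrete, here is a rough NumPy sketch of such a per-pixel head. It is not the repo's TensorFlow code: `pixel_head`, the feature size 128, and the random weights are all hypothetical, and B is assumed to be the batch size.

```python
import numpy as np

def pixel_head(features, weights, bias):
    """Fully connected head that classifies each pixel of the 16*16 center block.

    features: (B, D) backbone output, weights: (D, 512), bias: (512,).
    Hypothetical stand-in for the final layer of the CNN or ResNet.
    """
    logits = features @ weights + bias                      # (B, 512)
    logits = logits.reshape(-1, 256, 2)                     # (B, 256, 2)
    logits = logits - logits.max(axis=-1, keepdims=True)    # numerically stable softmax
    probs = np.exp(logits)
    probs = probs / probs.sum(axis=-1, keepdims=True)       # per-pixel class probabilities
    pred = probs.argmax(axis=-1).reshape(-1, 16, 16)        # back to a 16*16 label block
    return probs, pred

# Random inputs for demonstration only; D = 128 is an arbitrary choice.
rng = np.random.default_rng(0)
demo_features = rng.standard_normal((4, 128))
demo_weights = rng.standard_normal((128, 512)) * 0.01
demo_bias = np.zeros(512)
demo_probs, demo_pred = pixel_head(demo_features, demo_weights, demo_bias)
print(demo_probs.shape, demo_pred.shape)
```

Each of the 256 output positions gets its own two-way softmax, so the prediction for a patch is reshaped back into a 16*16 block before the blocks are stitched into a full tile.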
Interesting sawtooth effect. From left to right: RGB image, ground truth, prediction.
Note: this is not yet fully convolutional network based semantic segmentation. Out of date, but an interesting trial.