In this paper, we propose an effective video prediction-based data synthesis method that scales up training sets to improve the accuracy of semantic segmentation networks. We also introduce a joint propagation strategy to alleviate misalignments in synthesized samples. Furthermore, we present a novel boundary relaxation technique to mitigate label noise. This label relaxation strategy applies to human-annotated labels as well as synthesized ones. We achieve state-of-the-art performance on three benchmark datasets: Cityscapes, CamVid, and KITTI. A summary video demo can be watched below.