Creation of training data set for Sentinel-2 land cover classification

Gromny E., Lewiński S., Rybicki M., Krupiński M., Malinowski R., Nowakowski A., Jenerowicz G.

Space Research Centre, Polish Academy of Sciences

Supervised classification of satellite images relies on training points. The quality and availability of reference data determine both the results and the course of the entire classification process.
In the ESA S2GLC (Sentinel-2 Global Land Cover) project, Sentinel-2 images are classified using the Random Forest algorithm, trained on points selected from existing low-resolution land cover databases. This approach allows the classification to be performed fully automatically, without any operator intervention. An alternative method for creating the training data set has been developed to ensure that the S2GLC classification can still be carried out when access to relevant existing databases is limited or their quality is insufficient.
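The idea of drawing training points from an existing coarse land cover map can be illustrated with a minimal sketch. It is not the S2GLC implementation: the function name `sample_training_points`, the nearest-neighbour upsampling, the integer `scale` factor between the two grids, and the simple random draw per class are all assumptions made for illustration.

```python
import numpy as np

def sample_training_points(coarse_labels, scale, n_per_class, seed=0):
    """Draw random training pixels per class from a coarse land cover map.

    coarse_labels : 2-D int array, one class code per coarse cell
    scale         : number of image pixels per coarse cell side (assumed integer)
    n_per_class   : maximum number of points drawn for each class
    Returns {class_code: (k, 2) array of (row, col) on the image grid}.
    """
    rng = np.random.default_rng(seed)
    # Nearest-neighbour upsampling of the coarse map onto the image grid.
    fine = np.repeat(np.repeat(coarse_labels, scale, axis=0), scale, axis=1)
    points = {}
    for cls in np.unique(fine):
        rows, cols = np.nonzero(fine == cls)
        k = min(n_per_class, rows.size)
        # Random draw without replacement among the pixels of this class.
        pick = rng.choice(rows.size, size=k, replace=False)
        points[int(cls)] = np.column_stack([rows[pick], cols[pick]])
    return points

# Hypothetical 2 x 2 coarse map with three classes, 10 image pixels per cell side.
coarse = np.array([[1, 1], [2, 3]])
pts = sample_training_points(coarse, scale=10, n_per_class=50)
```

The spectral values at the returned positions, together with their class codes, could then be passed to a Random Forest implementation as the training set.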
The proposed method is a semi-automatic process initiated by an operator who, using traditional visual interpretation, sets only a few starting points for each class. Based on these initial marks, hundreds or thousands of points with similar spectral characteristics are then automatically selected in the image. The resulting data set can be used as an alternative source of training points for land cover classification.
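The expansion from a few operator-set points to many training points can be sketched as follows. The abstract does not specify how spectral similarity is measured, so this sketch assumes Euclidean distance to the mean spectrum of each class's seed pixels; the function `expand_seeds`, the `max_distance` threshold, and the synthetic image are hypothetical.

```python
import numpy as np

def expand_seeds(image, seeds, max_distance, n_points, seed=0):
    """Select pixels spectrally similar to operator-set seed points.

    image        : (rows, cols, bands) float array of reflectances
    seeds        : {class_name: [(row, col), ...]} set by the operator
    max_distance : assumed similarity threshold (Euclidean, in spectral space)
    n_points     : maximum number of points returned per class
    """
    rng = np.random.default_rng(seed)
    rows, cols, bands = image.shape
    pixels = image.reshape(-1, bands)
    classes = sorted(seeds)
    # Mean spectrum of the seed pixels of each class.
    means = np.array([
        np.mean([image[r, c] for r, c in seeds[cls]], axis=0) for cls in classes
    ])
    # Distance of every pixel to every class mean: shape (n_pixels, n_classes).
    dist = np.linalg.norm(pixels[:, None, :] - means[None, :, :], axis=2)
    nearest = dist.argmin(axis=1)
    close = dist.min(axis=1) <= max_distance
    result = {}
    for i, cls in enumerate(classes):
        # Candidates: pixels closest to this class and within the threshold.
        candidates = np.flatnonzero(close & (nearest == i))
        k = min(n_points, candidates.size)
        pick = rng.choice(candidates, size=k, replace=False)
        result[cls] = np.column_stack(np.unravel_index(pick, (rows, cols)))
    return result

# Synthetic 4-band image: dark left half, bright right half.
image = np.zeros((20, 20, 4))
image[:, :10] = 0.1
image[:, 10:] = 0.8
seeds = {"water": [(0, 0), (5, 3)], "bare": [(2, 15)]}
training = expand_seeds(image, seeds, max_distance=0.05, n_points=100)
```

For a full Sentinel-2 tile the distance array would need to be computed in blocks to limit memory use; the sketch keeps everything in memory for brevity.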
Compared with the traditional approach, in which all training points or areas are indicated manually, the developed method is more efficient and allows the data to be processed more rapidly. Semi-automatic training can be used as an alternative or a supplement to the training applied in the S2GLC classification approach.

Author: Ewa Gromny