ILMSimage 2.4 Classification
The process of thematic classification within ILMSimage is composed of three sub-tasks:
- An unsupervised classification of the existing cells and their features into cell type classes (Self Organization).
- The definition of reference or training areas using vector data layers (Class References).
- The actual thematic classification, which is based on the results of both preceding tasks (Object Classification).
The three tasks are described step by step below.
An overview of the whole process chain is given in the ILMSimage Tutorial.
Self Organization (Clustering)
The unsupervised classification uses the cell geometry generated during cell creation and the selected cell features to derive cell type classes by an autonomous, self-organizing process. It finds groups of cells that have similar features, regardless of their geographical location in the image or their position relative to one another. In this task, for example, elongated and dark cells are separated from those that are rather round and bright. In reality, however, the decision is not limited to the named features but is based on all features generated during attribute calculation. The basic concept of this task therefore corresponds to partitional cluster analysis (partitional clustering).
The panel Cluster provides four input parameters and one option to control clustering:
Perform Clustering For
- References Only
- Whole Image
The image statistics for the self-organization can be drawn from the entire image or only from references. Statistics from references can be calculated much more quickly because the references cover smaller areas.
To use this option, references have to be defined first. At this phase of the project, "references" do not need to be real references. A few rectangles that all carry the same class and together cover all relevant image features and structures can be used as "references" to speed up the calculation. They can be exchanged later for the real references that will define image objects.
Algorithm of Self-Organization
- Codebook (k-Means)
- Kohonen Neuronal Network
- Decision Tree
- Support vector Machine
The options provide simple and more sophisticated techniques for clustering the image features, but in almost all cases the Codebook (k-Means) algorithm works best. All options are initialized by Maximum Variability Within Clusters in such a way that they produce similar or at least comparable results. Further information is given below in the chapter Background.
(Optimize Feature Selection)
Maximum Variability Within Clusters
Controls the permissible generalization within one cluster. The accepted input values range from 0 = "only identical feature combinations" to 1 = "accept anything". For image data with spectral bands in the visual range, an input value from 0.1 to 0.2 is a good starting point. The smaller the input value, the more clusters are created. In the first step of the classification, as few clusters as possible should be created.
Show Cluster Image
Optionally creates a thematic image that shows the results of the clustering process and loads it onto the QuantumGIS canvas.
Accomplish Self Organization
After clicking the [Run] button, the clustering process starts and a small window with a progress bar shows up and reports the process status. Depending on the number of cells and the selected options, the calculation time can vary greatly.
If the Show Cluster Image option is active, QuantumGIS will load a color-coded raster layer to show the results of the cluster process. After a successful cluster calculation, ILMSimage will show the Reference panel. The colors of the cluster image are assigned randomly and may differ from any example shown.
All messages of the current run are listed in the Control panel and stored in a logfile for further use.
Background
The Codebook or k-Means algorithm is a frequently used method for finding structures in large amounts of data. The number of clusters to be found is determined beforehand. After a random initialization, every cell is assigned to the cluster with the most similar cluster center. Once this assignment is completed, the cluster centers are recalculated and every cell is compared to the existing clusters again. These steps are repeated until none of the assignments change anymore. At that point the cluster analysis is complete.
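The loop described above can be sketched in a few lines of Python. This is a minimal, generic k-means illustration and not the ILMSimage implementation: the function name `kmeans`, the use of squared Euclidean distance as the measure of similarity, and the example feature values are assumptions made for this sketch.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Cluster feature vectors (tuples of floats) into k groups.

    Hypothetical helper for illustration; similarity is assumed to be
    the squared Euclidean distance between feature vectors.
    """
    rng = random.Random(seed)
    # Random initialization: use k distinct cells as initial cluster centers.
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assign every cell to the cluster with the most similar center.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Stop as soon as no assignment changes anymore.
        if new_labels == labels:
            break
        labels = new_labels
        # Recalculate each center as the mean of the cells assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Example: six cells described by two made-up features (e.g. brightness, roundness).
cells = [(0.10, 0.10), (0.12, 0.09), (0.11, 0.10),
         (0.90, 0.80), (0.88, 0.82), (0.91, 0.79)]
labels, centers = kmeans(cells, k=2)
```

In this example the three dark, elongated cells end up in one cluster and the three bright, round cells in the other, independent of which cells were picked as initial centers.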
Another method for partitional clustering is the algorithm of self-organizing maps, which was originally developed by Teuvo Kohonen. It is an artificial neural network that is able to project a multi-dimensional feature space, which serves as the data source, onto a two-dimensional range of values. This characteristic explains the reference to a map.
The cells created by ILMSimage and their derived features also represent a multi-dimensional feature space. The parameter variability (range of values from 0 to 1) controls the "sensitivity" of the process. A higher value corresponds to a higher acceptable variability of features within the clusters to be generated, hence their number decreases. A low value of variability generates a high number of classes, since the acceptable variability of features within each of them is lower.
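The relation between the permitted variability and the number of clusters can be illustrated with a simple threshold clustering. The sketch below is not the method used by ILMSimage. It assumes that all features are scaled to the range 0 to 1 and that "variability" can be read as a maximum Euclidean distance to a cluster center. The function name `threshold_clusters` and the sample values are hypothetical.

```python
import math


def threshold_clusters(points, max_variability):
    """Open a new cluster whenever a point is farther than
    max_variability from every existing cluster center.

    Illustration only: a stand-in for the idea that a smaller
    permitted variability leads to more clusters.
    """
    centers = []
    for p in points:
        if not any(math.dist(p, c) <= max_variability for c in centers):
            centers.append(p)
    return centers


# One made-up feature per cell, already scaled to 0..1.
cells = [(0.10,), (0.15,), (0.20,), (0.50,), (0.55,), (0.90,)]

few = threshold_clusters(cells, 0.5)    # high variability, few clusters
many = threshold_clusters(cells, 0.02)  # low variability, many clusters
print(len(few), len(many))  # prints "2 6"
```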
Object Class References
Every classification needs class definitions. Object class definitions tend to be rather complex, so ILMSimage uses typical examples or reference areas to define object classes. A polygon shape with a few vertices covering a typical example of the desired class in the image is sufficient to set up a suitable class definition and to classify other occurrences of the defined class. The reference polygons can be drawn on demand, or existing polygons may be used after modification. Point shapes can be used to mark single cells as reference areas.
Every polygon or point shape with a field called Class in integer format (natural numbers) is suitable for defining ILMSimage object classes. Line shapes are not accepted. Other fields are optional and not restricted in any way. Classes are identified by a unique number in the field Class. The numbers have to be consecutive and start from one. Equal numbers define equal classes, even if polygon and point shapes are mixed. All cells that are covered to at least 2/3 by a reference polygon are counted as reference. A point shape marks a single cell as reference area.
ILMSimage is designed to classify complex image objects, so the classification rules tend to be rather complicated. For this reason, ILMSimage uses examples, i.e. reference areas, to define object classes. Object classes differ considerably from well-known class definitions that concentrate on pixel-related properties. Object class definitions are based on cells. ILMSimage combines various cell types (clusters) in a specific frequency and a specific spatial order (proximity) to define an object class. This means that the reference areas have to be big enough to allow a statistical analysis of these parameters. To define the class "urban industrial area", it is recommended to draw a polygon around the whole extent of a typical industrial area. Only on rare occasions will an object class be represented by a single cell type or cluster. Water bodies may be the most important exception. For this reason, point shapes can mark single cells as class references.
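The numbering rules for the field Class can be checked before a classification run. The following sketch works on a plain list of values as they might be read from the attribute table. The function name `check_class_numbers` is hypothetical and not part of ILMSimage or QuantumGIS.

```python
def check_class_numbers(class_values):
    """Return a list of problems found in the values of the field Class.

    Rules taken from the text above: natural numbers, consecutive,
    starting from one. Equal numbers are allowed (same class).
    """
    problems = []
    valid = []
    for value in class_values:
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, int) and not isinstance(value, bool) and value >= 1:
            valid.append(value)
        else:
            problems.append(f"not a natural number: {value!r}")
    if not valid:
        problems.append("no usable class numbers found")
        return problems
    distinct = set(valid)
    missing = sorted(set(range(1, max(distinct) + 1)) - distinct)
    if missing:
        problems.append(f"class numbers are not consecutive, missing: {missing}")
    return problems


ok = check_class_numbers([1, 2, 2, 3])   # polygons and points may share numbers
gap = check_class_numbers([1, 3])        # class 2 is missing
zero = check_class_numbers([0, 1])       # numbering must start at one
```

An empty result list means that the values satisfy the rules stated above.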
Modify Existing Shape Layers
ILMSimage accepts any polygon or point shape as a reference if a field Class is present. Adding a field "Class" in integer format to an existing polygon shape and filling the field with a unique number for each class makes it a suitable ILMSimage reference. Optionally, a field "Size" and a comment field may be added.
Create Reference Polygon Shapes
Creating a new shapefile
To create a new reference shapefile, simply use the tools provided with QuantumGIS. Immediately after file creation, QuantumGIS will ask for the associated attribute names. Only the attribute "Class" of type "Whole number" is mandatory, everything else can be added later.
- Choose the new vector layer type polygon
- In the "New attribute" box, type "Class" into the input field "Name"
- Choose "Whole number" as "Type"
- Input "10" for "Width" (any number between 2 and 10 will be sufficient)
- Press the button "add to attribute list" and the new attribute will show up in the "Attributes list" below.
Only the attribute Class is mandatory for classification!
Optional attribute Size: If you intend to use the full capabilities of ILMSimage, the optional attribute Size in float format (Decimal number) can be used to control the minimum object size for each class separately during object classification.
Optional attribute Hint: An optional attribute Hint in string format (Text data) may be convenient (the name Hint is only a suggestion). There is no restriction to add further attributes.
With the [OK] button, QuantumGIS will create a new and empty polygon shape layer with the attributes defined above and show it at the top of the layers selection box.
- Select the newly generated shape in the layers selection box and choose [ Layer | Toggle Editing ] in the main menu bar to start edit mode
- Choose [ Edit | Capture Polygon ] in the main menu bar to start digitizing a new polygon
Digitizing reference areas
- Digitize the new polygon by single clicks for all desired vertices
- When the polygon is finished, a right-click will show the attribute entry form
Entering field values
- QuantumGIS shows a form to enter values for all attributes in the polygon definition. The field "Class" must be filled with natural numbers (integer). At the end, they must be consecutive and start at one.
- [OK] completes the polygon definition
Defining class colors
Individual class colors may be useful to differentiate between the newly created reference areas. To assign individual colors to each class, double-click the reference layer in the layers selection box and an entry form will show up.
- Choose the panel Style
- Choose new symbology. ILMSimage will not copy colors from the old symbology settings.
- A double click on the coloured box in Symbols opens a color selection menu.
- Click on Change and assign an appropriate color to the new polygon.
Creating Reference Point Shapes
Creating reference point shapes is almost identical to creating polygon shapes. While polygon shapes cover a few cells, point shapes are connected to one individual cell. The cell borders should be roughly known before new point shapes are set. If the cell index is recalculated, point shapes may lose their meaningful position.
Object Classification
The "real" classification in ILMSimage is the object classification step. If Cluster and References are set up properly, the work is almost done. Two parameters, Allow Exceptions in References and Minimum Object Size, can modify the result.
Minimum Object Size
Minimum Object Size makes it possible to exclude small cell combinations from being classified as objects if desired. In this context "small" means "consisting of few cells". The edit box accepts input values from 0 to 0.99.
ILMSimage is designed to classify complex image objects. In some cases very small potential objects meet the conditions for being classified as objects, and therefore a few cells representing trees may be classified as "forest". To prevent small objects, ILMSimage calculates an object size by measuring how many borders connect cells to other cells of the same object and how many borders connect the object to other classes. The quotient "internal border length" / "total border length" is defined as the object size. The definition is related to the complexity of image objects and does not define any absolute size. Nine squares arranged as a 3x3 matrix or twelve squares arranged as a 2x6 matrix show an object size of 0.5. An object size of 1.0 would require an infinitely large object and is thus impossible.
Object size can be entered in the panel Classes. This applies the entry to all defined classes. To get more specific definitions, the object size can also be entered during the reference definition. ILMSimage searches for a field Size in the reference shape attribute table. If such a field is found, it overrides the entry in the panel Classes for this specific class. The field Size can be added to an existing reference shape at any time.
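The object size quotient can be reproduced for simple arrangements of square cells. The sketch below assumes equally sized square cells on a regular grid, counts every shared border once with length 1, and treats every border that does not touch another cell of the object as an outer border. Real ILMSimage cells are irregular, so this only illustrates the definition. The function names `object_size` and `rectangle` are hypothetical.

```python
def object_size(cells):
    """Return internal border length / total border length for an object
    made of unit squares given as (row, column) grid positions."""
    cells = set(cells)
    internal = 0
    external = 0
    for row, col in cells:
        for d_row, d_col in ((0, 1), (1, 0), (0, -1), (-1, 0)):
            if (row + d_row, col + d_col) in cells:
                internal += 1  # shared border, seen once from each side
            else:
                external += 1  # border to another class or to the outside
    internal //= 2  # every shared border was counted twice
    return internal / (internal + external)


def rectangle(rows, cols):
    """All grid positions of a rows x cols block of cells."""
    return [(r, c) for r in range(rows) for c in range(cols)]


print(object_size(rectangle(3, 3)))  # prints 0.5 (12 internal, 12 outer borders)
print(object_size(rectangle(2, 6)))  # prints 0.5 (16 internal, 16 outer borders)
print(object_size(rectangle(1, 1)))  # prints 0.0 (a single cell has no internal border)
```

Larger compact blocks approach but never reach 1.0, because the outer border never disappears.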
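The precedence between the panel entry and the field Size can be written down as a small lookup. This is a sketch of the rule as described above, not ILMSimage code. The function name `effective_min_size` and the dictionary that stands in for the attribute table are assumptions.

```python
def effective_min_size(panel_value, size_by_class, class_id):
    """Minimum object size used for one class.

    panel_value   -- entry from the panel Classes, valid for all classes
    size_by_class -- hypothetical mapping of class number to the value of
                     the field Size (classes without a value are absent)
    """
    class_value = size_by_class.get(class_id)
    # A value in the field Size takes precedence over the panel entry.
    return panel_value if class_value is None else class_value


sizes = {2: 0.6}  # only class 2 carries its own Size value
forest = effective_min_size(0.3, sizes, 1)  # falls back to the panel entry
urban = effective_min_size(0.3, sizes, 2)   # uses the class-specific value
```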
Allowing Exceptions in References
Complex structured reference areas commonly contain a few individual cells that do not fit the rest of the class definition. If a number above zero is entered, ILMSimage deletes the most inapplicable cells from the reference definition until the proportion of deleted cells reaches the value entered. The edit box accepts input values from 0 to 0.99.
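The effect of this parameter can be sketched as removing the worst-fitting cells up to the permitted proportion. How ILMSimage measures whether a cell is "inapplicable" is not described here, so the sketch assumes a hypothetical misfit score per cell, where a higher score means a worse fit. The function name `drop_exceptions` is also an assumption.

```python
def drop_exceptions(misfit_by_cell, allowed_fraction):
    """Return the reference cells that remain after removing the
    worst-fitting cells, up to allowed_fraction of all cells.

    misfit_by_cell -- hypothetical mapping of cell id to misfit score
    """
    if not 0 <= allowed_fraction <= 0.99:
        raise ValueError("allowed_fraction must be between 0 and 0.99")
    # Number of cells that may be removed without exceeding the proportion.
    n_drop = int(len(misfit_by_cell) * allowed_fraction)
    # Rank cells from worst fit to best fit.
    ranked = sorted(misfit_by_cell, key=misfit_by_cell.get, reverse=True)
    dropped = set(ranked[:n_drop])
    return {cell: score for cell, score in misfit_by_cell.items()
            if cell not in dropped}


# Ten reference cells, two of them (ids 3 and 7) fit the class poorly.
scores = {0: 0.1, 1: 0.2, 2: 0.1, 3: 0.9, 4: 0.3,
          5: 0.2, 6: 0.1, 7: 0.8, 8: 0.2, 9: 0.3}
kept = drop_exceptions(scores, 0.2)
```

With a value of 0.2, the two worst-fitting cells of the ten are removed and eight cells remain in the reference definition. A value of 0 keeps all cells.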
Showing Classification Image
ILMSimage can generate a raster image with the results of the actual classification. QuantumGIS shows the layer on the canvas. A result shape layer can be created with Export Classification as Shape Layer in the Export panel. A raster layer shows up much more quickly than a vector layer, so a raster layer can be preferable while reference sites and classification parameters are being evaluated.