From ILMS-Wiki

Revision as of 12:05, 20 September 2012

Classification

The process of thematic classification within ILMSimage is composed of three sub-tasks:

An unsupervised classification of the existing cells and their features into cell type classes (Self Organization).
The definition of reference or training areas using vector data layers (Class References).
The actual thematic classification, which is based on the results of both preceding tasks (Object Classification).

Below, the three tasks are described step by step.

An overview of the whole process chain is given in the ILMSimage Tutorial.

Self Organization (Clustering)

The unsupervised classification uses the cell geometry generated during cell creation and the selected cell features to derive cell type classes by an autonomous, self-organizing process. Some groups of cells have similar features regardless of their geographical location in the image or their position relative to one another. In this task, for example, elongated and dark cells are separated from those which are rather round and bright. In practice the decision is not limited to these two features but is based on all features generated during attribute calculation. The basic concept of this task therefore corresponds to partitional cluster analysis (partitional clustering).

The panel Cluster provides four input parameters and one option to control clustering:

Parameter

Perform Clustering For

References Only
Whole Image

The image statistics for the self-organization can be drawn from the entire image or only from references. Statistics from references can be calculated much more quickly due to their smaller areas.

To do that, references have to be defined. At this phase of the project, the "references" need not be real references. A few rectangles with only one class, covering all relevant image features and structures, can be used as "references" to speed up the calculation. They can be exchanged later for the real references that will define image objects.

Algorithm of Self-Organization

unchanged
Codebook (k-Means)
Kohonen Neuronal Network
Decision Tree
Support vector Machine

The options provide simple and more sophisticated techniques for clustering the image features, but in almost every case the Codebook (k-Means) algorithm works best. The different options are initialized by Maximum Variability Within Clusters in such a way that they produce similar or at least comparable results. Further information is given below in the chapter Background.

(Optimize Feature Selection)

inactivated

Maximum Variability Within Clusters

controls the permissible generalization within one cluster. The accepted input values range from 0 = "only identical feature combinations" to 1 = "accept anything". For image data with spectral bands in the visual range, an input value from 0.1 to 0.2 is a good starting point. The smaller the input value, the more clusters are created. In the first step of the classification, as few clusters as possible should be created.

Show Cluster Image

optionally creates a thematic image that shows the results of the clustering process and loads it to the QuantumGIS canvas.

Accomplish Self Organization

After clicking the [Run] button the clustering process starts, and a small window with a progress bar shows up and reports the process status. Depending on the number of cells and the selected options, the calculation time can vary greatly.

If the Show Cluster Image option is active, QuantumGIS will load a color-coded raster layer to show the results of the cluster process. After a successful cluster calculation, ILMSimage will show the Reference panel. With the above settings, the result should look similar to the example. The colors are random and may differ from the example.

All messages of the current run are listed in the Control panel and stored in a logfile for further use.

Background

The Codebook or k-Means algorithm is a frequently used method for finding structures in large amounts of data. The number of clusters to be found is determined beforehand. After a random initialization, every cell is assigned to the cluster with the most similar cluster center. Once this assignment is completed, the cluster centers are recalculated and every cell is compared to the existing clusters again. These steps are repeated until none of the assignments changes any more. At that point the cluster analysis is completed.
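The steps described above can be sketched in a few lines of Python. This is a generic, minimal k-means written for illustration only; it is not the ILMSimage implementation, and the function names, the sample feature values and the fixed random seed are assumptions made for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two feature tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(cells, k, seed=0, max_iter=100):
    """Cluster feature tuples into k groups (plain k-means sketch)."""
    rng = random.Random(seed)
    # Random initialization: pick k cells as the first cluster centers.
    centers = rng.sample(cells, k)
    assignment = None
    for _ in range(max_iter):
        # Assign every cell to the cluster with the most similar center.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(cell, centers[c]))
            for cell in cells
        ]
        # Stop as soon as no assignment has changed.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Recalculate each center as the mean of its member cells.
        for c in range(k):
            members = [cell for cell, a in zip(cells, assignment) if a == c]
            if members:
                centers[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return assignment, centers


# Hypothetical cells with two features each (e.g. brightness, elongation).
cells = [(0.10, 0.20), (0.12, 0.18), (0.11, 0.22),
         (0.90, 0.80), (0.88, 0.85), (0.92, 0.79)]
assignment, centers = k_means(cells, k=2)
print(assignment)
```

The first three and the last three cells end up in two different clusters; which of the two cluster numbers each group receives depends on the random initialization.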

Another method for partitional clustering is the algorithm of self-organizing maps, which was originally developed by Teuvo Kohonen. It is an artificial neural network which is able to project a multi-dimensional feature space, serving as the data source, onto a two-dimensional range of values. This characteristic explains the reference to a map.

The cells created by ILMSimage and their derived features also represent a multi-dimensional feature space. The variability parameter (range of values from 0 to 1) controls the "sensitivity" of the processes. A higher value corresponds to a higher acceptable variability of features within the clusters to be generated, hence their number decreases. A low value generates a high number of clusters, since the acceptable variability of features within each of them is lower.
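The inverse relation between the variability value and the number of clusters can be illustrated with a simple threshold-based ("leader") clustering. This is explicitly not the way ILMSimage initializes its algorithms; the function name, the distance measure (largest difference in any single feature, with all features scaled to 0..1) and the sample values are assumptions chosen only to show the effect.

```python
def leader_clustering(cells, max_variability):
    """Open a new cluster whenever a cell differs from every existing
    cluster center by more than max_variability in some feature."""
    centers = []
    labels = []
    for cell in cells:
        best, best_dist = None, None
        for i, center in enumerate(centers):
            # Largest per-feature difference (features scaled to 0..1).
            dist = max(abs(x - y) for x, y in zip(cell, center))
            if best_dist is None or dist < best_dist:
                best, best_dist = i, dist
        if best is None or best_dist > max_variability:
            centers.append(cell)
            labels.append(len(centers) - 1)
        else:
            labels.append(best)
    return labels, centers


# Hypothetical cells with two features scaled to the range 0..1.
cells = [(0.10, 0.20), (0.15, 0.25), (0.50, 0.50), (0.55, 0.45), (0.90, 0.80)]
for variability in (0.0, 0.1, 0.5, 1.0):
    labels, centers = leader_clustering(cells, variability)
    print(variability, len(centers))
# prints:
# 0.0 5
# 0.1 3
# 0.5 2
# 1.0 1
```

With 0 only identical feature combinations share a cluster, and with 1 everything is accepted into a single cluster, which mirrors the two ends of the input range.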

Object Class References

Every classification needs class definitions. Object class definitions tend to be rather complex, so for that reason ILMSimage uses typical examples or reference areas to define object classes. A polygon shape with a few vertices covering a typical example of the desired class in the image is sufficient to set up a suitable class definition and classify other appearances of the defined class. The reference polygons can be drawn on demand or existing polygons may be used after modification. Point shapes can be used to mark single cells as reference area.

Every polygon or point shape with a field called Class in integer format (natural numbers) is suitable for defining ILMSimage object classes. Line shapes are not accepted. Other fields are optional and not restricted in any way. Classes are identified by a unique number in the field Class. The numbers have to be consecutive and start from one. Equal numbers define equal classes, even if polygon and point shapes are mixed. All cells that are covered to 2/3 by a reference polygon are counted as reference cells. A point shape marks a single cell as reference area.
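The numbering rules for the field Class can be checked before a reference layer is used. The following sketch works on a plain list of Class values, for example values copied from the attribute table; the function name and the message texts are assumptions for illustration and are not part of ILMSimage.

```python
def check_class_numbers(class_values):
    """Return a list of problems found in the Class values of a reference
    layer. An empty list means the numbering follows the rules."""
    if not class_values:
        return ["no reference features found"]
    problems = []
    for value in class_values:
        # Class must be a natural number (integer, starting from one).
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            problems.append(f"invalid Class value: {value!r}")
    if problems:
        return problems
    used = set(class_values)
    # Numbers must be consecutive: every number from 1 up to the largest
    # one has to occur at least once.
    for number in range(1, max(used) + 1):
        if number not in used:
            problems.append(f"Class {number} is missing")
    return problems


print(check_class_numbers([1, 2, 2, 3]))  # [] (equal numbers are allowed)
print(check_class_numbers([1, 3]))        # ['Class 2 is missing']
print(check_class_numbers([0, 1]))        # ['invalid Class value: 0']
```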

ILMSimage is designed to classify complex image objects, so the classification rules tend to be rather complicated. For this reason, ILMSimage uses examples, i.e. reference areas, to define object classes. Object classes differ considerably from well-known class definitions that concentrate on pixel-related properties. Object class definitions are based on cells. ILMSimage combines various cell types (clusters) in a specific frequency and a specific spatial order (proximity) to define an object class. This means that the reference areas have to be big enough to allow a statistical analysis of the mentioned parameters. To define the class "urban industrial area" it is recommended to draw a polygon around the whole extent of a typical industrial area. Only on rare occasions will an object class be represented by a single cell type or cluster. Water bodies may be the most important exception. For this reason, point shapes can mark single cells as class references.
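To make the idea of "frequency" and "proximity" of cell types more concrete, the sketch below summarizes one reference area by the relative frequency of its cell types and by how often pairs of cell types are direct neighbours. How ILMSimage actually stores and compares these statistics is not described here; the function name, the data layout and the sample values are assumptions for illustration only.

```python
from collections import Counter


def describe_reference(cell_types, neighbours):
    """Summarize a reference area.

    cell_types: dict mapping cell id -> cell type (cluster number)
    neighbours: list of (cell id, cell id) pairs sharing a border
    """
    total = len(cell_types)
    # Relative frequency of each cell type inside the reference area.
    frequency = {t: n / total for t, n in Counter(cell_types.values()).items()}
    # How often each combination of cell types occurs side by side.
    pairs = Counter()
    for a, b in neighbours:
        pair = tuple(sorted((cell_types[a], cell_types[b])))
        pairs[pair] += 1
    return frequency, pairs


# Hypothetical reference area with four cells in a row.
cell_types = {10: 1, 11: 1, 12: 2, 13: 1}
neighbours = [(10, 11), (11, 12), (12, 13)]
frequency, pairs = describe_reference(cell_types, neighbours)
print(frequency)    # {1: 0.75, 2: 0.25}
print(dict(pairs))  # {(1, 1): 1, (1, 2): 2}
```

A reference area with only a handful of cells gives very coarse statistics of this kind, which is one way to see why large reference polygons are recommended.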

Modify Existing Shape Layers

ILMSimage accepts each polygon or point shape as a reference if a field Class is present. Adding a field "Class" in integer format to an existing polygon shape and filling the field with a unique number for each class makes it a suitable ILMSimage reference. Optionally a field "Size" and a comment field may be added.

Create Reference Polygon Shapes

create a new shapefile

To create a new reference shapefile, simply use the tools provided with QuantumGIS. Immediately after file creation, QuantumGIS will ask for the associated attribute names. Only the attribute "Class" of type "Whole number" is mandatory; everything else can be added later.

define attributes

Choose the new vector layer type polygon.
In the "New attribute" box, type "Class" into the input field "Name".
Choose "Whole number" as "Type".
Enter "10" for "Width" (any number between 2 and 10 will be sufficient).
Press the button "Add to attribute list" and the new attribute will show up in the "Attributes list" below.

Only the attribute Class is mandatory for classification!

Optional attribute Size: If you intend to use the full capabilities of ILMSimage, the optional attribute Size in float format (Decimal number) can be used to control the minimum object size in the classification for each class separately (see Object Classification).

Optional attribute Hint: An optional attribute Hint in string format (Text data) may be convenient (the name Hint is only a suggestion). There is no restriction on adding further attributes.

With the [OK] button QuantumGIS will create a new and empty polygon shape layer with the attributes defined above and show it at the top of the layers selection box.

prepare digitizing

Select the newly generated shape in the layers selection box and choose [ Layer | Toggle Editing ] in the main menu bar to start edit mode.
Choose [ Edit | Capture Polygon ] in the main menu bar to start digitizing a new polygon.

digitize reference areas

Digitize the new polygon with a single click for each desired vertex.
When the polygon is finished, a right-click will show the attribute entry form.

enter field values

QuantumGIS shows a form to enter values for all attributes in the polygon definition. The field "Class" must be filled with natural numbers (integer). In the end, the numbers must be consecutive and start at one.
[OK] completes the polygon definition

define class colors

Individual class colors may be useful to differentiate between the newly created reference areas. To assign individual colors to each class, double-click the reference layer in the layer selection box and an entry form will show up.

Choose the panel Style.
Choose new symbology. ILMSimage will not copy colors from the old symbology settings.
A double click on the coloured box in Symbols opens a color selection menu.
Click on Change and assign an appropriate color to the new polygon.

Create Reference Point Shapes

Creating reference point shapes is almost identical to creating polygon shapes. While polygon shapes cover a number of cells, point shapes are connected to one individual cell. The cell borders should be roughly known before new point shapes are set. If the cell index is recalculated, point shapes may lose their meaningful position.

Object Classification

The "real" classification in ILMSimage is the Object Classification step. If clusters and references are set up properly, the work is almost done. Two parameters, Allow Exceptions in References and Minimum Object Size, can modify the result.

Classification Parameters

Minimum Object Size

Minimum Object Size makes it possible to exclude small cell combinations from being classified as objects. In this context "small" means "consisting of few cells". The edit box accepts input values from 0 to 0.99.

ILMSimage is designed to classify complex image objects. In some cases very small potential objects meet the conditions for being classified as objects, so a few cells representing trees may be classified as "forest". To prevent small objects, ILMSimage calculates an object size by measuring how many borders connect cells to other cells of the same object and how many borders connect the object to other classes. The quotient "internal border length" / "total border length" is defined as the object size. The definition is related to the complexity of image objects and does not define any absolute size. Nine squares arranged as a 3x3 matrix or twelve squares arranged as a 2x6 matrix both have an object size of 0.5. An object size of 1.0 would require an infinitely large object and is thus impossible.
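The object size ratio can be reproduced for idealized objects built from unit squares. The sketch below assumes that every cell is a square on a regular grid and that every shared edge has length 1; real ILMSimage cells are irregular, so this only illustrates the definition. The function name and the grid representation are assumptions.

```python
def object_size(cells):
    """Object size = internal border length / total border length for an
    object made of unit squares given as (row, column) positions."""
    cells = set(cells)
    if not cells:
        raise ValueError("an object needs at least one cell")
    internal = 0
    external = 0
    for row, col in cells:
        for d_row, d_col in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if (row + d_row, col + d_col) in cells:
                internal += 1  # shared border, seen once from each side
            else:
                external += 1  # border to another class or the outside
    internal //= 2  # every shared border was counted twice
    return internal / (internal + external)


def rectangle(rows, cols):
    """All grid positions of a rows x cols block of cells."""
    return [(r, c) for r in range(rows) for c in range(cols)]


print(object_size(rectangle(3, 3)))  # 0.5
print(object_size(rectangle(2, 6)))  # 0.5
print(object_size(rectangle(4, 4)))  # 0.6
print(object_size(rectangle(1, 1)))  # 0.0
```

Larger and more compact objects move the ratio towards 1.0 without ever reaching it, because the outer border never disappears.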

Object size can be entered in the panel Classes. This applies the entry to all defined classes. For more specific definitions, the object size can also be entered during the reference definition. ILMSimage searches for a field Size in the reference shape attribute table. If such a field is found, it overrides the entry in the panel Classes for this specific class. The field Size can be added to an existing reference shape at any time.
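The precedence between the panel entry and the field Size can be written down as a small lookup. This is only a sketch of the rule stated above; the function name and the dictionary holding the Size values per class are assumptions for illustration.

```python
def minimum_size_for_class(class_id, panel_size, size_by_class):
    """Return the minimum object size that applies to one class.

    panel_size: value entered in the panel Classes (applies to all classes)
    size_by_class: Size values found in the reference attribute table,
                   keyed by Class number; classes without a value are absent
    """
    size = size_by_class.get(class_id)
    # A Size value from the reference layer overrides the panel entry.
    return panel_size if size is None else size


# Hypothetical setup: panel entry 0.3, class 2 has its own Size of 0.6.
sizes = {2: 0.6}
print(minimum_size_for_class(1, 0.3, sizes))  # 0.3
print(minimum_size_for_class(2, 0.3, sizes))  # 0.6
```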

Allow Exceptions in References

Complexly structured reference areas commonly contain a few individual cells which do not fit the remaining class definition. If a number above zero is entered, ILMSimage deletes the least applicable cells from the reference definition until the proportion of deleted cells reaches the value entered. The edit box accepts input values from 0 to 0.99.
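The effect of this parameter can be sketched as follows. The example assumes that each reference cell already has some "misfit" score (higher means less typical for the class) and that the proportion is counted in cells; neither the score nor the counting rule is documented here, so both are assumptions, as are the function name and the sample values.

```python
def drop_exceptions(misfit_by_cell, allowed_fraction):
    """Remove the least fitting reference cells, up to allowed_fraction
    of all cells. Returns (kept cells, removed cells) as sets."""
    if not 0 <= allowed_fraction <= 0.99:
        raise ValueError("allowed_fraction must be between 0 and 0.99")
    # Number of cells that may be removed without exceeding the entry.
    max_removed = int(allowed_fraction * len(misfit_by_cell))
    # Rank cells from worst fit to best fit.
    ranked = sorted(misfit_by_cell, key=misfit_by_cell.get, reverse=True)
    removed = set(ranked[:max_removed])
    kept = set(misfit_by_cell) - removed
    return kept, removed


# Hypothetical reference area with ten cells; c4 and c9 fit poorly.
misfit = {"c1": 0.05, "c2": 0.10, "c3": 0.08, "c4": 0.90, "c5": 0.12,
          "c6": 0.07, "c7": 0.11, "c8": 0.06, "c9": 0.75, "c10": 0.09}
kept, removed = drop_exceptions(misfit, 0.2)
print(sorted(removed))  # ['c4', 'c9']
```

With an entry of 0 no cell is removed, which corresponds to using the reference area exactly as it was digitized.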

Show Classification Image

ILMSimage can generate a raster image with the results of the actual classification. QuantumGIS shows the layer on the canvas. A result shape layer can be accessed from Export Classification as Shape Layer in the Export panel. A raster layer shows up much more quickly than a vector layer, so during the evaluation of reference sites and classification parameters a raster layer can be preferable.