Set Evaluation criteria


Latest revision as of 15:53, 9 April 2014

Evaluation criteria are needed to judge the quality of a simulation. They quantify the similarity of two time series. On the one hand, this makes them an objective measure of how far a simulated time series deviates from the observation; on the other hand, error functions allow several simulated time series to be compared and ranked against one observation. One goal of automated calibration is to adjust the model parameters so that the similarity between the simulated and the observed time series is maximized, which makes error functions indispensable for this process. Numerous measures exist for evaluating model performance, each with its own advantages and disadvantages. Frequently used measures are:



Percent volume error (PBIAS)

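Measures the relative error of the simulated time series with respect to the observed one. This measure shows whether the simulation systematically underestimates or overestimates the observation. Here and in the formulas that follow, o_{t_i} denotes the observed value and y_{t_i} the simulated value at time step t_i, and r is the number of time steps.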

   {V_{rel}(\theta)} = { {\sum_{i=1}^r (o_{t_i} - y_{t_i})} \over {\sum_{i=1}^r o_{t_i}}}
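As a rough illustration of the formula above, the following Python sketch computes V_rel for two short, made-up series. The function name `relative_volume_error` and the sample values are assumptions for this example and are not part of JAMS. With the sign convention of the formula (observed minus simulated), a positive result indicates underestimation; multiplying by 100 gives the value in percent.

```python
def relative_volume_error(observed, simulated):
    """V_rel = sum(o_i - y_i) / sum(o_i) (hypothetical helper, not a JAMS API)."""
    if len(observed) != len(simulated):
        raise ValueError("series must have the same length")
    total_observed = sum(observed)
    if total_observed == 0:
        raise ValueError("sum of observations is zero; V_rel is undefined")
    return sum(o - y for o, y in zip(observed, simulated)) / total_observed


# Made-up data: the simulation is 1 unit too low at every time step.
observed = [2.0, 4.0, 6.0, 8.0]
simulated = [1.0, 3.0, 5.0, 7.0]

v_rel = relative_volume_error(observed, simulated)
print(f"V_rel = {v_rel:.3f} ({100 * v_rel:.1f} %)")
```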

Nash-Sutcliffe efficiency (E2)

The Nash-Sutcliffe efficiency E2 (Nash and Sutcliffe, 1970)

 E_2 (\theta) = 1 - { \sum_{i=1}^r (o_{t_i} - y_{t_i})^2 \over \sum_{i=1}^r ( o_{t_i} - \overline o )^2 } \quad \mathrm{with} \quad \overline o = \frac {1}{r} \sum_{i=1}^r o_{t_i}

is defined as one minus the sum of squared deviations, normalized by the variance of the observations in the period under consideration. Its range extends from minus infinity to 1. It reaches its maximum exactly when both time series coincide. A negative value means that the model estimates the observation worse than the mean of the observations does. Although E2 is used very frequently, it has some drawbacks. Because the deviations enter the calculation squared, high runoff values are weighted more strongly than low ones, so the measure is suited mainly to assessing flood periods and runoff peaks (Legates and McCabe Jr, 1999). Schaefli and Gupta (2007) point out that E2 depends on the variability of the time series and is therefore catchment-specific, so comparing Nash-Sutcliffe efficiencies across catchments is possible only to a limited extent.
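The behaviour of E2 at its characteristic values can be checked with a small Python sketch. The function name `nash_sutcliffe_e2` and the data are assumptions made for illustration; this is not the JAMS implementation.

```python
def nash_sutcliffe_e2(observed, simulated):
    """E2 = 1 - sum((o_i - y_i)^2) / sum((o_i - mean(o))^2) (hypothetical helper)."""
    if len(observed) != len(simulated) or not observed:
        raise ValueError("series must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - y) ** 2 for o, y in zip(observed, simulated))
    obs_spread = sum((o - mean_obs) ** 2 for o in observed)
    if obs_spread == 0:
        raise ValueError("observations are constant; E2 is undefined")
    return 1.0 - squared_error / obs_spread


observed = [1.0, 2.0, 3.0, 4.0, 5.0]  # made-up values, mean = 3

# Identical series: E2 reaches its maximum of 1.
perfect = nash_sutcliffe_e2(observed, observed)
# Always predicting the observed mean: E2 is 0.
mean_only = nash_sutcliffe_e2(observed, [3.0] * 5)
# Reversed series: worse than the mean, so E2 is negative.
poor = nash_sutcliffe_e2(observed, [5.0, 4.0, 3.0, 2.0, 1.0])

print(perfect, mean_only, poor)
```

A simulation that is no better than the observed mean scores 0, which is why negative values are read as "worse than the mean".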

The modified Nash-Sutcliffe efficiency E1

 E_1 (\theta) = 1 - { \sum_{i=1}^r \left | o_{t_i} - y_{t_i} \right | \over \sum_{i=1}^r \left | o_{t_i} - \overline o \right | }


E1 has properties similar to those of the ordinary Nash-Sutcliffe efficiency. The difference is that the deviations are quantified with the absolute value function instead of being squared, which prevents runoff peaks from being weighted disproportionately. This error function therefore gives a more balanced evaluation of the simulation (Krause et al., 2005).
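The different weighting can be made visible with a small Python sketch. The helper `efficiency` and its `power` argument are assumptions introduced here so that one function covers both E1 (power 1) and E2 (power 2); the data are made up. Both simulations have the same total absolute error, once concentrated at the peak and once spread over four time steps.

```python
def efficiency(observed, simulated, power):
    """1 - sum(|o_i - y_i|^power) / sum(|o_i - mean(o)|^power).

    power=1 gives E1, power=2 gives E2 (hypothetical helper for illustration).
    """
    mean_obs = sum(observed) / len(observed)
    num = sum(abs(o - y) ** power for o, y in zip(observed, simulated))
    den = sum(abs(o - mean_obs) ** power for o in observed)
    return 1.0 - num / den


observed = [1.0, 2.0, 3.0, 4.0, 5.0]
peak_error = [1.0, 2.0, 3.0, 4.0, 9.0]    # one error of 4 at the peak
spread_error = [2.0, 3.0, 4.0, 5.0, 5.0]  # four errors of 1 elsewhere

e1_peak = efficiency(observed, peak_error, 1)
e1_spread = efficiency(observed, spread_error, 1)
e2_peak = efficiency(observed, peak_error, 2)
e2_spread = efficiency(observed, spread_error, 2)

print(f"E1: peak={e1_peak:.3f} spread={e1_spread:.3f}")
print(f"E2: peak={e2_peak:.3f} spread={e2_spread:.3f}")
```

In this example E1 rates both simulations equally, while E2 penalizes the single large deviation far more heavily.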

Logarithmic Nash-Sutcliffe efficiency (logE2)

 logE_2 (\theta) = 1 - { \sum_{i=1}^r \left | \ln o_{t_i} - \ln y_{t_i} \right |^2 \over \sum_{i=1}^r \left | \ln o_{t_i} - \overline {\ln o} \right |^2 } \quad \mathrm{with} \quad \overline {\ln o} = \frac {1}{r} \sum_{i=1}^r \ln o_{t_i}

logE2 applies a logarithmic transformation to the values, so that high values (e.g. runoff peaks) are flattened more strongly than low values. This increases the weight of low flows in the evaluation (Krause and Flügel, 2005).
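A minimal Python sketch, assuming strictly positive values, shows how the transformation shifts the weight towards low values. The function names `nse` and `log_nse` and the data are assumptions for this example. Because the logarithm is undefined for zero or negative values, the sketch rejects such input instead of guessing how it should be handled.

```python
import math


def nse(observed, simulated):
    """Ordinary Nash-Sutcliffe efficiency E2 (hypothetical helper)."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - y) ** 2 for o, y in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


def log_nse(observed, simulated):
    """logE2: E2 computed on the natural logarithms of both series."""
    if min(observed) <= 0 or min(simulated) <= 0:
        raise ValueError("logE2 requires strictly positive values")
    return nse([math.log(o) for o in observed],
               [math.log(y) for y in simulated])


# Made-up data: only the lowest value is simulated incorrectly.
observed = [1.0, 10.0, 100.0]
simulated = [2.0, 10.0, 100.0]

e2 = nse(observed, simulated)
log_e2 = log_nse(observed, simulated)
print(f"E2 = {e2:.5f}, logE2 = {log_e2:.5f}")
```

The error at the low value barely changes E2 but lowers logE2 noticeably, which matches the increased weight of low flows described above.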

Coefficient of determination (R2)

The coefficient of determination R2 measures the strength of the linear relationship between the observed and the simulated time series. Its range is 0 to 1. A value of 1 indicates a perfect linear relationship between the observed and the simulated time series; in the opposite case (R2 = 0) no linear relationship can be found at all. The coefficient of determination therefore does not assess whether two time series agree quantitatively, only whether their dynamics are similar. For this reason, hydrographs should not be evaluated with this measure alone (Krause et al., 2005). The coefficient of determination is defined as the square of the empirical correlation coefficient of the two time series:


 R^2 = { \left ( \displaystyle \sum_{i=1}^r (o_{t_i} - \overline {o}) (y_{t_i} - \overline {y}) \right )^2 \over \displaystyle \sum_{i=1}^r (o_{t_i} - \overline {o})^2 \cdot \displaystyle \sum_{i=1}^r (y_{t_i} - \overline {y})^2 } \quad \mathrm{with} \quad \overline {y} = \frac {1}{r} \displaystyle \sum_{i=1}^r y_{t_i}
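The following Python sketch illustrates why R2 should not be used on its own. The function name `r_squared` and the data are assumptions for this example. The simulated series is an exact linear function of the observations, so its dynamics match perfectly, yet every value is far too high.

```python
def r_squared(observed, simulated):
    """Square of the empirical correlation coefficient (hypothetical helper)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_y = sum(simulated) / n
    cov = sum((o - mean_o) * (y - mean_y) for o, y in zip(observed, simulated))
    spread_o = sum((o - mean_o) ** 2 for o in observed)
    spread_y = sum((y - mean_y) ** 2 for y in simulated)
    if spread_o == 0 or spread_y == 0:
        raise ValueError("R2 is undefined for a constant series")
    return cov ** 2 / (spread_o * spread_y)


observed = [1.0, 2.0, 3.0, 4.0]
# Strongly biased simulation that is still perfectly linear in the observations.
simulated = [2.0 * o + 5.0 for o in observed]

r2 = r_squared(observed, simulated)
# Relative volume error, sum(o - y) / sum(o), for comparison.
volume_error = sum(o - y for o, y in zip(observed, simulated)) / sum(observed)

print(f"R2 = {r2:.3f}, relative volume error = {volume_error:.1f}")
```

R2 reports a perfect score here although the simulated volume is several times the observed one, so a volume-based or Nash-Sutcliffe-type measure is needed alongside it.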

Set evaluation criteria with the aid of JAMS Model Builder

You can find this option on the Configure Efficiencies tab. You should then see the following window:

[Figure: Configure efficiencies]


Now you can define criteria according to the requirements of your calibration. As an example, a minimal configuration is shown here:
1. step: Click the "+" button to create a new criterion.
2. step: Specify the attributes that you want to compare. Both attributes must belong to the same context component. First choose the required context from the drop-down list, then the measured/observed attribute and the simulated attribute.
3. step: Choose an attribute that represents the current time step of the simulation. In most cases this is "TimeLoop.time". You should now see a window similar to this one:

[Figure: Configure efficiencies]

4. step: Click OK to add the evaluation criteria to your model.
