In inverting canopy reflectance models against reflectance data, we
are usually interested in constrained optimisation - a special case of
the general optimisation problem where some variables may be fixed, or
lie between fixed limits. Figure 1 shows some of the difficulties this
may pose in the 1-D case (from Numerical
Recipes in C, Press *et al.*, p 395).

Figure 1 - points A, C and E are local, but not global, maxima. Points B and F are local, but not global, minima. The global maximum occurs at G, which is on the boundary of the interval, so that the derivative of the function need not vanish there. At E, derivatives higher than the first vanish, which can cause difficulties for some algorithms. The points X, Y and Z are said to bracket the point F, since f(Y) is less than both f(X) and f(Z), i.e. Y is a "first guess" at a minimum in this region. It is important to note that in the general case of N-dimensional optimisation (particularly if not using derivatives), the problem is reduced to a series of N (iteratively improving) 1-D minimisations. As a result, nearly all the algorithms we will see can be reduced to the problem of finding the extrema of a 1-D function.

How do we choose which methods to use? We need to decide between methods
that need only evaluate the function, and those that also calculate its
(partial) derivatives. Derivative methods can be more powerful in locating
minima (assuming the functions are well-behaved and differentiable): the
conjugate gradient algorithms of Fletcher-Reeves and Polak-Ribiere, and the
well-known quasi-Newton algorithms of Davidon-Fletcher-Powell (DFP) and
Broyden-Fletcher-Goldfarb-Shanno (BFGS), are widely used in practice due to
their robustness. However, obtaining the derivatives typically requires more
function evaluations than is desirable for rapid inversion, as well as
*a priori* knowledge of the behaviour of the function. Because of this, we
will concentrate on methods not requiring the calculation of derivatives.
Another computational issue is that of storage: we need to decide between
methods requiring storage of order N (where N is the number of dimensions of
the minimisation problem), and those requiring order N^{2}.

Figure 2 (see Press *et al.*, p 398) - successive bracketing of
a minimum. The minimum is originally bracketed by points 1, 3 and 2. The
function is evaluated at 4, which replaces 2; then at 5, which replaces
1; then at 6, which replaces 4. The rule at each stage is to keep a centre
point that is lower than the two outside points. After the steps shown,
the minimum is bracketed by points 5, 3 and 6.

A minimum is bracketed when there is a triplet of points a < b < c (or c < b < a) such that f(b) < f(a) and f(b) < f(c). In Figure 2, points 1, 3 and 2 equate to points a, b and c, i.e. f(3) < f(1) and f(3) < f(2). We can successively improve this bracketing by performing the following steps:

- choose a new trial point x, either between a and b or between b and c;
- evaluate f(x): if f(x) < f(b), the new point becomes the middle point of a new, narrower triplet (e.g. (b, x, c) for x between b and c);
- if f(x) > f(b), the new point instead becomes an outer point of the new triplet (e.g. (a, b, x) for x between b and c).

In each case, the middle point of the new triplet is the best minimum
achieved so far. This process is repeated until the distance between the
two outer points becomes sufficiently small (i.e. approaches some set limit),
where sufficiently small means approximately the square root of the machine
floating point precision. We cannot usefully do better than this: near the
minimum, f(x) is approximately parabolic in (x - b), so any change in x much
smaller than this produces a change in f(x) below the machine's precision.

An important question you should be asking is "How do we choose the new point x each time?" ......

Figure 3 - placing the new point x within the bracketing triplet (a, b, c).

If we say that b is some fraction w of the way between a and c, then w = (b-a)/(c-a) and, consequently, 1-w = (c-b)/(c-a). If the new point, x, is a fraction z beyond b, then z = (x-b)/(c-a). The next bracketing segment will therefore either be of length w+z relative to the current one, or of length 1-w. To minimise the worst case, we can choose z to make these equal, i.e. z = 1 - 2w. This makes x the point symmetric to b in the interval (a,c), i.e. |b - a| = |c - x|, and implies that x lies in the larger of the two segments (z > 0 only if w < 1/2).

How does the value of w get chosen? Presumably it must come from the
previous stage of applying the same strategy: if z is chosen to be optimal,
then so was w before it. This scale similarity implies that z should be the
same fraction of the way from b to c (if that is the bigger segment) as b
was from a to c, i.e. w = z/(1-w). Combining this with z = 1 - 2w gives
the quadratic equation w^{2} - 3w + 1 = 0, yielding w ~ 0.38197
and 1-w ~ 0.61803. Therefore the optimal bracketing interval (a,b,c) has
its middle point b a fractional distance 0.38197 from one end, and 0.61803
from the other. These are the so-called *golden mean* fractions.

In each step of our *golden section search* therefore, we select
a point a fraction 0.38197 into the larger of the two intervals (from the
central point of the triplet). An additional feature of this method is
that if our initial triplet has segments not in the golden ratios, the
procedure outlined above will quickly converge to the proper self-replicating
ratios.
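
As a concrete illustration, here is a minimal sketch of the golden section search in Python. The bracket update follows the rules described above, and the tolerance is of the order of the square root of machine precision, as discussed earlier; the function and variable names are illustrative:

```python
import math

def golden_section(f, a, b, c, tol=1.5e-8):
    """Minimise f given a bracketing triplet a < b < c with
    f(b) < f(a) and f(b) < f(c)."""
    W = (3.0 - math.sqrt(5.0)) / 2.0   # the golden fraction, ~0.38197
    fb = f(b)
    while (c - a) > tol * (abs(b) + 1e-12):
        # place the new point a fraction W into the larger segment,
        # measured from the central point b
        if (c - b) > (b - a):
            x = b + W * (c - b)
        else:
            x = b - W * (b - a)
        fx = f(x)
        if fx < fb:
            # x becomes the new middle point of the triplet
            if x > b:
                a = b
            else:
                c = b
            b, fb = x, fx
        else:
            # x becomes a new outer point of the triplet
            if x > b:
                c = x
            else:
                a = x
    return b, fb

# e.g. golden_section(lambda x: (x - 2.0)**2, 0.0, 1.0, 5.0) -> (~2.0, ~0.0)
```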

Figure 4 - convergence by inverse parabolic interpolation (see Press
*et al.*, p 403)

This method is known as *inverse parabolic interpolation*: we
want to find a point x at which our function y = f(x) is a minimum (not just
a value of y). The formula for the abscissa x that is the minimum of a
parabola through the three points (a, f(a)), (b, f(b)) and (c, f(c)) is

x = b - (1/2) * [ (b-a)^{2} (f(b)-f(c)) - (b-c)^{2} (f(b)-f(a)) ] / [ (b-a) (f(b)-f(c)) - (b-c) (f(b)-f(a)) ]

This equation fails only if the three points are collinear, in which case the denominator is zero. However, it is just as likely to jump to a maximum as to a minimum, and somehow this tendency has to be overcome.
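
The formula translates directly into a few lines of code. A minimal sketch (the function name is illustrative):

```python
def parabola_min_abscissa(a, b, c, fa, fb, fc):
    """Abscissa of the extremum of the parabola through (a, fa), (b, fb)
    and (c, fc). The denominator is zero only when the three points are
    collinear; the result may be a maximum as readily as a minimum."""
    num = (b - a)**2 * (fb - fc) - (b - c)**2 * (fb - fa)
    den = (b - a) * (fb - fc) - (b - c) * (fb - fa)
    return b - 0.5 * num / den
```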

A technique likely to be both stable *and* efficient is one which
uses a slow-but-sure method such as the golden section search in unfavourable
regions, but switches to parabolic interpolation when it comes sufficiently
close to a minimum. In this way, we can be sure we are approaching a minimum
rather than a maximum before switching to the more efficient
parabolic method. The main difficulties of this approach are in finding
a quick, robust scheme for deciding which of the two techniques is likely
to be favourable, in addition to being careful not to get into problems
when we are very near to a solution and the function is being evaluated
close to the machine precision mentioned earlier.

*Brent's method* is such a scheme. At each stage, it keeps track
of six function points, a, b, u, v, w, and x. The minimum is bracketed
between a and b; x is the point with the least function value found so
far (or the most recent in the case of a tie); w is the point with the
second least function value; v is the previous value of w; u is the point
at which the function was evaluated most recently.

Parabolic interpolation is tried through the points x, v, and w. To
be accepted as a successful move, the parabolic step must fall between
a and b *and* imply a movement from the best current value, x, that
is *less* than half the movement of the *step before last*. This
second criterion ensures that the parabolic steps are actually converging
to something. In the worst possible case, where the parabolic steps are
acceptable but useless, the method will approximately alternate between
parabolic steps and golden sections, converging in due course by virtue
of the latter. Press *et al.* explain the reason for choosing the step *before*
last (as opposed to the previous step) as purely heuristic, i.e. through
experience, it seems better not to 'punish' the algorithm for taking one
bad step if it can make it up on the next one.
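
In practice, Brent's method is rarely coded from scratch. Assuming SciPy is available, its `scipy.optimize.brent` routine implements such a scheme and accepts a bracketing triplet of the kind described above:

```python
from scipy.optimize import brent

# minimise a 1-D function, given a bracketing triplet (a, b, c)
f = lambda x: (x - 2.0)**2 + 1.0
xmin = brent(f, brack=(0.0, 1.0, 5.0))   # -> approximately 2.0
```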

*Downhill Simplex Method (aka amoeba or Nelder-Mead)*

This is the only multidimensional minimisation method we will come across
here which does not make use of a 1-D minimisation algorithm like the ones
described above. The *downhill simplex method* (due to Nelder and
Mead), is a method requiring only function evaluations, not derivatives.
It is a relatively simple method, and although not particularly efficient
(in comparison to Powell, which we will come across later), it can often
be the most robust.

A simplex is the geometrical figure consisting, in N dimensions, of N+1 vertices and all their interconnecting line segments, polygonal faces etc. In 2-D this is a triangle, in 3-D a tetrahedron. If any point of the simplex is taken as the origin, then the other N points define vector directions that span the N-dimensional vector space. There is no analogous bracketing method (as we saw in 1-D) in multidimensions. All we can do is give the algorithm a starting guess i.e. an N-vector of independent variables to try as the first point. The algorithm is then supposed to make its own way "downhill" (in function space) until it finds a minimum. Figure 5 illustrates this idea.

Figure 5 - possible outcomes for a step in the downhill simplex method
(Press *et al.*, p 409).

For the starting point, in N dimensions we must define N+1
points, which define the initial simplex. If we define one of these points
as **P**_{0}, then the other N points are simply

**P**_{i} = **P**_{0} + k**e**_{i}

where the **e**_{i}'s are N unit vectors, and k is
a constant which is our guess of the problem's characteristic length scale.
We proceed by taking a series of steps away from the starting point, and
seeing if these steps take us to a more favourable position (closer to
a minimum). The steps available are (a) a reflection away from the high
point (b) a reflection and expansion away from the high point (c) a contraction
along one dimension from the high point or (d) a contraction along all
dimensions towards the low point. When it can, the algorithm expands the
simplex in one or other of the directions, to take larger steps. When it
reaches a "valley floor" (think of a long, steep-sided valley sloping away
from us), it contracts itself in the transverse (cross-valley) direction
in order to ooze down the valley. If these steps are not available, the
simplex can contract around its lowest (best) point.
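
A minimal sketch of this in Python, assuming SciPy is available. The initial simplex is built as **P**_{i} = **P**_{0} + k**e**_{i}, and `scipy.optimize.minimize` with `method='Nelder-Mead'` performs the downhill steps; the `misfit` objective is a hypothetical stand-in for a real CR model misfit function:

```python
import numpy as np
from scipy.optimize import minimize

def misfit(p):
    # stand-in for e.g. the RMSE between modelled and observed reflectance
    return float(np.sum((p - np.array([0.5, 3.0]))**2))

P0 = np.array([0.1, 1.0])                  # starting guess
k = 0.5                                    # characteristic length scale
simplex = np.vstack([P0] + [P0 + k * e for e in np.eye(len(P0))])

res = minimize(misfit, P0, method='Nelder-Mead',
               options={'initial_simplex': simplex,
                        'xatol': 1e-6, 'fatol': 1e-6})
# res.x -> approximately [0.5, 3.0]
```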

How do we know when we have reached a minimum? This can be a difficult question in any multidimensional minimisation algorithm. The sensible thing to do is to check whether one of two criteria is satisfied over a single "step" of the algorithm: (i) stop when the vector distance moved in one step is less than some tolerance, or (ii) stop when the decrease in the function value in one step is fractionally smaller than some defined tolerance.

A final warning: both of these termination criteria can be fooled by a single, anomalous step that doesn't go anywhere. To avoid this, it is a good idea to restart any multidimensional minimisation algorithm from the point where it claims to have found a minimum. If it "really" has, this will not be a costly move, and if it hasn't, well ......

*Direction Set Methods: Powell's Method*

Let us look at the simplest *direction set* method. If we take
the unit vectors **e**_{1}, **e**_{2},....**e**_{N}
as a *set of directions*, we can cycle through the directions,
using our 1-D line minimisation methods to minimise along each direction
in turn, before going on to the next. This works well in some cases,
but figure 6 shows where this method struggles.

Figure 6 (see Press *et al.* p. 414)

In the case of a long, narrow valley not aligned with our chosen axes
(X and Y in this case), this method can only move along the base axes, and so
must take many small steps to crawl towards the minimum. In this case,
a more sensible choice of directions is needed. All direction set methods
consist of recipes for updating the set of directions as the method progresses,
attempting to come up with a set which either (i) includes some very good
directions that will take us fast along narrow valleys (e.g. at 45^{o}
to X and Y in figure 6), or else (ii) includes some number of "non-interfering"
directions with the special property that minimisation along one is not
"spoiled" by subsequent minimisation along another, so that we do not have
to continually cycle through all the directions.

Powell's method does just this. These "non-interfering" directions are
more formally defined as conjugate directions, and Powell's method produces
N mutually conjugate directions along which to minimise. First, we initialise
a set of directions **u**_{i} = **e**_{i}, where i = 1,....,N.
Next we repeat the following steps:

- save the starting position as **P**_{0};
- for i = 1,....,N, move **P**_{i-1} to the minimum along direction **u**_{i} and call this point **P**_{i};
- for i = 1,....,N-1, set **u**_{i} <- **u**_{i+1};
- set **u**_{N} <- **P**_{N} - **P**_{0};
- move **P**_{N} to the minimum along **u**_{N} and call this point the new **P**_{0}.

This method is illustrated in figure 7.

Figure 7 - Powell's conjugate direction set method. The first steps
move to a new point, P, and line minimisation continues from P along the
new set of directions x' and y, where x' is the resultant direction
after the first round of minimisations. In this case, it is clear that ideally
we should be minimising along the axes x_{ideal} and y_{ideal}, but
even after one step, our new set of directions is far closer to these than
the original set, which is promising.

It can be demonstrated that N iterations of the basic procedure, i.e.
N(N+1) line minimisations, will exactly minimise a quadratic form. There
is, however, one major drawback to Powell's method as stated: the procedure of
throwing away, at each stage, **u**_{1} in favour of **P**_{N}
- **P**_{o} tends to produce sets of directions that "fold up
on each other", i.e. they all start to point in the same way! If this happens,
then the algorithm converges only to a minimum of the function over some
subspace of the full N-dimensional space, giving the wrong answer. To avoid
this, we can try a number of things (following Press *et al.*):

- reinitialise the direction set **u**_{i} to the basis vectors **e**_{i} after every N or N+1 iterations of the basic procedure;
- reset the direction set to the columns of an orthogonal matrix, so that the information built up so far is rescaled rather than thrown away;
- give up the property of exact quadratic convergence, and instead discard the direction along which the function made its *largest* decrease, since that direction is the major component of the new direction being added, and discarding it helps avoid a build-up of linear dependence.

In practice, any of these adjustments to the basic Powell method
may be used, with each having slight differences in performance under different
situations.
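
Again as a practical sketch rather than a from-scratch implementation: assuming SciPy is available, `scipy.optimize.minimize` with `method='Powell'` provides a direction set method of this kind (the objective below is a hypothetical stand-in for a CR model misfit; the `direc` option can be used to supply an initial direction set):

```python
import numpy as np
from scipy.optimize import minimize

def objective(p):
    # stand-in for the misfit between modelled and measured reflectance
    return (p[0] - 1.0)**2 + 10.0 * (p[1] - 2.0)**2

res = minimize(objective, x0=np.zeros(2), method='Powell',
               options={'xtol': 1e-6, 'ftol': 1e-6})
# res.x -> approximately [1.0, 2.0]
```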

*Simulated Annealing*

Figure 8 - schematic representation of a function with several local minima, A, B, C, and D, and a global minimum at E.

Simulated annealing is so named because it has a direct thermodynamic analogy with the annealing, or slow cooling, of metals, or the crystallisation of liquids. At high temperatures, the molecules of a liquid move freely with respect to each other. If the liquid is cooled slowly, thermal mobility is lost, and the atoms are often able to line themselves up and form a pure crystal that is completely ordered over a distance of up to billions of times the size of an individual atom in all directions. This crystalline state is the state of minimum energy of the system. For slowly cooled systems, Nature is able to find this state every time. If a liquid metal is quenched, or cooled quickly, it ends up in some polycrystalline or amorphous state of much higher energy (a situation partly analogous to the other minimisation processes we have considered). How is Nature able to solve such an incredibly complex optimisation problem, seemingly with no effort, when we struggle to solve problems requiring the minimisation of only a few parameters?

The basic process is one of slow cooling, allowing time for the atoms to redistribute themselves as they lose mobility (energy). The annealing process is based on the very important Boltzmann probability distribution

Prob(E) ~ exp(-E/kT)

which expresses the idea that a system in thermal equilibrium at temperature T has its energy probabilistically distributed among all energy states E (k is Boltzmann's constant). Even at low temperature, there is a small chance of the system being in a high-energy state, and hence a chance of it getting out of a local energy minimum in favour of a better, more global one.

The algorithm for simulated annealing is implemented in the following
way. Offered a succession of options, a simulated thermodynamic system
is assumed to change its configuration from energy E_{1} to E_{2}
with probability **P** = exp[-(E_{2} - E_{1})/kT]. If
E_{2} < E_{1}, this expression gives **P** > 1; in this case, **P**
is arbitrarily assigned the value 1, i.e. the system *always* takes
such an option. However, if E_{2} > E_{1}, i.e. the option
is to go to a higher energy state, P is positive but less than 1, i.e.
there is a small but finite probability that the system may take this option.
Hence our system will *always* take a downhill step that is offered
to it, but will *sometimes* take an uphill one if offered that.
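
The acceptance rule is only a few lines of code. A minimal sketch, with Boltzmann's constant k absorbed into the temperature T:

```python
import math
import random

def accept(E1, E2, T):
    """Metropolis rule: a downhill option (E2 <= E1) is always taken;
    an uphill option is taken with probability exp(-(E2 - E1)/T)."""
    if E2 <= E1:
        return True
    return random.random() < math.exp(-(E2 - E1) / T)
```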

In practice, we need to provide the following four elements if we wish to use this algorithm:

- a description of possible system configurations;
- a generator of random changes in the configuration (the "options" offered to the system);
- an objective function E (the analogue of energy) whose minimisation is the goal;
- a control parameter T (the analogue of temperature), and an annealing schedule by which it is gradually reduced.

Figure 9 illustrates this procedure.

Figure 9 - a 1-D function with 6 separate energy maxima and minima. In the annealing process, going from E1 to E2, E3 to E4 and E5 to E6 are all downhill steps, and so have a probability of 1. Going from E2 to E3, and from E4 to E5, are uphill steps, and consequently have a small but finite probability of occurring. This small probability is what allows the system to escape from the local minima at E2 and E4 and reach the global minimum at E6. The other minimisation algorithms we have looked at do not possess this ability.

The simulated annealing idea is applicable to the optimisation of continuous
N-dimensional functions, f(**x**), where **x** is an N-dimensional
vector. The four elements required by the above procedure are now as follows:
the value of f is the objective function; the system state is the point
**x**; the control parameter T is, as before, something like a temperature,
with an annealing schedule by which it is gradually reduced; and there must
also be a generator of random changes in the configuration, i.e. a procedure
for taking a random step from **x** to **x** + d**x**, which are
the new "options" available to the system. In practice, this last element may
be the most difficult to implement, and a number of solutions have
been proposed for it (see Press *et al.*, p 451).

All applications of simulated annealing suffer from the difficulty of defining the rate of cooling: how slow is "sufficiently slow"? The success of the approach may be controlled by this factor. There are a number of possibilities worth trying (see Press *et al.*):

- reduce T to (1 - ε)T after every m options, for some suitably small ε;
- budget a total of K options, and after the k-th batch of moves set T = T_{0}(1 - k/K)^{α}, where α is a constant of order 1 to 4;
- reduce T adaptively, e.g. keeping it comparable to the spread of function values currently being sampled, so that cooling slows down where the function landscape is difficult.
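
Putting the pieces together, here is a minimal 1-D annealing loop using the `accept` rule sketched above and the first (geometric) schedule in the list; all parameter values are illustrative:

```python
import random

def anneal(f, x0, step, T0=1.0, eps=0.05, m=100, n_stages=60):
    """Offer m random options at each temperature, then reduce
    T to (1 - eps) * T."""
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    T = T0
    for _ in range(n_stages):
        for _ in range(m):
            x_new = x + random.uniform(-step, step)   # a random "option"
            f_new = f(x_new)
            if accept(fx, f_new, T):                  # Metropolis rule
                x, fx = x_new, f_new
                if fx < best_f:
                    best_x, best_f = x, fx
        T *= (1.0 - eps)
    return best_x, best_f

# e.g. a function with many local minima:
# import math
# anneal(lambda x: x**2 + 10.0 * math.sin(5.0 * x), x0=3.0, step=1.0)
```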

Simulated annealing has been applied in practice to the classic minimisation problem known as the "Travelling Salesman" problem (see Press *et al.*, pp 445-450). The aim is to find the shortest possible route for a travelling salesman visiting N arbitrarily located cities in turn before returning to the starting point (visiting each city only once). This kind of problem is known as an NP-complete problem, and the computation time for an exact solution rises as e^{kN}, where k is some constant. As N increases, the time taken to try all solutions rapidly becomes prohibitive. Simulated annealing finds a very good (though not provably optimal) solution, with a number of calculations scaling only as a small power of N. This has made simulated annealing particularly useful in specialised problems such as locating components during chip design in order to minimise heat generation and interference.

There is a variety of other intriguing methods of multidimensional function optimisation, which have arisen from (i) the need for sophisticated methods to solve the huge range of optimisation problems in mathematics, physics, engineering and modelling applications, many of which do not surrender easily to the "standard" methods mentioned above; (ii) increases in computing power that have pushed the development of more sophisticated algorithms; and (iii) the observation that Nature has created very elegant ways of solving seemingly impossible optimisation tasks.

**(Artificial) Neural Networks (ANN)**

These methods are based on the observation that biological neural networks (i.e. brains) are exceptionally good at solving incredibly complex non-linear optimisation problems. They do this by "training": system responses are tuned to behave according to previously experienced situations, and these responses are then adapted to new situations.

There is no universally accepted definition of an ANN, but a general
definition is that an ANN is a network of many simple processors ("units"),
each possibly having a small amount of local memory. The units are connected
by communication channels ("connections") which usually carry numeric (as
opposed to symbolic) data, encoded by any of various means. The units operate
only on their local data and on the inputs they receive via the connections.
The restriction to local operations is often relaxed during training.

This figure schematically illustrates the structure of a simple feed-forward neural net with only one hidden layer, and one output (the figure comes from http://www.cbs.dtu.dk/services/SignalP/methods_nn.html).

The neurons have several inputs but only one output. The output from a neuron is calculated as a function of a weighted sum of the inputs. A threshold can be set so that the neuron has a non-linear output response: if the weighted sum is less than the threshold the output is zero; if it is greater, the output is 1 (or the weighted sum itself). In an ANN, neurons are connected, with the output of one neuron being the input of another. When the ANN is presented with an input (*e.g.* a set of reflectance data) it will calculate an output that can be interpreted as a classification of the input (*e.g.* the model parameter values that result in this reflectance).

It is possible to "teach" a neural network how to make a classification by repeatedly presenting it with a set of known inputs (the *training set* - e.g. directional reflectance data sets) and simultaneously modifying the weights associated with each input in such a way that the difference between the desired output and the actual output is minimised according to some measure such as RMSE. The ability of the ANN to change its output to reflect the input is important: it is this that gives the network the capability to learn. The more (distinct) training data we can throw at the ANN, the more refined its behaviour should become, i.e. the better it should be able to distinguish between finer variations in the input data. This is achieved via back-propagation training.
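
A minimal sketch of the unit just described - a hard-thresholded weighted sum - together with a delta-rule weight update of the kind applied, layer by layer, in back-propagation training. This is illustrative only, not a full back-propagation implementation:

```python
import numpy as np

def neuron_output(weights, inputs, threshold=0.0):
    # output 1 if the weighted input sum exceeds the threshold, else 0
    return 1.0 if np.dot(weights, inputs) > threshold else 0.0

def delta_rule_update(weights, inputs, desired, actual, lr=0.1):
    # adjust each weight in proportion to the output error, so that
    # repeated presentation of the training set reduces the misfit
    return weights + lr * (desired - actual) * inputs

# e.g. training a single unit towards a logical OR over several epochs:
# w = np.zeros(2)
# for _ in range(10):
#     for x, d in [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]:
#         x = np.array(x, dtype=float)
#         w = delta_rule_update(w, x, d, neuron_output(w, x))
```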

A general neural network training algorithm is as follows:

- initialise the connection weights (e.g. to small random values);
- present an input vector from the training set, and propagate it forward through the network to calculate the output;
- compare the actual output with the desired output and compute the error;
- propagate the error backwards through the network, adjusting the weights so as to reduce the error (back-propagation);
- repeat over the training set until the error measure falls below some tolerance.

ANNs have a number of advantages for solving the types of non-linear optimisation problems we have been considering: they can be used where we cannot formulate an algorithmic solution, where we can obtain many examples of the behaviour we require, or where we need to pick out structure from existing data. A major drawback is that when the nature of the training data changes, e.g. different angular or spectral sampling, the ANN must be retrained. This suggests that ANN inversion of multispectral data might be a possibility for fixed-geometry, multiple-waveband sensors (e.g. TM) but is less useful in the general case (arbitrary viewing and illumination geometry). There are a number of examples of the application of ANNs to the problem of inverting biophysical parameters from optical and microwave data of vegetation canopies, e.g. Chuah, 1993; Pierce et al., 1994; Gopal and Woodcock, 1996; Jin and Liu, 1997; Abuelgasim et al., 1998. In addition, ANNs have proved very promising for the classification of EO data, where supervised methods (requiring training data) are already the norm, e.g. Foody, 1997a and 1997b.

An excellent introduction to ANNs, with many external links, is available at
Stirling University. For more information on neural nets, see the page
mentioned above, or the Neural Net FAQ.

**Genetic (or evolutionary) algorithms (GAs)**

It has long been noted that Nature manages to solve THE optimisation problem, i.e. maximisation of species survival and propagation, through genetic mutation and the subsequent favouring of a few of these mutations in the long term. If we can phrase our optimisation problems in terms of some non-linear "fitness-for-survival" function, genetic mutation and propagation models can be applied to these problems in order to find optimal solutions. Effectively a GA is a method by which some description of the function state (known as a "string" - equivalent to a gene pattern) is changed in some way according to a probabilistic model (typically crossover, mutation or inversion). In the case of a BRDF model, this would be the N-D vector representing the current state of the model parameters. This mutation is then propagated to the subsequent generation in some predefined way, such as "mating" - the recombination of two parent strings to create an offspring string. Finally, the new generation "mutation" is tested to see if it better fulfils our survival criteria (or has a lower RMSE in terms of model fit). In this way we can see that the GA progresses (as in Nature) through mutation and selection via sexual reproduction, characterised by recombining two parent strings into an offspring.

A general Genetic Algorithm is as follows (see http://osiris.sunderland.ac.uk/cbowww/AI/TEXTS/ML2/sect4.htm for an illustrated example):

- create an initial population of random strings (here, candidate model parameter vectors);
- evaluate the fitness of each string (e.g. some inverse measure of the RMSE of the model fit);
- select strings for reproduction with probability related to their fitness;
- apply the genetic operators (crossover, mutation, inversion) to the selected strings to produce the next generation;
- repeat from the evaluation step until the population converges on a satisfactory solution.

GAs differ from normal optimisation methods in four ways:

- They work with a coding of the parameter set, not the parameters themselves
- They search from a population of points, not a single one
- They use 'payoff' information (some objective function) - not derivatives or other auxiliary knowledge.
- GAs use probabilistic transition rules, not deterministic ones (c.f. simulated annealing).

The most basic GA uses just three operators, sketched in code below:

- *reproduction*: strings are copied into the next generation with a probability proportional to their fitness;
- *crossover*: pairs of strings exchange complementary substrings about a randomly chosen crossover point;
- *mutation*: individual elements of a string are altered at random with some small probability.
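
A minimal sketch of such a basic GA on binary strings follows; the fitness function is assumed to return non-negative values (as fitness-proportional selection requires), and all parameter values are illustrative:

```python
import random

def basic_ga(fitness, n_bits, pop_size=50, p_cross=0.7, p_mut=0.01, n_gen=100):
    """Reproduction, single-point crossover and bitwise mutation."""
    pop = [[random.randint(0, 1) for _ in range(n_bits)]
           for _ in range(pop_size)]
    for _ in range(n_gen):
        scores = [fitness(s) for s in pop]
        total = sum(scores) or 1.0      # guard against an all-zero population

        def select():
            # roulette-wheel selection, proportional to fitness
            r, acc = random.uniform(0.0, total), 0.0
            for s, sc in zip(pop, scores):
                acc += sc
                if acc >= r:
                    return s
            return pop[-1]

        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = select(), select()
            if random.random() < p_cross:   # single-point crossover
                cut = random.randrange(1, n_bits)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            for child in (p1, p2):          # bitwise mutation
                new_pop.append([b ^ 1 if random.random() < p_mut else b
                                for b in child])
        pop = new_pop[:pop_size]
    return max(pop, key=fitness)

# e.g. maximise the number of 1-bits in a 20-bit string:
# best = basic_ga(sum, n_bits=20)
```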

Some advantages of GAs are their flexibility and power: they may
be able to solve problems characterised by very many small, ill-defined
minima, the sort of problem that is typically almost impossible to solve
by standard methods. Another major advantage is that they are intuitively
understandable. The difficulty of using GAs in practice is that they combine
two different search strategies - random search by mutation, and a biased
search by recombination of the strings contained in a population - and defining
the balance between the two is by no means straightforward. More relevant to problems
in CR model inversion is the fact that it may take a huge number of generations (iterations) for such a system to converge to a solution.

I have downloaded several useful references to GAs, of which there are copies in the RSU. Also, see Goldberg (1988), MATHS NA20 GOL in the DMS Watson library.

**Knowledge-based systems (KBS)**

KBS comprise a variety of methods seeking to solve complex computational problems using sources of information external to the problem itself. In CR model inversion, work done on KBS attempts to use an understanding of the inversion problem to drive the optimisation process. An example in vegetation modelling is the work of Kimes and Harrison (1990) and Kimes *et al.* (1991), who developed VEG, a system for inferring physical and biological surface properties from reflectance data sets. VEG integrates the traditional spectral (directional) data with diverse knowledge bases derived from the literature, field-measured directional reflectance data sets and information from human experts into an efficient system for inferring biophysical parameters from reflectance data. Significant developmental work was done on KBS in the late 1980s and early 1990s, but the major problem of such systems, namely how the knowledge is defined and input into the system, has never been satisfactorily overcome. This approach has been largely taken over by ANNs, where the decision as to how a particular input maps to a response is concealed in the 'black box' hidden layer(s) of neurons. The drawback of the ANN approach, of course, is that no knowledge of how the problem is being solved within the black box is available - something that is made explicit in the KBS approach. A more promising approach for CR model inversion than either ANNs (limited by training data) or KBS (how do we define the knowledge?) is that of look-up tables (LUT).

**Look up table approaches (LUT)**

The concept behind the LUT approach, as applied to CR model inversion, is to separate the calculation of canopy reflectance from the problem of finding the optimum solution (model parameter set) for a specific model inversion. A set of discrete CR scenarios is pre-computed, e.g. according to some geometrical/spectral criteria (sampling regime), and the corresponding parameter values are stored in the LUT. The optimisation task is then to find the required values in the LUT as quickly as possible, given a reflectance data set. Knowledge of external parameters, e.g. VIs, cover type, viewing/illumination geometry etc., is used to limit the search space in as many ways as possible. The closest matching entry can then be pulled out of the LUT, yielding the inverted model parameter set. The use of external information (knowledge) to limit the search space means that LUT methods are closely related to KBS methods. LUT methods have a simpler requirement for the use of external data, however, and can be applied quite simply in practice, e.g. the MODIS LAI/FPAR algorithm (see the MOD15 LAI/FPAR ATBD). An important advantage of the LUT approach is that all the computational donkey work, i.e. the repeated running of the forward model, is done prior to parameter retrieval and is therefore not a run-time operation. In addition, as developments and improvements are made to the underlying CR model, the LUT can be recomputed and updated accordingly.
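
A minimal sketch of the retrieval step: given a table of pre-computed reflectances and the parameter sets that generated them, find the best RMSE match to an observation, optionally restricting the search to a subset of candidate entries derived from external knowledge. All names here are illustrative:

```python
import numpy as np

def lut_invert(lut_refl, lut_params, obs_refl, candidates=None):
    """lut_refl: (n_entries, n_bands) pre-computed reflectances;
    lut_params: (n_entries, n_params) generating parameter sets;
    candidates: optional index array limiting the search space
    (e.g. from cover type, geometry or VI constraints)."""
    idx = np.arange(lut_refl.shape[0]) if candidates is None else candidates
    rmse = np.sqrt(np.mean((lut_refl[idx] - obs_refl)**2, axis=1))
    return lut_params[idx[np.argmin(rmse)]]

# e.g. with a table of 10000 scenarios over 7 wavebands:
# params = lut_invert(refl_table, param_table, observed, candidates=forest_idx)
```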

A number of minimisation (more generally, optimisation) techniques that can be used to derive biophysical parameter information via the inversion of CR models against reflectance data have been introduced. A brief summary, including some of the pros and cons in each case:

- Golden section search / Brent's method: robust, derivative-free 1-D minimisation, with Brent's parabolic steps adding speed near the minimum; 1-D only, so used as a building block for the multidimensional methods.
- Downhill simplex (Nelder-Mead): requires function evaluations only; simple and often the most robust, but not particularly efficient.
- Powell's direction set method: efficient (exactly minimises a quadratic form in N iterations), but the direction set can "fold up" unless one of the fixes described above is applied.
- Simulated annealing: can escape local minima and approach the global minimum, but the cooling schedule is hard to define and many function evaluations are needed.
- ANNs: fast retrieval once trained, and usable where no algorithmic solution can be formulated; must be retrained whenever the sampling or geometry of the data changes, and the solution is a "black box".
- KBS: makes the use of external knowledge explicit, but defining and inputting that knowledge has never been satisfactorily solved.
- LUT: the repeated forward-model runs are pre-computed rather than performed at retrieval time, and the table is easily updated as the model improves; external knowledge is needed to limit the search space.

Abuelgasim, A. A., Gopal, S. and Strahler, A. H. (1998) Forward and inverse modelling of canopy directional reflectance using a neural network, International Journal of Remote Sensing, 19(3), 453-471.

Brent, R. P. (1973) Algorithms for Minimisation without Derivatives (Englewood Cliffs, NJ: Prentice-Hall).

Bunday, B. D. (1984) Basic Optimisation Methods (London: Edward Arnold).

Chuah, H. T. (1993) An artificial neural network for inversion of vegetation parameters from radar backscatter coefficients, Journal of Electromagnetic Waves and Applications, 7(8), 1075-1092.

Clark, C. and Canas, T. (1993) A genetic algorithm approach to spectral identification, Proc. RSS '93, Chester.

Foody, G. M. (1997a) Fully fuzzy supervised classification of land cover from remotely sensed imagery with an artificial neural network, Neural Computing & Applications, 5(4), 238-247.

Foody, G. M. (1997b) An evaluation of some factors affecting the accuracy of classification by an artificial neural network, International Journal of Remote Sensing, 18(4), 799-810.

Goldberg, D. E. (1988) Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley). ISBN 0-201-15767-5.

Gopal, S. and Woodcock, C. (1996) Remote sensing of forest change using artificial neural networks, IEEE Transactions on Geoscience and Remote Sensing, 34(2), 398-404.

Jin, Y. Q. and Liu, C. (1997) Biomass retrieval from high-dimensional active/passive remote sensing data by using artificial neural networks, International Journal of Remote Sensing, 18(4), 971-979.

Kimes, D. S. and Harrison, P. R. (1990) Inferring hemispherical directional reflectance using a knowledge-based system, Proc. IGARSS '90, Washington, pp. 1759-1764.

Kimes, D. S., Harrison, P. R. and Ratcliffe, P. A. (1991) A knowledge-based expert system for inferring vegetation characteristics, International Journal of Remote Sensing, 12(10), 1987-2020.

Kimes, D. S. and Nelson, R. F. (1998) Attributes of neural networks for extracting continuous vegetation variables, International Journal of Remote Sensing, 19(14), 2639-2663.

Pierce, L. E., Sarabandi, K. and Ulaby, F. T. (1994) Application of an artificial neural network in canopy scattering inversion, International Journal of Remote Sensing, 15(16), 3263-3270.

Press, W. H., Teukolsky, S. A., Vetterling, W. T. and Flannery, B. P. (1994) Numerical Recipes in C: The Art of Scientific Computing, 2nd ed. (Cambridge: Cambridge University Press).

Privette, J. L. (1994) An efficient strategy for the inversion of bidirectional reflectance models with satellite remote sensing data, Ph.D. thesis, University of Colorado.
