Introduction

The use of optimization methods has been a standard in mechanical engineering design for some time. Optimization methods can be used to perform several tasks. One of the most common tasks is to minimize the cost of a product. The engineer wants a product that functions efficiently and is also economically efficient. Another kind of optimization task might be to build a model to simulate some function. An example of this is a four-bar path generation problem. The four-bar model should simulate the desired path as close as possible.

Many researchers have used optimization methods in this sense. There is an unlimited number of problems in which optimization methods can be applied. The objective of all optimization problems is to find a minimum or maximum objective function value. Considering minimum values, usually a problem may have more than one minimum objective function value. It is the engineer's goal to find the values of the design variables which will yield the lowest objective function value of all the minimums. The design variables are those factors that influence the objective function. Some times the engineer is not satisfied with a local minimum.

There are many traditional deterministic algorithms available to solve optimization problems for a local minimum. Some of these methods include the reduced gradient (Wolfe, 1963), the generalized reduced gradient, GRG (Abadie and Carpentier 1969), and the gradient projection (Rosen 1960 & 1961) methods. There are many other algorithms available in addition to these few. The above methods all require the evaluation of gradient information in order to solve the problem. Gradient evaluations can become difficult and time consuming when complex objective functions are present. These methods always look for the closest optima weather it is a local or global one. The algorithms find a valley and converge to the bottom of it.

The set of methods discussed in this paper are known as probabilistic global methods. By probabilistic it is meant that the algorithms will sometimes accept a bad result according to a given probability. Thus these methods are also referred to as "hill climbing" methods in the sense that they can climb out of a local minimum valley. The aforementioned deterministic methods will never accept a worse off solution and thus can not climb out of a valley.

Simulated Annealing (Kirkpatrick et. al., 1983) and Genetic Algorithms (Holland, 1975) are used in this paper. Both of these are probabilistic global optimization methods which do not require gradient information. A third global optimization technique proposed by Kota and Chiou (1992) combine the use of Orthogonal Arrays with the GRG algorithm. These three global optimization techniques are illustrated and compared in this paper using a simple model problem.

Model Problem

A model problem is utilized to demonstrate how each method solves or attempts to solve for a global minimum. A relatively simple problem was chosen to illustrate the optimization algorithms. A simple problem was beneficial since the accuracy of each method was being examined. It was desired that the actual global minimum be known. This way one could determine if the algorithms were in fact operating correctly and solving for the global minimum. A multimodal problem was desired to demonstrate that the algorithms could escape local minimums. That is a problem with one global minimum and a couple of local minimums. The problem chosen, known as Himmelblau's Function, is:



where x1 and x2 are the design variables. Thus, this problem contains two design variables with an objective function value of f(x1,x2). This is an unconstrained problem in which the objective function is to be minimized. Figure 1 below is a contour plot of Himmelblau's

function. In this figure four distinct minima can easily be seen. Solving the function analytically, three local minima and one global minimum are found. The true global minimum is located at f(3,2)=0. Of course, for complex problems an analytical solution is near impossible. The analytical solution will be used to demonstrate how closely the different global optimization techniques converge to the true global minimum at f(3,2).


Algorithms

Simulated annealing and genetic algorithms were used to optimize Himmelblau's function. Orthogonal arrays with GRG were a third technique implemented. Since an unconstrained problem is assumed, there is no need to consider such factors as penalty functions.

Simulated Annealing

Simulated annealing receives its name because of its resemblance to the natural process of the annealing of solids. When a solid material is annealed, it is heated to its melting point. When this point is reached, the solid is then allowed to cool slowly enough so that thermal equilibrium is maintained. By cooling at this slow rate, the atoms in the solid material arrange themselves as to attain a minimum energy state. This phenomenon of nature was first applied to numerical calculations in 1953 by Metropolis and his coworkers. Kirkpatrick (1983) applied this technique to engineering design. He used simulated annealing methods on the optimization of VLSI circuit designs.

In order to use simulated annealing, an initial design is chosen. Since this is a global optimization method, the starting point should not influence the result of the algorithm. A new design is then generated from the neighborhood of the initial points. If the new design is better, in terms of the objective function, than the old one, it is accepted. If it is worse, it may or may not be accepted. This decision is made according to the Boltzmann Probability function:


where E is the change in the objective function and T is the "temperature." If a randomly generated number between 0 and 1 is less than the Boltzmann Probability, the worse design is accepted. The "temperature" is a control parameter and sets the Boltzmann Probability. A low temperature is similar to quenching a solid, in which the lowest energy state is not achieved due to a fast cooling rate. The simulated annealing algorithm would then solve for the closest minimum to the starting point. However, with a high starting temperature, the algorithm looks over the whole objective function for a global minimum.

Several parameters were chosen for the model problem presented. The cooling schedule used was adapted from Vanderbilt and Louie (1984) and is shown below. In this


equation T0 is the current temperature, TNEW is the new temperature, and xt is the temperature reduction factor. In order to reach "thermal equilibrium," the temperature must first be held constant for a number of algorithm iterations. Vanderbilt and Louie (1984) suggest performing [15 * number of design variables] iterations at the starting temperature. Thus, the parameters for the simulated annealing algorithm are shown below.


The temperature was varied experimentally between the values shown above. The simulated annealing algorithm was implemented using the FORTRAN subroutine ambesa (Press et.al., 1994). A FORTRAN program was written to drive this routine.

Genetic Algorithms

Genetic algorithms (Holland, 1975) are different from traditional methods in that they are a probabilistic method that uses a population of designs rather than a single design at a time. Holland's genetic algorithms are based on the natural selection process in the evolution of living organisms. In nature, the fittest members in a population have the greatest probability of surviving. This is customarily known as "survival of the fittest." Since starting designs are irrelevant, initial population members are chosen randomly.

The FORTRAN code written for this problem used binary strings to represent the population members. Included in these binary strings are the "genes" or characteristics of a design variable on the objective function. To evaluate each member, a fitness function is implemented. This fitness function is based on the problem objective function, equation 1. In order to prevent one member from dominated the population, a linear normalized fitness function was used. Members for the next generation are chosen based on these fitness values. A member with a greater fitness value will have a higher probability of being chosen for the next generation than one with a lower fitness value.

The genetic algorithm used in this project used crossover and mutation techniques. Crossover is the exchange of bits of string between parents to create children. Mutation occurs when a bit in a string changes randomly from 0 to 1 or 1 to 0. Both operators are given a certain probability which will cause them to occur. This is shown below in Fig. 2.
Population = 50 Members
Mutation Rate = 10 %
Crossover:
Standard Deviation
Crossover Probability
< 0.75
30 %
> 0.75
70 %

Figure 2: Genetic Algorithm Parameters

Orthogonal Arrays with GRG

This global optimization technique used orthogonal arrays to find a suitable starting point for the generalized reduced gradient method. The technique was applied by Kota and Chiou (1992) to a four-bar path generation problem. The first step here is to choose an orthogonal array. The array chosen was a L9 (Taguchi, 1989) inner array. This array requires nine different evaluations of the objective function as shown below in Table 1. In the array, columns 1 and 2 contain three levels of the design variables. No
Evaluation
(1)
(2)
(3)
(4)
Evaluation
Number
x1
x2
F(x1,x2)
1
-6
-6
F1
2
-6
0
F2
3
-6
6
F3
4
0
-6
F4
5
0
0
F5
6
0
6
F6
7
6
-6
F7
8
6
0
F8
9
6
6
F9

Table 1: L9 Orthogonal Array

single combination is repeated in the table. Columns 3 and 4 are blank since only two design variables are present. F is the objective function.

To determine which level is the best for each design variable, the level effects are considered. For example, consider x2. The effect of level 1 on x2 is F1+F4+F7, level 2 is F2+F5+F8, and level 3 is F3+F6+F9. Level effects on x1 are calculated in the same manner. Lets assume level 1 returns the smallest F. The new levels would then be:

(level 1)new = (level 1)old

(level 2)new = 0.5*((level 1)old + (level 2)old) (5a,b,c)

(level 3)new = (level 2)old

If level 2 is the smallest:

(level 1)new = (level 2)old - 0.5*((level 2)old - (level 1)old)

(level 2)new = (level 2)old (6a,b,c)

(level 3)new = (level 2)old + 0.5*((level 3)old - (level 2)old)

If level 3 is the smallest:

(level 1)new = (level 2)old

(level 2)new = 0.5*((level 2)old + (level 3)old) (7a,b,c)

(level 3)new = (level 3)old

The above procedure is continued until either the objective function is less than 0.01 or the maximum difference in effects is less than 0.005 for each design variable. To implement this method a FORTRAN program was written. The results from the orthogonal array were used as starting points for the GRG algorithm. The optimization package OPTDES (Parkinson et.al.,1992) was used to implement the GRG algorithm.


Results and Discussion

Each of the above global optimization methods were used on the sample problem. The results for each method are now discussed followed by a comparison.

Simulated Annealing

Equation 4 of the previous section showed the various starting conditions which were used for the simulated annealing algorithm. Table 2 contains the results for several trials of the simulated annealing algorithm. Being a probabilistic method, simulated
Starting Point
Starting
Constant
Percent Time
x1
x2
Temp.
Iter #
Optimum Reached
6
6
10
30
90.91
6
6
1000
100
54.55
6
6
1000
30
45.45
6
6
1000
30
36.36
6
6
100000
30
36.36
6
6
1000
10
27.27
6
6
1.00E+07
30
27.27
6
6
1000
60
27.27
6
6
10000
30
18.18
6
6
100
30
9.09
0
0
1.00E+09
30
71.43
0
0
10
30
63.64
0
0
100
30
63.64
0
0
1000
30
45.45
0
0
1.00E+07
30
45.45
0
0
1.00E+07
100
42.86
0
0
1000
10
36.36
0
0
1000
100
36.36
0
0
1000
30
27.27
0
0
1000
1
0.00
-6
-6
1.00E+11
30
63.64
-6
-6
1000
100
54.55
-6
-6
1.00E+09
100
45.45
-6
-6
1000
30
36.36
-6
-6
1.00E+07
30
36.36
-6
-6
1000
30
18.18
-6
-6
1000
1
9.09
-6
-6
10
30
0.00

Table 2: Simulated Annealing Results

annealing will sometimes converge to a local minimum instead of the global minimum. All three conditions, starting point, starting temperature, and constant iterations at the initial condition were varied. For each set of conditions in Table 2, twenty tests were run. For every test run one of the four minima was reached. The column labeled "Percent Time Optimum Reached" is the percentage in which the global optimum was reached for the respective conditions. The simulated annealing algorithm was not very successful in solving for the global minimum. The percentage of success for a realistic condition was only about 50%. For a real life engineering problem, it would not be wise to choose a low starting temperature since it is not desirable to cool fast and quench. These two plots use two different starting points and compare the "percent time global optimum reached" with temperature.

Genetic Algorithms

The genetic algorithm was implemented on Himmelblau's Function, Equation 1, as per the conditions discussed previously and summarized in Figure 2. A population of 50

members was chosen randomly and the algorithm implemented. A summary of the results is shown on the following page in Table 3 which shows the five best members in each generation. For the initial 50 members, Generation #0, Himmelblau's function, f(x1,x2), had relatively large values. Optimally, this value should be as close to zero as possible and x1 and x2 should be equal to 3 and 2 respectively. The initial generation contains one local minimum and the rest of the members are not near any minimum. The fact that one of the members is near a local minimum is just coincidence since these values were chosen randomly. As the generations evolved, the members approached the global minimum. Since an elitist parameter was not used in this algorithm, the initial local minimum is not
Number of
Function Parameters
Generations
x1
x2
f (x)
0
9.74
10.10
18980.00
0
7.87
9.90
12703.00
0
7.57
-9.66
9422.00
0
9.81
5.68
9231.00
0
-9.55
5.45
7420.00
1
0.76
3.26
51.45
1
0.76
3.26
51.45
1
0.76
3.26
51.69
1
1.59
2.07
67.25
1
1.57
2.01
71.63
5
2.69
2.01
21.30
5
2.69
2.01
21.30
5
2.77
1.95
21.50
5
2.77
1.95
21.50
5
2.65
2.03
21.67
10
2.85
2.19
11.72
10
2.85
2.19
11.72
10
2.85
2.19
11.72
10
2.85
2.19
11.72
10
2.85
2.11
14.26

Table 3: Best Five Members for each Genetic Algorithm Generations

carried to future generations. After 10 generations, no change was seen in the majority of the members.

Orthogonal Arrays with GRG

For this method, orthogonal arrays were used to find a suitable starting point for the GRG algorithm of OPTDES. As previously mentioned, the initial levels of both x1 and x2 were -6, 0, and +6. These three levels are also referred to as levels 1, 2, and 3, respectively. Using equation 7 and the L9 array shown in Table 1, the results of the orthogonal arrays are shown below in Table 4. As per the termination criteria previously
Levels
Optimum Points
x1
x2
x1
x2
F(x)
1
1
2.9736
2.0625
6.31E-02
Initial Levels
1
2
2.9736
2.0640
6.80E-02
Level
Value
1
3
2.9736
2.0654
7.32E-02
1
-6
2
1
2.9751
2.0625
6.08E-02
2
0
2
2
2.9751
2.0640
6.59E-02
3
6
2
3
2.9751
2.0654
7.13E-02
3
1
2.9766
2.0625
5.91E-02
3
2
2.9766
2.0640
6.44E-02
3
3
2.9766
2.0654
7.00E-02

Table 4: Orthogonal Array Results

mentioned, the above values took twelve iterations to obtain. It is obvious that the values of x1 and x2 (2.97,2.06) are very close to the global minimum of (3, 2).

These values were now used as the starting points for the GRG algorithm. Table

5 contains these results. Six tests, or sets of starting points, are shown in the table. Test 1
Test
Starting Points
Analysis
Number of
Optimum Points Optimum Points
Number
x1
x2
Function
Calls
Designs
x1
x2
Function
1
2.97
2.07
0.063
18
5
3.00
2.00
0
2
-6
6
1490
26
4
-2.80
3.13
2.32E-05
3
6
6
2186
40
7
2.99
2.00
4.21E-05
4
-6
-6
890
33
6
-3.77
-3.28
7.78E-05
5
6
-6
1586
38
6
3.58
-1.85
2.86E-05
6
0
0
170
48
8
2.99
1.99
6.02E-06

Table 5: Results of GRG Algorithm

uses the starting points from the orthogonal arrays. Implementing GRG with these points yield the global minimum. This makes sense since the starting points are in the vicinity of the global minimum. The remaining tests use various starting points from the various initial levels (-6, 0, +6). Of these tests, test 3 and 6 also yield the global minimum, however, they cause GRG to take more time to complete. For example, in test 1, only 18 analysis calls and 5 designs are needed. For test 6, the GRG algorithm requires 48 analysis calls and 8 designs. Thus, the use of orthogonal arrays is more efficient in two aspects: they guarantee convergence on a global minimum, and second, they require less CPU time for the GRG algorithm to terminate.

Comparison

Now that the results of all three methods have been presented, it is time to compare them with each other. Table 6, located on the following page, contains each method and its results. Also included in this table is the solution of orthogonal arrays alone. Both orthogonal arrays with GRG and simulated annealing yield the global minimum, however, as previously discussed, simulated annealing does not return the global minimum 100 percent of the time. Therefore, for this example the best method
Method
x1
x2
f (x1,x2)
Arrays Alone
2.97
2.06
.0065
Arrays with GRG
3
2
0
*Simulated Annealing
3
2
0
Genetic Algorithms
2.85
2.19
11.72
Global Minimum
3
2
0

* Only reached a percentage of the time

Table 6: Comparison of Methods

to use is orthogonal arrays as a starting point for the GRG algorithm.

Conclusions

The model problem presented in this paper illustrated three different global optimization methods. Each method solved the simple problem in a different manner. Using orthogonal arrays as a starting point for the GRG algorithm did yield a global minimum. In fact, any conventional optimization routine such as SQP or SLP could have been used after the orthogonal arrays to produce a global minimum. Also, since genetic algorithms did not yield an exact answer but rather one in the neighborhood of the global minimum, conventional methods can be used to descend to the global minimum from this point. For this problem simulated annealing was not a reliable method. It is not yet determined if there is an error in the FORTRAN subroutine for simulated annealing. When solving an engineering problem. It is not desirable to know that an answer has only a 70 percent chance of being correct. The results obtained in this paper can be only stated confidently about the problem examined. For different problems, different methods may work better. That is why there are so many optimization techniques available. Future work may include applying some of the techniques discussed to solve real life engineering problems in which the answer is not yet known.

References

Abadie, J. and Carpentier, J., "Generalization of the Wolfe Reduced Gradient

Method to the case of Nonlinear Constraints," in Optimization, Gletcher, R.

(Ed.), Academic Press, New York, 1969.


Holland, J. H., "Adaptation in Natural and Artificial Systems," University of

Michigan Press, Ann Arbor, Michigan, 1975.

Kirkpatrick, S., Gelatt, C.D. and Vecchi, M.P., "Optimization by Simulated

Annealing," Science, Vol. 220, pp. 671-680, 1983.


Kota, S. and Chiou, S., "Use of Orthogonal Arrays in Mechanism Synthesis,"

Mechanical Machine Theory, Vol. 28, pp. 777-794, 1993.


Parkinson, A., Balling, R, and Free, J, "OPTDES," Brigham Young

University, 1992.


Press, W, Teukolsky, S, Vetterling, W, and Flannery, B, "Numerical Recipes

in FORTRAN," Cambride Press, Boston, 1994.

Recklaitis, "Engineering Optimization," John Wiley and Sons, New York.

Rosen, J. B., "The Gradient Projection Method for Nonlinear Programming Part I,

Linear Constraints, " SIAM Journal of Applied Mathematics, Vol. 8, pp.

181-217, 1960.


Rosen, J. B., "The Gradient Projection Method for Nonlinear Programming Part

II, Nonlinear Constraints, " SIAM Journal of Applied Mathematics, Vol. 9,

pp. 514-532, 1961.


Taguchi, G., "Introduction to Quality Engineering," Asian Productivity Center

1989.


Vanderbilt, D. and Louie, S. G., "A Monte Carlo Simulated Annealing Approach

to Optimization over Continuous Variables," Journal of Computational Physics,

Vol. 56, pp. 259-271, 1984.


Wolfe, P., "Methods of Nonlinear Programming," on Recent Advances in

Mathematical Programming, Graves, R. L., and Wolfe, P. (Eds.), McGaw-Hill

New York, pp. 67-86, 1963.