Parallel Implementations of Direct Solvers For Sparse Systems of Linear Equations on PVM System and nCUBE Machine

Xuyang Li
Civil Engineering Department / 4190 Bell Engineering Center

Azhar Maqsood, James M. Conrad
Computer Systems Engineering / 313 Engineering Hall

University of Arkansas, Fayetteville, AR 72701
E-mail: {xl0, am02, jmc3}@engr.engr.uark.edu

Abstract - The basic problems in developing parallel direct solvers for sparse systems of linear equations are discussed in this report. These problems, including the storage schemes for sparse matrices, the running environment of the programs, and the parallelization of the sequential algorithms, are handled while keeping current parallel computer architectures in mind. The behavior of the parallel machines used for the underlying problem is also discussed. The problem is implemented in two different parallel environments: the Parallel Virtual Machine (PVM) and the nCUBE machine (hypercube). Test results for both versions are analyzed in terms of the machine structure and the algorithm design.

1. Introduction

Solving large systems of equations in which a majority of the coefficients are zero is very important in scientific research and engineering computing. Systems of equations like these, called sparse systems of equations, are often encountered in numerical treatments of problems for which analytical solutions are very hard to obtain. Examples of such problems include solving partial differential equations by numerical methods, multi-dimensional spline interpolations and finite element method calculations in weather forecasting, computer-aided design and computer-assisted manufacturing, fluid dynamics calculations, and simulation of natural behaviors. Some problems result in systems of linear equations with coefficient matrices of special structure; others result in coefficient matrices of random structure. It is often inefficient, and sometimes impossible, to solve sparse systems of equations using dense matrix solvers because the memory occupied by the zero elements of the matrix is too large to handle. Solving these systems also involves more complex algorithms and data structures than their dense counterparts.

A system of n linear equations has the following form:

    a[0,0]x[0]   + a[0,1]x[1]   + ... + a[0,n-1]x[n-1]   = b[0]
    a[1,0]x[0]   + a[1,1]x[1]   + ... + a[1,n-1]x[n-1]   = b[1]
      ...
    a[n-1,0]x[0] + a[n-1,1]x[1] + ... + a[n-1,n-1]x[n-1] = b[n-1]

In matrix notation, this system can be represented by Ax = b, where A is the n x n matrix of coefficients such that A[i,j] = a[i,j], b is the n x 1 vector [b[0], b[1], ..., b[n-1]]^T, and x is the desired n x 1 solution vector [x[0], x[1], ..., x[n-1]]^T. The matrix A is considered sparse if a computation involving it can utilize the number and location of its nonzero elements to reduce the run time over the same computation on a dense matrix of the same size.

Although there are many good algorithms and programs that can be used to solve sparse linear systems on sequential computers, only limited success has been achieved in solving sparse systems of linear equations on parallel computers [1, p.454]. The reasons for this are twofold. The iterative methods for sparse linear systems are fast when they converge, but they do not always converge. The direct methods, on the other hand, are very stable, but they involve a large amount of communication among processors on distributed-memory parallel computers.
In this report, different aspects of implementing parallel direct solvers for sparse linear systems are discussed. The storage scheme, the algorithm, and the implementation details of a direct method are given. Finally, results comparing the solver with its dense counterpart and with its sequential implementation are also given.

2. Direct Methods Versus Indirect Methods

There are two fundamentally different kinds of methods for solving systems of linear equations: direct methods and indirect methods. The indirect, or iterative, methods are techniques for solving systems of equations of the form Ax = b that generate a sequence of approximations to the solution vector x. In each iteration, the coefficient matrix A is used to perform a matrix-vector multiplication. The number of iterations required to solve a system of equations with a desired precision is usually data dependent; hence, the number of iterations is not known prior to executing the algorithm. Iterative methods do not guarantee a solution for all systems of equations, even when the systems are not singular. This means that iterative methods may diverge when applied to certain data. This was the main reason that direct methods were chosen for the implementation presented in this report.

There is another reason why direct methods were used here: iterative methods are often special-purpose. A given iterative method may be suitable only for solving a special class of systems of equations, such as those resulting from finite element method calculations. For example, the conjugate gradient method and the preconditioned conjugate gradient algorithm are only suitable for solving large sparse systems of linear equations with symmetric positive definite matrices [1, pp.433-436].

Direct methods are useful for solving sparse linear systems because they are general and robust. Although there is substantial parallelism inherent in sparse direct methods, only limited success has been achieved to date in developing efficient general-purpose parallel formulations for them. Developing efficient general-purpose parallel formulations of direct methods for unstructured or random sparse matrices is currently an active area of research. Although all of these methods are based on Gaussian elimination (for general matrices) or Cholesky factorization (for symmetric positive definite matrices), their parallel formulations can be quite complicated. Here, Gaussian elimination with partial pivoting is implemented on parallel computers. The implementations can be used for solving general sparse linear systems.

3. Parallel Computers Used For Implementation

Parallel computers have different structures and different software environments. The Parallel Virtual Machine (PVM) was chosen as the first parallel computing environment for the implementation. The main reason for choosing PVM is that it is a system with multi-architecture compatibility. PVM is not a specific machine; rather, it is a software environment. PVM permits a network of heterogeneous UNIX computers to be used as a single large parallel computer, so large computational problems can be solved by using the aggregate power of many computers. These machines are often the most popular computers now in use, such as SUN SPARCstations, CRAY supercomputers, 80386/80486 UNIX boxes, Thinking Machines systems, the DEC Alpha, and the MicroVAX.
PVM supplies the functions to automatically start up tasks on the virtual machine and allows the tasks to communicate and synchronize with each other. Applications, written in C or FORTRAN, can be parallelized by using message-passing constructs common to most distributed-memory computers. By sending and receiving messages, multiple tasks of an application can cooperate to solve a problem in parallel.

PVM supports heterogeneity at the application, machine, and network levels, so PVM allows application tasks to exploit the architecture best suited to their solution. Any data conversion that may be required when two computers use different integer or floating-point representations is handled by PVM. Even machines that are interconnected by a variety of different networks can be used by PVM. Programs running on PVM do not need to know the details of communication, and the library functions used on the PVM system have the same syntax on all architectures. Because of this, programs written for the PVM system can easily be ported from one architecture to another without modification, and they can run simultaneously on different machines. This flexibility makes PVM one of the most powerful parallel computing environments. However, the PVM system is not perfect. Because PVM mainly uses networks to transmit data from machine to machine, the performance of the system depends largely on the performance of the networks. This is really a consequence of its flexibility. This architecture, as the results shown later indicate, limits the performance of the parallel implementation of the Gaussian elimination method [2, pp.1-5].

The other parallel computer used for the implementation of the algorithms was the nCUBE (a hypercube machine). The nCUBE is a high-performance parallel computer. The main advantage of the nCUBE over PVM is that communication among processors on the nCUBE is highly efficient. nCUBE machines are also distributed-memory machines.

4. Storage Scheme For Sparse Matrices

It is customary to store an n x n dense matrix in an n x n array. However, if the matrix is sparse, storage is wasted because a majority of the elements of the matrix are zero and need not be stored explicitly. If the positions and values of all the nonzero elements of the matrix are known, then the whole matrix is known. It is therefore common practice to store only the nonzero elements and to keep track of their locations in the matrix. There are currently many storage schemes that can be used to store and manipulate sparse matrices [1, pp.409-412]. These specialized schemes not only save storage but also yield computational savings: since the locations of the nonzero elements are known explicitly, unnecessary multiplications and additions with zero can be avoided. Each of these schemes was developed for specific purposes, and they are suitable for different implementations on machines with different architectures. Some data structures are more suitable for a parallel implementation than others; there is no single best data structure for storing sparse matrices.

In the implementation presented here, a special storage scheme is used. It is characterized by the use of a group of singly linked lists and a row pointer vector. In this scheme, each row of the matrix is represented by a singly linked list, and each nonzero element of the matrix in a row is represented by a node in the linked list.
The node is represented by the following data structure, using C notation:

    struct row {
        float elem;
        int col;
        struct row *next;
    };

The field elem of struct row is the value of a nonzero matrix entry in the current row, while the field col is the column number of this entry. The field next is a pointer to the node representing the next nonzero element in the same row; the next field of the last node of a row is set to NULL. During initialization and computation, the order of the nodes in the linked list is maintained so that col is always in ascending order along the direction of the list. This ordering keeps the search for an element with a given column number fast.

All the rows of the sparse matrix are bound together by a vector of structure pointers. Each element of this vector points to the first node of the linked list representing the row whose row number equals the index of that element in the vector. The type of this vector is therefore:

    struct row **

in which struct row is defined above. For the example matrix shown in Figure 1, the schematic representation is shown in Figure 2.

In the implementation presented later, the memory occupied by the vector of pointers is allocated as soon as the number of equations n (that is, the number of rows in the coefficient matrix) is known. The memory is allocated so that the number of elements in the vector is exactly n, and the memory occupied by the vector does not change after that. Note that each row of the sparse matrix of a non-singular system of linear equations must have at least one nonzero element (otherwise the system would be singular), so no element of the above vector should ever be NULL during the computation. This means that the storage scheme wastes no memory in the vector for non-singular systems of equations. Also, in the linked-list representation of rows, a sequential search is needed to locate an element; because the storage order of the elements is maintained as described above, the average time needed to access an element in a row is proportional to s/2, where s is the number of nonzero elements in the row.

The memory needed to store a sparse matrix using this scheme is

    N*sizeof(struct row) + n*sizeof(struct row *)

where N is the number of nonzero elements in the matrix and n is the number of rows. This amount changes during processing, because dynamic memory allocation is used to keep memory usage as economical as possible. In the implementation, an element of the matrix is considered to be zero if its absolute value is less than or equal to a preset small positive number (user defined). So, in the process of elimination, newly produced nonzero elements are added to the matrix, while newly produced zero elements are removed from it.

        | 1.0  0.0  0.0  2.0  0.0  3.0 |
        | 4.0  5.0  0.0  0.0  0.0  0.0 |
    A = | 0.0  6.0  0.0  0.0  0.0  8.0 |
        | 9.0  0.0  0.0  0.0 10.0 11.0 |
        | 0.0  2.0  0.0  0.0 14.0  0.0 |
        | 0.0  0.0  3.0  0.0  0.0  0.0 |

    Figure 1 - A Sparse Matrix

    Figure 2 - Sparse Matrix Storage Scheme
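As a concrete illustration of this storage scheme, the sketch below shows how a row list can be searched for a given column and how a new nonzero entry can be inserted while keeping the column numbers in ascending order. It is only a sketch based on the description above; the function names find_in_row and insert_in_row are illustrative and are not taken from the original programs.

    #include <stdlib.h>

    /* One nonzero entry of a row, as defined in Section 4. */
    struct row {
        float elem;
        int col;
        struct row *next;
    };

    /* Return the node holding column 'col' in a row list, or NULL if that
       entry is (implicitly) zero.  Because the list is sorted by column,
       the search can stop as soon as a larger column number is seen. */
    struct row *find_in_row(struct row *head, int col)
    {
        struct row *p;
        for (p = head; p != NULL && p->col <= col; p = p->next)
            if (p->col == col)
                return p;
        return NULL;
    }

    /* Insert (or overwrite) a nonzero value in a row list, preserving the
       ascending column order.  'headp' is one slot of the row pointer
       vector of type struct row **. */
    void insert_in_row(struct row **headp, int col, float value)
    {
        struct row **pp = headp;
        while (*pp != NULL && (*pp)->col < col)
            pp = &(*pp)->next;

        if (*pp != NULL && (*pp)->col == col) {   /* entry already present */
            (*pp)->elem = value;
            return;
        }
        struct row *node = malloc(sizeof *node);  /* dynamic allocation, as in the paper */
        node->elem = value;
        node->col  = col;
        node->next = *pp;
        *pp = node;
    }

Searching a row of s nonzero entries in this way visits about s/2 nodes on average, which is the access cost quoted above.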
5. Basic Algorithms

A system of equations Ax = b is usually solved in two stages. First, through a series of algebraic manipulations, the original system of equations is reduced to an upper-triangular system of the form

    u[0,0]x[0] + u[0,1]x[1] + ... + u[0,n-1]x[n-1] = y[0]
                 u[1,1]x[1] + ... + u[1,n-1]x[n-1] = y[1]
                                       ...
                               u[n-1,n-1]x[n-1]    = y[n-1]

This can be written as Ux = y, where U is a matrix in which all subdiagonal entries are zero; that is, U[i,j] = 0 if i > j, otherwise U[i,j] = u[i,j]. U is called an upper-triangular matrix, and this stage is called factorization. In the second stage, the upper-triangular system is solved for the variables in reverse order, from x[n-1] to x[0], by a procedure known as back-substitution.

The basic algorithm used in this implementation is Gaussian elimination with partial pivoting. The sequential version of it has several nested loops. Figure 3 shows the Gaussian elimination with partial pivoting algorithm used as the basis of the parallelization.

    procedure GAUSSIAN_ELIMINATION_W_PARTIAL_PIVOTING(A, b, n)
    var
        marked : array [0 .. n - 1] of boolean;
        pivot  : array [0 .. n - 1] of 0 .. n - 1;
        i, j, k, picked : integer;
        tmp, tmp1 : real;
    begin
        for i := 0 to n - 1 do
        begin
            { pivoting operations }
            tmp1 := 0;
            for j := 0 to n - 1 do
            begin
                if ((not marked[j]) and (ABS(A[j, i]) > tmp1)) then
                begin
                    tmp1 := ABS(A[j, i]);
                    picked := j;
                endif;
            endfor;
            tmp1 := A[picked, i];
            marked[picked] := true;
            pivot[picked] := i;
            { elimination operations }
            for j := 0 to n - 1 do
            begin
                if (not marked[j]) then
                begin
                    tmp := A[j, i] / tmp1;
                    b[j] := b[j] - b[picked] * tmp;
                    for k := i + 1 to n - 1 do
                    begin
                        A[j, k] := A[j, k] - A[picked, k] * tmp;
                    endfor;
                endif;
            endfor;
        endfor;
        for i := 0 to n - 1 do
        begin
            if (not marked[i]) then
            begin
                pivot[i] := n - 1;
            endif;
        endfor;
    end GAUSSIAN_ELIMINATION_W_PARTIAL_PIVOTING

    Figure 3 - Sequential Algorithm of Gaussian Elimination with Partial Pivoting

After the full matrix A has been reduced to an upper-triangular matrix U, the back-substitution operation is conducted to determine the vector x. The sequential back-substitution algorithm for solving the upper-triangular system of equations Ux = y is shown in Figure 4.

    procedure BACK_SUBSTITUTION(U, y, pivot, n)
    var
        i, j, row : integer;
    begin
        for i := n - 1 downto 1 do
        begin
            for row := 0 to n - 1 do
            begin
                if (pivot[row] = i) then
                begin
                    exit for;
                endif;
            endfor;
            { solution for i'th variable }
            y[row] := y[row] / U[row, i];
            { back-substitute }
            for j := 0 to n - 1 do
            begin
                if (pivot[j] < i) then
                begin
                    y[j] := y[j] - y[row] * U[j, i];
                endif;
            endfor;
        endfor;
        for row := 0 to n - 1 do
        begin
            if (pivot[row] = 0) then
            begin
                exit for;
            endif;
        endfor;
        y[row] := y[row] / U[row, 0];
    end BACK_SUBSTITUTION

    Figure 4 - Back-substitution Algorithm

The Gaussian elimination and back-substitution algorithms were originally designed for solving dense systems of equations. To save execution time, unnecessary memory movements are avoided by using the arrays pivot and marked. Rather than assigning zero to the eliminated elements of matrix A, the algorithm simply leaves them unchanged, because they are not used in the following steps. All unnecessary assignments to elements of matrix A are avoided in this way. For sparse systems of equations, the algorithm must be modified so that the storage occupied by newly produced zero elements of matrix A is released to save memory. Sparse systems of equations often have very large coefficient matrices, so this modification is necessary.
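To connect the elimination step of Figure 3 with the linked-list storage of Section 4, the sketch below shows how one row update, A[j,k] := A[j,k] - A[picked,k]*tmp for k > i, might be carried out on two row lists: existing entries are updated, fill-in entries are inserted, and entries whose magnitude falls below the zero threshold are freed. This is an illustrative sketch under the assumptions stated in the comments; the function name eliminate_row and the constant EPSILON are not taken from the original programs.

    #include <stdlib.h>

    struct row { float elem; int col; struct row *next; };  /* as in Section 4 */

    #define EPSILON 1.0e-6f   /* assumed user-defined zero threshold */

    /* Subtract tmp times the pivot row from the target row for all columns
       greater than i.  'target' is the target row's slot in the row pointer
       vector; 'pivotrow' is the head of the pivot row's list. */
    void eliminate_row(struct row **target, const struct row *pivotrow,
                       float tmp, int i)
    {
        const struct row *p;
        for (p = pivotrow; p != NULL; p = p->next) {
            if (p->col <= i)
                continue;                      /* only columns i+1 .. n-1 change */

            /* locate (or create) the matching entry in the target row;
               a real implementation would advance both lists together
               instead of rescanning from the head each time */
            struct row **tp = target;
            while (*tp != NULL && (*tp)->col < p->col)
                tp = &(*tp)->next;

            if (*tp != NULL && (*tp)->col == p->col) {
                (*tp)->elem -= p->elem * tmp;
                if ((*tp)->elem <= EPSILON && (*tp)->elem >= -EPSILON) {
                    struct row *dead = *tp;    /* new zero: unlink and free */
                    *tp = dead->next;
                    free(dead);
                }
            } else {
                float v = -(p->elem * tmp);    /* fill-in: target entry was zero */
                if (v > EPSILON || v < -EPSILON) {
                    struct row *node = malloc(sizeof *node);
                    node->elem = v;
                    node->col  = p->col;
                    node->next = *tp;
                    *tp = node;
                }
            }
        }
    }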
6. Parallelization of The Algorithms

6.1 General Criteria And Data Partitioning

6.1.1 Ordering of The System of Equations

The characteristics of the machines used in the implementation should be considered when parallelizing the above algorithms. Four steps are considered in the parallelization of these algorithms: ordering, symbolic factorization, numerical factorization, and solving a triangular system. Some parts of these steps may be omitted when considering the implementation of the algorithms on specific machines.

Ordering is an important phase of solving a sparse linear system because it determines the overall efficiency of the remaining steps. Its aim is to rearrange the rows of the original coefficient matrix so that the permuted matrix leads to a faster and more stable solution. The numerical stability of the solution is increased by ensuring that the diagonal elements, or pivots, are large compared to the remaining elements of their respective rows. This is already included in the sequential algorithm and needs to be parallelized. The ordering criteria for obtaining a faster parallel solution, however, are very complex. Ordering requires a large number of row exchanges in the matrix, so for distributed-memory environments such as PVM it could consume a large proportion of the whole computation time. The benefits resulting from ordering would be outweighed by the time spent in communication, because data transmission in the PVM system is critical to the overall performance of the implementation. Considering this factor, the implementations presented here only emphasize enhancing the numerical stability of the solution and contain a partial pivoting process. A large amount of communication is avoided by eliminating a full ordering process.

6.1.2 Data Partitioning And Factorization of The System of Equations

Due to the availability of very fast serial algorithms and the high data-distribution cost involved in parallelizing them, implementations of parallel symbolic factorization on message-passing (distributed-memory) computers tend to be inefficient. Moreover, symbolic factorization is often performed once, after which several systems with the same sparsity pattern are solved, amortizing its cost over all the systems [1, p.458]. The programs presented here, on the other hand, are used to solve systems of equations that usually have no relation to each other, so the benefits of symbolic factorization are not obvious. Because of this, symbolic factorization was not performed in these programs.

In order to conduct numerical factorization in parallel, the coefficient matrix needs to be mapped onto all the processors. This involves partitioning the matrix into blocks so that each block can be assigned to a specific processor. It is important to choose an appropriate data-mapping scheme for distributed-memory machines. For the PVM and nCUBE machines, block-striped partitioning of the matrix is best because it reduces the communication costs. In this partitioning scheme, the matrix is divided into groups of complete, contiguous rows, and each processor is assigned one such group. For example, a matrix with 16 rows can be divided into four groups, with each group assigned to one of four processors, as shown in Figure 5.

    rows  0 -  3  ->  P0
    rows  4 -  7  ->  P1
    rows  8 - 11  ->  P2
    rows 12 - 15  ->  P3

    Figure 5 - Block-Striped Partitioning of a 16 x 16 Matrix on 4 Processors

The partition is made so that the difference between the number of rows contained in any two groups is at most one. This distributes the data evenly to all the processors.
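The row range owned by each processor can be computed directly from the processor number. The following sketch distributes n rows over p processors so that the group sizes differ by at most one; the function and variable names are illustrative assumptions rather than code from the original programs.

    /* Compute the half-open row range [*first, *last) assigned to processor
       'id' when n rows are block-striped over p processors.  The first
       (n % p) processors receive one extra row, so no two groups differ in
       size by more than one. */
    void block_striped_range(int n, int p, int id, int *first, int *last)
    {
        int base  = n / p;          /* rows every processor gets       */
        int extra = n % p;          /* leftover rows, given to low ids */

        if (id < extra) {
            *first = id * (base + 1);
            *last  = *first + base + 1;
        } else {
            *first = extra * (base + 1) + (id - extra) * base;
            *last  = *first + base;
        }
    }

With n = 16 and p = 4 this reproduces the assignment shown in Figure 5.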
Apart from the coefficient matrix, the right-hand side vector of the system of equations is partitioned by the same scheme. Each processor has its own pivot and marked arrays. In addition to the processors that hold the data, a separate processor is used as the master node that controls the whole computational process.

6.2 Parallelizing The Algorithms

6.2.1 Parallelizing The Pivoting Process

The pivoting process is somewhat complicated to run in parallel. First, each processor, or node, searches for the row that has the maximum absolute element value in the current column. This element is called the local pivot. The local pivot, together with the ID of the node that contains it, is then sent to the master node. The master node collects all the local pivots and the corresponding node IDs and compares the local pivots. The pivot with the largest absolute value (the true pivot) is thereby found, and the ID of the node that holds it is broadcast to all the slave nodes. Each node then compares the received ID with its own, and the one that has the pivot row broadcasts the entire pivot row to the other slave nodes. In both the PVM implementation and the nCUBE implementation, the slave nodes search for their local pivots in parallel. In the PVM implementation, the master node collects the local pivots sequentially. On the nCUBE, however, the special structure of the machine is exploited so that pivoting is conducted in a faster way; this is discussed later in this report. The sequential part limits the speedup of the PVM implementation.

6.2.2 Parallelizing The Elimination Process

The parallel elimination process is relatively simple compared to the pivoting process, and it is also more efficient. All the nodes eliminate their own part of the data simultaneously, using the pivot row they have received. Because rows are never swapped during pivoting and elimination, the order in which rows are eliminated is effectively random from the beginning, and the pivoting and elimination loads are evenly distributed over all processors. Combining the above two parts, the parallel pivoting and elimination processes are shown in Figure 6.

    Figure 6 - Parallel Pivoting and Elimination Processes (each processor finds its local pivot; each local pivot is sent to the master node; the master node collects all local pivots and finds the one with the largest absolute value; the master node broadcasts the ID of the node that contains the pivot row; that node broadcasts the pivot row to the other nodes; each node uses the pivot row to eliminate its own part of the data)
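A minimal sketch of the master node's part of this exchange, written with the PVM calls listed later in Section 7.1, is shown below. The message tags, the tids array, and the assumption that each slave packs its task ID followed by the absolute value of its local pivot are illustrative choices, not the exact protocol of the original program.

    #include <pvm3.h>

    #define TAG_LOCAL_PIVOT 10   /* illustrative message tags */
    #define TAG_PIVOT_OWNER 11

    /* Master side of one pivoting step: collect a (task id, |local pivot|)
       pair from every slave, pick the largest, and multicast the winning
       task id to all slaves. */
    void select_global_pivot(int *slave_tids, int nslaves)
    {
        int   best_tid = -1;
        float best_val = 0.0f;
        int   s;

        for (s = 0; s < nslaves; s++) {
            int   tid;
            float local_max;
            pvm_recv(-1, TAG_LOCAL_PIVOT);     /* from any slave */
            pvm_upkint(&tid, 1, 1);
            pvm_upkfloat(&local_max, 1, 1);
            if (best_tid < 0 || local_max > best_val) {
                best_val = local_max;
                best_tid = tid;
            }
        }

        pvm_initsend(PvmDataDefault);
        pvm_pkint(&best_tid, 1, 1);
        pvm_mcast(slave_tids, nslaves, TAG_PIVOT_OWNER);
    }

The slave that recognizes its own task ID in the multicast then broadcasts its pivot row to the other slaves, as described above.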
6.2.3 Parallelizing The Back-substitution Process

Back-substitution is also done in parallel. The back-substitution process begins with the last variable, and the order is controlled by the master node. Because the order of the rows in the matrix is random, each processor searches its own part of the data for the current variable simultaneously. The node that finds the current variable broadcasts it to all the other nodes. The next step is for all the processors to back-substitute the variable simultaneously. This process repeats until all the variables have been found. The parallel back-substitution process is shown in Figure 7.

    Figure 7 - Parallel Back-substitution Process (begin with the last variable; each node searches its part of the data for the current variable; the node that contains the solution of the current variable broadcasts its value to the other nodes; all the other nodes receive the solution; all the nodes back-substitute the value of the variable; the process repeats until the solutions of all the variables are found)

6.2.4 Parallelizing The Input And Output Processes

It is natural to consider parallelizing the input and output processes because they are usually time consuming. However, an analysis of the time used in each part showed that the input and output processes use only a very small fraction of the total time. Considering this, and the sequential characteristics of the input and output files, the input process was parallelized only to a minor degree, and the output process was done sequentially.

7. Implementation of Algorithms on PVM and nCUBE

7.1 Implementation on The PVM System

The PVM version of the direct solver of systems of linear equations has two separate modules, called the master module and the slave module. The master module runs on the master processor and the slave module runs on the slave processors. The master module is responsible for reading the command line parameters, which include the number of processors, the input file name, and the output file name. It is also responsible for launching the slave module on each slave processor, which is done by calling the PVM routine pvm_spawn(). The master module distributes the work load evenly to the slave processors. The most important work the master module does is to help find the pivot rows in the elimination process. The functions pvm_initsend(), pvm_pkint(), pvm_pkfloat(), pvm_mcast(), and pvm_send() are used to send messages to other processors. The functions pvm_recv(), pvm_upkint(), and pvm_upkfloat() are used to receive messages from other processors. The master module is also responsible for collecting the final results and writing them to the output file. The slave module first reads the corresponding part of the data from the input file according to its processor ID and then conducts the elimination and back-substitution. Finally, the solutions are sent to the master processor at its request.

7.2 Implementation on The nCUBE System

The implementation on the nCUBE system is quite similar to the implementation on the PVM system. However, there is no separate master processor in this implementation; the first node is used as the master processor. In the pivoting process, the master processor collects the local pivots by using a binary message-passing scheme. This scheme greatly reduces the number of steps needed for pivoting, especially when large numbers of processors are used. In each step of this scheme, message passing is conducted simultaneously between several pairs of processors that are direct neighbors of each other; that is, the Hamming distance between the two processors is one. Due to the characteristics of the hypercube architecture, passing a message from one node to its direct neighbor is faster than passing a message between nodes that are not direct neighbors. This scheme is best described by an example. Suppose eight processors, numbered 0 through 7, are used in the computation, and processor 0 is the master processor. The message passing process then contains three steps, shown in Figure 8.

    Step 1:  7 -> 6,  5 -> 4,  3 -> 2,  1 -> 0
    Step 2:  6 -> 4,  2 -> 0
    Step 3:  4 -> 0

    Figure 8 - A Binary Message Passing Scheme on nCUBE
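The pattern in Figure 8 is a dimension-by-dimension reduction: in step d, nodes whose IDs differ only in bit d exchange data, and the lower-numbered node of each pair keeps the better local pivot. The control flow might look like the sketch below; send_to_node() and recv_from_node() are placeholders standing in for the nCUBE communication routines, and the overall framing is an assumption made for illustration.

    /* Assumed wrappers around the nCUBE library's send and receive calls. */
    void send_to_node(int dest, float pivot, int owner);
    void recv_from_node(int src, float *pivot, int *owner);

    /* Binary pivot collection as in Figure 8.  'id' is this node's number
       and 'dim' is the dimension of the subcube (8 nodes -> dim = 3).
       'my_pivot' holds the absolute value of this node's local pivot and
       'my_owner' the node that owns it; after the loop, node 0 holds the
       global pivot. */
    void collect_pivot_binary(int id, int dim, float *my_pivot, int *my_owner)
    {
        int d;
        for (d = 0; d < dim; d++) {
            int partner = id ^ (1 << d);   /* direct neighbor: differs in bit d */

            if ((id & ((1 << (d + 1)) - 1)) == 0) {
                /* lower node of the pair: receive and keep the larger pivot */
                float other_pivot;
                int   other_owner;
                recv_from_node(partner, &other_pivot, &other_owner);
                if (other_pivot > *my_pivot) {
                    *my_pivot = other_pivot;
                    *my_owner = other_owner;
                }
            } else if ((id & ((1 << d) - 1)) == 0) {
                /* upper node of the pair: send and drop out of later steps */
                send_to_node(partner, *my_pivot, *my_owner);
                return;
            }
        }
    }

For eight nodes this produces exactly the pairings 7->6, 5->4, 3->2, 1->0, then 6->4, 2->0, then 4->0, so the number of communication steps grows with the subcube dimension rather than with the number of processors.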
7.3 Program Interface

7.3.1 Command Line Parameters

The PVM version of the program is launched in the following way (assuming the master program is named gpmaster):

    gpmaster <# of processors>

The nCUBE version of the program is launched by the xnc utility as follows (assuming the program is named gs):

    xnc -d gs

7.3.2 Input And Output File Format

The formats of the input and output files are the same for the PVM version and the nCUBE version. The input file format is as follows. The first number in the file is the number of equations. Following that are the nonzero elements of the coefficient matrix and the right-hand side of the equations. Each nonzero element of the matrix is represented by a triple of numbers: the row number of the element, the column number of the element, and the element itself. The right-hand side of each equation is represented by the triple: the row number, -1, and the value itself. These triples can appear in any order in the input file. For example, a system with the coefficient matrix A shown in Figure 1 and right-hand side b = [3.0, -2.0, 4.0, 1.0, 2.5, 1.4]^T can be represented by the following file:

    6
    0 0 1.0
    0 3 2.0
    0 5 3.0
    0 -1 3.0
    1 0 4.0
    1 1 5.0
    1 -1 -2.0
    2 1 6.0
    2 5 8.0
    2 -1 4.0
    3 0 9.0
    3 4 10.0
    3 5 11.0
    3 -1 1.0
    4 1 2.0
    4 4 14.0
    4 -1 2.5
    5 2 3.0
    5 -1 1.4

The output file format is quite simple. It is shown below (the ellipses and n are replaced by actual numbers in real files):

    x[0] = ...
    x[1] = ...
    ...
    x[n-1] = ...

8. Results

The programs were run to solve systems of equations resulting from practical problems. The systems of equations generated in the process of solving partial differential equations by using bivariate cubic spline functions [3, p.9 and 5, p.213] were used in the testing; these systems of equations are typical sparse systems. In addition, other systems of equations generated by a special program were also tested.

The PVM version was tested on SUN SPARCstations. For a system of 100 equations, the results are shown in Table 1, which lists the number of processors and the corresponding execution time. In this case no speedup was achieved. A system of 600 equations was also used for testing; the results are shown in Table 2, and there is little speedup in this case. In both cases, the execution time increases rapidly as the number of processors increases.

    Table 1 - Result of PVM Version Running For a System of 100 Equations

    Number of Processors    Execution Time (Seconds)
    1                       6
    2                       9
    3                       11
    4                       13
    5                       17
    6                       16
    7                       16
    8                       20
    9                       25
    10                      28

The nCUBE version was tested on the nCUBE machine at Texas A&M University. In this case, the corresponding program for dense matrix systems was also tested. The results of both the sparse solver and the dense solver are shown in Table 3.

    Table 2 - Result of the PVM Version Running For a System of 600 Equations

    Number of Processors    Execution Time (Seconds)
    1                       71
    2                       54
    3                       63
    4                       74
    5                       96

    Table 3 - Result of the nCUBE Version Running For a System of 100 Equations

    Dimension of Subcube    Execution Time (Sparse)    Execution Time (Dense)
                            (nano-seconds)             (nano-seconds)
    0                        1233520                    3721190
    1                        1415780                    3218430
    2                        1562920                    3269990
    3                        2090550                    3722900
    4                        6680630                    8838780
    5                       10048100                   17154800
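For reference, speedup can be computed as S(p) = T(1)/T(p), the ratio of the single-processor execution time to the time on p processors. A worked example of the arithmetic, using the values in Table 2:

    S(2) = T(1) / T(2) = 71 / 54 = 1.31 (approximately)
    S(5) = T(1) / T(5) = 71 / 96 = 0.74 (approximately)

so the 600-equation system gains a modest speedup on two processors and then slows down again as more processors, and hence more communication, are added.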
9. Discussion of the Results

The results showed that for small systems of equations it is hard to get any speedup when running these programs. This is because the communication cost for small systems is very high; that is why there is no speedup in most of the cases tested. From Table 3 it can be seen that the sparse system solver is more efficient than the dense solver for sparse systems of equations, which is an expected result. From the results shown in Table 1 and Table 2, it is clear that the performance of the program improves for larger sparse systems. It can be inferred that the programs would show apparent speedup for larger systems, such as systems of thousands of equations. Due to the disk quota limits on the machines used for testing, larger systems of equations could not be tested, because the input files would have been too large to store on disk.

10. Conclusion

Direct solvers for sparse systems of equations are very useful in scientific research and engineering. The results of this research showed that the algorithms for solving sparse systems of equations on distributed-memory machines are sometimes inefficient because of the high communication cost involved. To improve the performance of the algorithms and the programs, more efficient ordering and elimination algorithms suitable for distributed-memory machines need to be developed. Other kinds of machines, such as shared-memory multiprocessors, can also be considered for the implementation; of course, the characteristics of such machines should be examined when designing new algorithms.

References

[1] Kumar, Vipin, Ananth Grama, Anshul Gupta, and George Karypis, Introduction to Parallel Computing: Design and Analysis of Algorithms, Benjamin/Cummings, Redwood City, CA, 1994.
[2] Geist, Al, Adam Beguelin, Jack Dongarra, Weicheng Jiang, Robert Manchek, and Vaidy Sunderam, PVM 3 User's Guide and Reference Manual, Oak Ridge National Laboratory, Oak Ridge, Tennessee, 1993.
[3] nCUBE Corp., nCUBE 2 Programmer's Guide, 1994.
[4] Texas A&M University Supercomputer Center, Parallel Computing Orientation Guide, College Station, Texas, 1993.
[5] Xiong, Z. X. and X. Y. Li, "The Application of Bivariate Cubic Spline," in Approximation, Optimization and Computing: Theory and Applications, Elsevier Science Publishers B.V., Amsterdam, The Netherlands, 1990.
[6] Matarese, Joe, nCUBE Manual Page Form, Earth Resources Lab, MIT, Cambridge, MA, 1994.

Proceedings of the 1996 Arkansas Computer Conference, Searcy, AR, pp. 52-63, March 1996. This version has been reformatted.

Currently at UNC Charlotte, 9201 University City Blvd, Charlotte, NC 28223, jmconrad@uncc.edu