Descriptions for Implementation of OpenMP Programs

________________________________________________________

2.2. Descriptions for Implementation of OpenMP Programs

Example 1: Description for Implementation of an OpenMP program to print a unique thread identifier using OpenMP PARALLEL directive.

  • Objective
  • Write a simple OpenMP program to print a unique thread identifier for each thread using the OpenMP PARALLEL directive.

  • Description
  • Each thread prints its thread identifier. In this program the OpenMP PARALLEL directive and the OMP_GET_THREAD_NUM run-time library routine are used. The PARALLEL directive pair specifies that the enclosed block of the program, referred to as the PARALLEL region, is to be executed in parallel by multiple threads. The PRIVATE clause is typically used to identify variables that are used as scratch storage within the parallel region. It provides a list of variables and specifies that each thread have a private copy of those variables for the duration of the parallel region. A minimal program sketch is given after the Output item below.

  • Input
  • The user has to set the OMP_NUM_THREADS environment variable to the required number of threads.
    e.g., if the C shell is used, the following line can be appended to .cshrc:

    setenv OMP_NUM_THREADS 4 

  • Output
  • Each thread prints its thread identifier.
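
    A minimal sketch of such a program, written in free-form Fortran to match the loop fragments used later in this section, is given below. The program and variable names are illustrative, and an OpenMP-aware compiler (e.g. gfortran -fopenmp) is assumed.

    program print_thread_id
      use omp_lib
      implicit none
      integer :: tid

      ! Each thread in the PARALLEL region gets a PRIVATE copy of tid,
      ! obtains its own identifier and prints it.
    !$omp parallel private(tid)
      tid = omp_get_thread_num()
      print *, 'My thread identifier is ', tid
    !$omp end parallel
    end program print_thread_id

    With OMP_NUM_THREADS set to 4, four lines are printed, one per thread, in nondeterministic order.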

Example 2: Description for Implementation of an OpenMP program to print Hello World using OpenMP PARALLEL directive.

  • Objective
  • Write a simple OpenMP program to print "Hello World" using OpenMP PARALLEL directives.

  • Description
  • Each thread prints a "Hello World" message. In this program the OpenMP PARALLEL directive, the PRIVATE clause and the OMP_GET_THREAD_NUM run-time library routine are used. The PARALLEL directive pair specifies that the enclosed block of the program, referred to as the PARALLEL region, is to be executed in parallel by multiple threads. The PRIVATE clause is typically used to identify variables that are used as scratch storage within the parallel region. It provides a list of variables and specifies that each thread have a private copy of those variables for the duration of the parallel region.

  • Input
  • The user has to set the OMP_NUM_THREADS environment variable to the required number of threads.
    e.g., if the C shell is used, the following line can be appended to .cshrc:

    setenv OMP_NUM_THREADS 4 

  • Output
  • Each thread prints a "Hello World" message.

Example 3: Description for Implementation of an OpenMP program for Matrix Vector multiplication using OpenMP PARALLEL directive.

  • Objective
  • Write an OpenMP program for computing matrix vector multiplication using OpenMP PARALLEL directive.

  • Description
  • Each row of matrix A is multiplied with the vector B, and the resulting value is stored in C(i). It is assumed that the number of columns of matrix A and the size of the vector are the same. This example demonstrates the use of the OpenMP loop work-sharing construct, i.e. the distribution of the rows of matrix A among the threads. The ORDERED directive is used to impose an ordering across the elements of C(i). Matrix A and vector B are generated automatically. A program sketch is given after the Output item below.

  • Input
  • Size of Matrix

  • Output
  • Each thread computes the multiplication and prints its part of vector C(i)
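
    A sketch of one possible implementation is given below. The matrix size is fixed at n = 8 and the data are generated automatically; both are illustrative choices, since the original reads the size as input.

    program matrix_vector
      use omp_lib
      implicit none
      integer, parameter :: n = 8            ! illustrative size
      real :: A(n,n), B(n), C(n)
      integer :: i, j

      ! Generate matrix A and vector B automatically.
      do j = 1, n
         B(j) = real(j)
         do i = 1, n
            A(i,j) = real(i + j)
         end do
      end do

      ! Each iteration of the i loop computes one element C(i); the
      ! iterations are distributed among the threads by the loop
      ! work-sharing construct.
    !$omp parallel do ordered private(j)
      do i = 1, n
         C(i) = 0.0
         do j = 1, n
            C(i) = C(i) + A(i,j) * B(j)
         end do
         ! The ORDERED region forces the prints to appear in index order.
    !$omp ordered
         print *, 'Thread ', omp_get_thread_num(), ' computed C(', i, ') = ', C(i)
    !$omp end ordered
      end do
    !$omp end parallel do
    end program matrix_vector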

Example 4: Description for Implementation of an OpenMP program to find sum of elements of one-dimensional real array A using OpenMP PARALLEL DO directive.

  • Objective
  • Write an OpenMP program to find sum of elements of one-dimensional real array A using OpenMP PARALLEL DO directive.

  • Description
  • Addition of all the elements of a real array A is performed using the OpenMP PARALLEL DO directive together with the PRIVATE and SHARED clauses. The SHARED and PRIVATE clauses explicitly scope specific variables: a variable is scoped as SHARED or PRIVATE on an OpenMP construct by adding the corresponding clause to the directive that begins the construct. A sketch of one possible implementation is given after the Output item below.

  • Input
  • Size of a real array A. The elements of array A are generated automatically.

  • Output
  • Sum of n elements of an array A.
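
    The description above mentions only the PARALLEL DO style of work sharing with PRIVATE and SHARED clauses; the sketch below therefore uses a PARALLEL region with a DO work-sharing construct and, as one possible way to keep the update of the shared sum correct without the REDUCTION clause of Example 6, combines per-thread partial sums in a CRITICAL section. The array size and data are illustrative.

    program array_sum
      implicit none
      integer, parameter :: n = 1000        ! illustrative size; the original reads it as input
      real :: A(n), total, partial
      integer :: i

      ! Elements of A are generated automatically.
      do i = 1, n
         A(i) = 1.0
      end do

      total = 0.0
      ! Each thread accumulates a PRIVATE partial sum over its share of
      ! the iterations and then adds it to the SHARED total.
    !$omp parallel private(partial) shared(A, total)
      partial = 0.0
    !$omp do
      do i = 1, n
         partial = partial + A(i)
      end do
    !$omp end do
    !$omp critical
      total = total + partial
    !$omp end critical
    !$omp end parallel

      print *, 'Sum of ', n, ' elements = ', total
    end program array_sum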

Example 5: Description for Implementation of an OpenMP program to compute the value of PI by Numerical Integration using OpenMP PARALLEL section.   

  • Objective
  • Write an OpenMP program to compute the value of PI by numerical integration of a function f(x) = 4/(1+x*x ) between the limits 0 and 1 using OpenMP PARALLEL section.

  • Description 
  • There are several approaches to parallelizing a serial program. One approach is to partition the data among the threads: we partition the interval of integration [0,1] among the threads, and each thread estimates the local integral over its own subinterval. The local calculations produced by the individual threads are combined to produce the final result. To perform this integration numerically, divide the interval from 0 to 1 into n subintervals and add up the areas of the resulting rectangles, as shown in Figure 1 (with n = 5). Large values of n give more accurate approximations of the value of PI.

    Fig. 1 : Numerical Integration of PI function

    In this program the OpenMP PARALLEL directive and a CRITICAL section are used. The CRITICAL directive specifies a region of the program that must be executed by only one thread at a time. If a thread is currently executing inside a CRITICAL region and another thread reaches that CRITICAL region and attempts to execute it, it will block until the first thread exits that CRITICAL region. A program sketch is given after the Output item below.

  • Input
  • Number of intervals on command line.

  • Output
  • Computed value of PI
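
    A sketch of such a program is given below. The number of intervals is fixed in the sketch for brevity (the original reads it from the command line); each thread accumulates a private partial sum over its subintervals and adds it to the shared result inside the CRITICAL region.

    program pi_critical
      implicit none
      integer, parameter :: n = 100000       ! number of subintervals (illustrative)
      real(kind=8) :: h, x, local_sum, pi
      integer :: i

      h = 1.0d0 / real(n, kind=8)
      pi = 0.0d0

    !$omp parallel private(i, x, local_sum) shared(pi)
      local_sum = 0.0d0
    !$omp do
      do i = 1, n
         x = h * (real(i, kind=8) - 0.5d0)   ! midpoint of the i-th subinterval
         local_sum = local_sum + 4.0d0 / (1.0d0 + x*x)
      end do
    !$omp end do
      ! Only one thread at a time may add its partial result to pi.
    !$omp critical
      pi = pi + h * local_sum
    !$omp end critical
    !$omp end parallel

      print *, 'Computed value of PI = ', pi
    end program pi_critical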

Example 6: Description for Implementation of an OpenMP program to compute sum of elements of one-dimensional real array A using OpenMP REDUCTION clause.

  • Objective
  • Write an OpenMP program to find sum of elements of one-dimensional real array A using OpenMP REDUCTION clause.

  • Description 
  • Addition of all the elements of a real array A is performed using the OpenMP PARALLEL DO directive and the REDUCTION clause. Reductions are a sufficiently common type of operation that OpenMP includes a REDUCTION data scope clause just to handle them. In a reduction, we repeatedly apply a binary operator to a variable and some other value, and store the result back in the variable. With REDUCTION, a private copy of each list variable is created for each thread. At the end of the reduction, the reduction operator is applied to all private copies of the shared variable, and the final result is written to the global shared variable. In this example we have added the clause REDUCTION (+ : Sum), which tells the compiler that Sum is the target of a sum reduction operation. A program sketch is given after the Output item below.

  • Input
  • Size of a real array A.

  • Output
  • Sum of n elements of an array A.
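
    A sketch of such a program follows; the array size and its contents are illustrative and generated in the program.

    program array_sum_reduction
      implicit none
      integer, parameter :: n = 1000        ! illustrative size; the original reads it as input
      real :: A(n), total
      integer :: i

      do i = 1, n
         A(i) = 1.0                          ! elements generated automatically
      end do

      total = 0.0
      ! Each thread gets a private copy of total initialised for "+";
      ! at the end of the loop the private copies are combined and the
      ! result is written back to the shared variable.
    !$omp parallel do reduction(+:total)
      do i = 1, n
         total = total + A(i)
      end do
    !$omp end parallel do

      print *, 'Sum of ', n, ' elements = ', total
    end program array_sum_reduction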

Example 7 : Description for Implementation of an OpenMP program for parallelization of recursive computation using OpenMP CRITICAL section.

  • Objective
  • Write an OpenMP program for the recursive computation

     do i = 2, N 
         a(i) = (a(i-1) + a(i))/2 
     end do

  • Description
  • The array elements are updated recursively. The i loop runs from 2 to N. Each iteration depends on the value produced by the previous iteration, i.e. a(i) depends on a(i-1), so the loop cannot be parallelized directly with OpenMP pragmas because it contains a loop-carried dependence. It can be parallelized only with explicit synchronization, i.e. through a CRITICAL section.

  • Input
  • The Size of the Array

  • Output
  • The input array and the resulting output array.

Example 8: Description for Implementation of an OpenMP program to compute the value of PI function by Numerical Integration using OpenMP REDUCTION clause.   

  • Objective
  • Write an OpenMP program to compute the value of PI by numerical integration of a function f(x) = 4/(1+x*x ) between the limits 0 and 1 using OpenMP REDUCTION Operation.

  • Description 
  • The PI value is computed using the OpenMP PARALLEL DO directive and the REDUCTION clause. Reductions are a common type of operation, and OpenMP includes a REDUCTION data scope clause just to handle them. In a reduction, we repeatedly apply a binary operator to a variable and some other value, and store the result back in the variable. When a program performs a reduction using a commutative-associative operator, it can easily be parallelised by adding a REDUCTION clause to the PARALLEL DO directive. With REDUCTION, a private copy of each list variable is created for each thread. At the end of the reduction, the reduction operator is applied to all private copies of the shared variable, and the final result is written to the global shared variable. In this example we have added the clause REDUCTION (+ : LocalSum), which tells the compiler that LocalSum is the target of a sum reduction operation. For more details refer to Example 5. A program sketch is given after the Output item below.

  • Input
  • Number of intervals.

  • Output
  • Computed value of PI
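
    A sketch of such a program is given below; it differs from Example 5 only in that the partial sums are combined by the REDUCTION clause instead of a CRITICAL section. The number of intervals is fixed for brevity.

    program pi_reduction
      implicit none
      integer, parameter :: n = 100000       ! number of subintervals (illustrative)
      real(kind=8) :: h, x, local_sum
      integer :: i

      h = 1.0d0 / real(n, kind=8)
      local_sum = 0.0d0

      ! REDUCTION(+:local_sum) gives each thread a private copy of the
      ! sum and combines the private copies with "+" at the end of the loop.
    !$omp parallel do private(x) reduction(+:local_sum)
      do i = 1, n
         x = h * (real(i, kind=8) - 0.5d0)
         local_sum = local_sum + 4.0d0 / (1.0d0 + x*x)
      end do
    !$omp end parallel do

      print *, 'Computed value of PI = ', h * local_sum
    end program pi_reduction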

Example 9 : Description for implementation of an OpenMP program for parallelization of a loop nest containing a recurrence statement using OpenMP PARALLEL DO directive.

  • Objective
  • Write a simple OpenMP program for parallelization of a loop nest containing a recurrence using OpenMP PARALLEL DO directive.

    do j = 1, N
        do i = 1, N
            a(i,j) = a(i,j) + a(i,j-1)
        end do
    end do

  • Description
  • In this program the j loop contains a recurrence that is difficult to remove, so the i loop is parallelised instead, using the OpenMP PARALLEL DO directive. The choice of which loop in the nest executes in parallel can have a profound impact on performance, so the parallel loop must be chosen carefully. A program sketch is given after the Output item below.

  • Input
  • Size of Matrix

  • Output
  • The original values of the matrix and the status of execution, i.e. whether the parallel and serial results of the loop nest containing the recurrence match.
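
    A sketch of such a program is given below. The array is declared with a 0th column so that the reference a(i, j-1) is legal for j = 1; the matrix size and initial values are illustrative, and the serial-versus-parallel comparison of the original program is omitted.

    program recurrence_nest
      implicit none
      integer, parameter :: n = 100          ! illustrative size; the original reads it as input
      real :: a(n, 0:n)
      integer :: i, j

      a = 1.0                                 ! values generated automatically

      ! The recurrence is carried by the j loop, so the j loop stays
      ! serial and the independent i loop is executed in parallel.
      do j = 1, n
    !$omp parallel do
         do i = 1, n
            a(i,j) = a(i,j) + a(i,j-1)
         end do
    !$omp end parallel do
      end do

      print *, 'a(1,n) = ', a(1,n)
    end program recurrence_nest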

Example 10 : Description for implementation of an OpenMP program to find the largest element in a list of numbers using OpenMP CRITICAL section.

  • Objective
  • Write an OpenMP program to find the largest element in a list of numbers using OpenMP CRITICAL section.

  • Description
  • The largest element in a list of numbers is found using an OpenMP CRITICAL section and a PARALLEL DO loop. Before the CRITICAL section starts, LargeNumber is examined without any synchronization. If the present value of LargeNumber is already greater than Array(i), then we need proceed no further. Otherwise Array(i) may be the largest element seen so far, and we enter the CRITICAL section. Inside the CRITICAL section LargeNumber is examined again. This is necessary because LargeNumber was previously examined outside the CRITICAL section, so it could have changed in the meantime. Examining it again within the CRITICAL section guarantees the thread exclusive access to LargeNumber, and an update based on the fresh comparison is assured of being correct. Furthermore, the program enters the CRITICAL section only infrequently and therefore benefits from increased parallelism. A program sketch is given after the Output item below.

  • Input
  • Number of elements of an array

  • Output
  • Largest element of an array.
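
    A sketch of such a program follows; the array size and its random contents are illustrative.

    program largest_critical
      implicit none
      integer, parameter :: n = 1000         ! illustrative size; the original reads it as input
      real :: Array(n), LargeNumber
      integer :: i

      call random_number(Array)              ! illustrative input data
      LargeNumber = Array(1)

      ! The unsynchronised test filters out most candidates; the test is
      ! repeated inside the CRITICAL section because LargeNumber may have
      ! been changed by another thread in the meantime.
    !$omp parallel do shared(Array, LargeNumber)
      do i = 1, n
         if (Array(i) > LargeNumber) then
    !$omp critical
            if (Array(i) > LargeNumber) LargeNumber = Array(i)
    !$omp end critical
         end if
      end do
    !$omp end parallel do

      print *, 'Largest element = ', LargeNumber
    end program largest_critical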

Example 11 : Description of an OpenMP program to find the largest element in a list of numbers using OpenMP LOCK routines.

  • Objective
  • Write an OpenMP program to find the largest element in a list of numbers using OpenMP Lock routines.

  • Description
  • Lock routines are another mechanism for mutual exclusion, but they provide greater flexibility than the CRITICAL directive. Firstly, the set/unset subroutine calls do not have to be block structured: the user can place these calls at arbitrary points in the program and is responsible for matching the invocations of the set and unset routines. Secondly, each lock subroutine takes a lock variable as an argument. In this example OpenMP lock routines and a PARALLEL DO loop are used. A program sketch is given after the Output item below.

  • Input
  • Number of elements of an array

  • Output
  • Largest element of an array.
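
    A sketch of such a program follows; it mirrors Example 10, with a lock variable taking the place of the CRITICAL section. Names and sizes are illustrative.

    program largest_lock
      use omp_lib
      implicit none
      integer, parameter :: n = 1000         ! illustrative size; the original reads it as input
      real :: Array(n), LargeNumber
      integer(kind=omp_lock_kind) :: maxlock
      integer :: i

      call random_number(Array)              ! illustrative input data
      LargeNumber = Array(1)

      call omp_init_lock(maxlock)
      ! Only the thread that currently holds the lock may update LargeNumber.
    !$omp parallel do shared(Array, LargeNumber, maxlock)
      do i = 1, n
         if (Array(i) > LargeNumber) then
            call omp_set_lock(maxlock)
            if (Array(i) > LargeNumber) LargeNumber = Array(i)
            call omp_unset_lock(maxlock)
         end if
      end do
    !$omp end parallel do
      call omp_destroy_lock(maxlock)

      print *, 'Largest element = ', LargeNumber
    end program largest_lock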

Example 12 : Description for implementation of an OpenMP program to find the largest element in a list of numbers using OpenMP REDUCTION clause.

  • Objective
  • Write an OpenMP program to find the largest element in a list of numbers using OpenMP REDUCTION clause

  • Description
  • The largest element in a list of numbers is found using the OpenMP PARALLEL DO directive and the REDUCTION clause. Reductions are a sufficiently common type of operation that OpenMP includes a REDUCTION data scope clause just to handle them. In a reduction, we repeatedly apply a binary operator to a variable and some other value, and store the result back in the variable. In this example we have added the clause REDUCTION (MAX : LargeNumber), which tells the compiler that LargeNumber is the target of a max reduction operation. A program sketch is given after the Output item below.

  • Input
  • Number of elements of an array

  • Output
  • Largest element of an array.
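
    A sketch of such a program follows; the array size and its random contents are illustrative.

    program largest_reduction
      implicit none
      integer, parameter :: n = 1000         ! illustrative size; the original reads it as input
      real :: Array(n), LargeNumber
      integer :: i

      call random_number(Array)              ! illustrative input data
      LargeNumber = Array(1)

      ! Each thread keeps a private maximum; the private copies are
      ! combined with the MAX operator at the end of the loop and the
      ! result is written back to the shared LargeNumber.
    !$omp parallel do reduction(max:LargeNumber)
      do i = 1, n
         if (Array(i) > LargeNumber) LargeNumber = Array(i)
      end do
    !$omp end parallel do

      print *, 'Largest element = ', LargeNumber
    end program largest_reduction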

Example 13 : Description for implementation of an OpenMP program to find Matrix-Matrix multiplication using OpenMP PARALLEL DO directive

  • Objective
  • Write an OpenMP program for matrix-matrix multiplication using OpenMP PARALLEL DO directive and measure the performance.

  • Description
  • This example shows how to parallelise a nested loop. A loop nest can contain more than one loop, and arrays can have more than one dimension. The three-deep loop nest in matrix-matrix multiplication computes the product of two matrices, C = A * B. Usually we want to parallelize the outermost loop in such a nest. For correctness, there must not be a dependence between any two statements executed in different iterations of the parallelized loop. However, there may be dependences between statements executed within a single iteration of the parallel loop, including dependences between different iterations of an inner, serial loop. In this example, we can safely parallelize the j loop because each iteration of the loop computes one column FinalMatrix(1:MatrixSize, j) of the product and does not access elements of FinalMatrix outside that column. The dependence on FinalMatrix(i,j) in the serial k loop does not inhibit parallelization. In this example the PARALLEL section, the SHARED and PRIVATE clauses, and the DO directive are used. A program sketch is given after the Output item below.

  • Input
  • Size of matrix

  • Output
  • Result of the matrix-matrix computation, the time taken for the computation, and Mflop/s.
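
    A sketch of such a program is given below. It uses the combined PARALLEL DO form on the outermost j loop; the matrix size and data are illustrative, and the timing and Mflop/s measurement of the original program is omitted.

    program matmat
      implicit none
      integer, parameter :: MatrixSize = 200  ! illustrative size; the original reads it as input
      real :: A(MatrixSize,MatrixSize), B(MatrixSize,MatrixSize)
      real :: FinalMatrix(MatrixSize,MatrixSize)
      integer :: i, j, k

      call random_number(A)
      call random_number(B)

      ! Each iteration of the parallel j loop computes one column
      ! FinalMatrix(:,j) of the product, so the iterations are independent.
      ! The dependence on FinalMatrix(i,j) lies entirely inside the serial k loop.
    !$omp parallel do shared(A, B, FinalMatrix) private(i, k)
      do j = 1, MatrixSize
         do i = 1, MatrixSize
            FinalMatrix(i,j) = 0.0
            do k = 1, MatrixSize
               FinalMatrix(i,j) = FinalMatrix(i,j) + A(i,k) * B(k,j)
            end do
         end do
      end do
    !$omp end parallel do

      print *, 'FinalMatrix(1,1) = ', FinalMatrix(1,1)
    end program matmat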

Example 14 : Description for implementation of an OpenMP program to find the Transpose of a Matrix using OpenMP PARALLEL DO directive

  • Objective
  • Write an OpenMP program for the transpose of a matrix using OpenMP PARALLEL DO directive and measure the performance.

  • Description
  • This example shows how to parallelise a nested loop. A loop nest can contain more than one loop, and arrays can have more than one dimension. The two-deep loop nest in the transpose of a matrix changes the rows of the input matrix into the corresponding columns of the output matrix, i.e. Trans[j][i] = Mat[i][j]. Usually we want to parallelize the outermost loop in such a nest. For correctness, there must not be a dependence between any two statements executed in different iterations of the parallelized loop. In this example, we can safely parallelize the i loop because each iteration of the loop changes one row of the input matrix into the corresponding column of the output matrix. In this example the PARALLEL section, the SHARED and PRIVATE clauses, and the DO directive are used. A program sketch is given after the Output item below.

  • Input
  • Row Size and Column Size of matrix

  • Output

  • Result of the transpose of a matrix, the time taken for the computation, and Mflop/s.
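
    A sketch of such a program is given below; the row and column sizes are illustrative and the timing and Mflop/s measurement is omitted.

    program transpose_omp
      implicit none
      integer, parameter :: rows = 100, cols = 200   ! illustrative sizes; the original reads them as input
      real :: Mat(rows, cols), Trans(cols, rows)
      integer :: i, j

      call random_number(Mat)                        ! illustrative input data

      ! Each iteration of the parallel i loop copies one row of Mat into
      ! the corresponding column of Trans, so the iterations are independent.
    !$omp parallel do shared(Mat, Trans) private(j)
      do i = 1, rows
         do j = 1, cols
            Trans(j,i) = Mat(i,j)
         end do
      end do
    !$omp end parallel do

      print *, 'Mat(1,2) = ', Mat(1,2), '  Trans(2,1) = ', Trans(2,1)
    end program transpose_omp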

Example 15 : Description for implementation of an OpenMP program to find solution of Matrix system of linear equations by Jacobi method.

  • BACKGROUND
  • The system of linear equations [A]{x} = {b} is given by

    a0,0 x0 + a0,1 x1 + ... + a0,n-1 xn-1 = b0 ,
    a1,0 x0 + a1,1 x1 + ... + a1,n-1 xn-1 = b1 ,
    an-1,0 x0 + an-1,1 x1 + ... + an-1,n-1 xn-1 = bn-1 .

    In matrix notation, this system of linear equations is written as [A]{x} = {b}, where A[i, j] = ai,j, b is an n x 1 vector [b0, b1, ..., bn-1]T, and x is the desired solution vector [x0, x1, ..., xn-1]T. We will make all subsequent references to ai,j by A[i, j] and to xi by x[i].

    Iterative methods are techniques to solve systems of equations of the form [A]{x} = {b} that generate a sequence of approximations to the solution vector x. In each iteration, the coefficient matrix A is used to perform a matrix-vector multiplication. The number of iterations required to solve a system of equations with a desired precision is usually data dependent; hence, the number of iterations is not known prior to executing the algorithm. Therefore, we analyze the performance and scalability of a single iteration of an iterative method. Iterative methods do not guarantee a solution for all systems of equations. However, when they do yield a solution, they are usually less expensive than direct methods based on matrix factorization for large matrices.

  • Objective
  • Write an OpenMP program for solving the system of linear equations [A]{x} = {b} using the Jacobi method.

  • Description
  • The Jacobi iterative method is one of the simplest iterative techniques. The ith equation of the system of linear equations [A]{x} = {b} is

    A[i,0] x[0] + A[i,1] x[1] + ... + A[i,n-1] x[n-1] = b[i] .        (1)

    If all the diagonal elements of A are nonzero (or are made nonzero by permuting the rows and columns of A), we can rewrite equation (1) as

    x[i] = ( b[i] - sum over j /= i of A[i,j] x[j] ) / A[i,i] .        (2)

    The Jacobi method starts with an initial guess x(0) for the solution vector x. This initial vector x(0) is used in the right-hand side of equation (2) to arrive at the next approximation x(1) to the solution vector. The vector x(1) is then used in the right-hand side of equation (2), and the process continues until a close enough approximation to the actual solution is found. A typical iteration step in the Jacobi method is

    x(k+1)[i] = ( b[i] - sum over j /= i of A[i,j] x(k)[j] ) / A[i,i] ,        (3)

    where x(k) denotes the kth approximation to the solution vector. We now express the iteration step of equation (3) in terms of the residual r(k) = b - A x(k). Equation (3) can be rewritten as

    x(k+1)[i] = x(k)[i] + r(k)[i] / A[i,i] .        (4)

    If the values have been computed up to a certain accuracy the iterations are stopped; otherwise these values are used in the next iteration to compute a new set of values. A program sketch is given after the sample output at the end of this example.

  • Input
  • The input should be in the following format. Assume that the real matrix is of size m x n and the real vector is of size n. The formats for the input files are given below.

    Input file 1
    The input file for the matrix should strictly adhere to the following format.
    #Line 1 : Number of Rows (m), Number of columns (n).
    #Line 2 : (data) (in row-major order, i.e. the data of the second row follows that of the first, and so on.)
    A sample input file for the matrix (8 x 8) is given below
    8 8
    61.0 2.0 3.0 4.0 6.0 8.0 9.0 2.0
    2.0 84.0 6.0 4.0 3.0 2.0 8.0 7.0
    3.0 6.0 68.0 2.0 4.0 3.0 9.0 1.0
    4.0 4.0 2.0 59.0 6.0 4.0 3.0 8.0
    6.0 3.0 4.0 6.0 93.0 8.0 3.0 1.0
    8.0 2.0 3.0 4.0 8.0 98.0 3.0 1.0
    9.0 8.0 9.0 3.0 3.0 3.0 85.0 7.0
    2.0 7.0 1.0 8.0 1.0 1.0 7.0 98.0


    Input file 2
    The input file for the vector should strictly adhere to the following format.
    #Line 1 : Size of the vector (n)
    #Line 2 : (data)
    A sample input file for the vector (8 x 1) is given below
    8
    95.0 116.0 96.0 90.0 124.0 127.0 127.0 125.0

  • Output
  • The solution vector x of the matrix system [A]{x} = {b}. For the sample input above, the solution is
    1.0 1.0 1.0 1.0 1.0 1.0 1.0 1.0
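
    A sketch of one possible OpenMP implementation of the Jacobi iteration is given below. For brevity it generates a small diagonally dominant system whose exact solution is all ones instead of reading the input files, and it uses a simple maximum-difference stopping test; the names, sizes and tolerance are illustrative.

    program jacobi_omp
      implicit none
      integer, parameter :: n = 8, max_iter = 1000
      real(kind=8), parameter :: tol = 1.0d-10
      real(kind=8) :: A(n,n), b(n), x(n), x_new(n), sigma, diff
      integer :: i, j, iter

      ! Generate a diagonally dominant matrix A and choose b = A * (1,...,1)
      ! so that the exact solution is a vector of ones.
      call random_number(A)
      do i = 1, n
         A(i,i) = A(i,i) + real(n, kind=8)
      end do
      x = 1.0d0
      b = matmul(A, x)
      x = 0.0d0                               ! initial guess x(0)

      do iter = 1, max_iter
         ! One Jacobi sweep, equation (3): the iterations of the i loop are
         ! independent, so the sweep is parallelised with PARALLEL DO.
    !$omp parallel do private(j, sigma) shared(A, b, x, x_new)
         do i = 1, n
            sigma = 0.0d0
            do j = 1, n
               if (j /= i) sigma = sigma + A(i,j) * x(j)
            end do
            x_new(i) = (b(i) - sigma) / A(i,i)
         end do
    !$omp end parallel do

         diff = maxval(abs(x_new - x))
         x = x_new
         if (diff < tol) exit                 ! stop once the desired accuracy is reached
      end do

      print *, 'Solution x = ', x
    end program jacobi_omp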
