________________________________________________________
 

Descriptions for Implementation of MPI Programs
________________________________________________________

1.2 Descriptions for Implementation of MPI Programs

Example 1: Description for implementation of MPI program to print Hello World

  • Objective
  • Write an MPI program to print "Hello World". 

  • Description
  • Each process with rank k (k = 1, 2, ..., p-1) sends a "Hello World" message to the process with rank 0. Process with rank 0 receives each message and prints it. The MPI point-to-point communication library calls MPI_Send and MPI_Recv are used in this program; see the sketch after this example.

  • Input

None

  • Output

Process with rank 0 prints the following message once for each of the other ranks (i.e. rank = 1, 2, ..., p-1):
Hello World From Process rank, where rank = 1, 2, ..., p-1
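
A minimal C sketch of this scheme is given below. It assumes the greeting is sent as a fixed-length character buffer; the buffer length (64) and the message tag (0) are illustrative choices, not part of the original specification.

#include <stdio.h>
#include <mpi.h>

#define MSG_LEN 64   /* assumed buffer length */
#define TAG      0   /* assumed message tag   */

int main(int argc, char *argv[])
{
    int rank, size;
    char msg[MSG_LEN];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank != 0) {
        /* every non-zero rank sends its greeting to rank 0 */
        sprintf(msg, "Hello World From Process %d", rank);
        MPI_Send(msg, MSG_LEN, MPI_CHAR, 0, TAG, MPI_COMM_WORLD);
    } else {
        /* rank 0 receives one message from each other rank and prints it */
        for (int src = 1; src < size; src++) {
            MPI_Recv(msg, MSG_LEN, MPI_CHAR, src, TAG,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("%s\n", msg);
        }
    }

    MPI_Finalize();
    return 0;
}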

     

Example 2 : Description for implementation of MPI program to find sum of n integers
                             using MPI point-to-point blocking communication library calls
 
  • Objective
  • Write an MPI program to find the sum of n integers on p processors of PARAM 10000. 

  • Description
  • Each process with rank greater than 0 sends the value of its rank to the process with rank 0. Process with rank 0 receives these values from each process and calculates the sum. Finally, process with rank 0 prints the sum; a minimal sketch is given after this example. 

  • Input
  • For input data, let each process use its identifying number, i.e. the value of its rank. For example, process with rank 0 uses the number 0, process with rank 1 uses the number 1, etc. 

  • Output
  • Process with rank 0 prints the final sum.
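
A minimal C sketch of this scheme follows. As described in the Input section, each process contributes the value of its own rank; the message tag (0) is an illustrative choice.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, value, sum = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank != 0) {
        /* each non-zero rank sends its own rank value to rank 0 */
        MPI_Send(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else {
        /* rank 0 adds its own value (0) and one value from every other rank */
        sum = rank;
        for (int src = 1; src < size; src++) {
            MPI_Recv(&value, 1, MPI_INT, src, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            sum += value;
        }
        printf("Sum = %d\n", sum);
    }

    MPI_Finalize();
    return 0;
}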

Example 3 : Description for implementation of MPI program to find sum of n integers on
                              parallel computer in which processors are arranged in linear array topology
                              using MPI point-to-point blocking communication library calls
  • Objective
  • Write an MPI program to find the sum of n values using p processors of PARAM 10000. Assume that the p processors are arranged in a linear array topology.

  • Description
  • In a linear array interconnection network, each processor (except the processors at the two ends) has a direct communication link to its immediate neighbours. A simple way of communicating a message between processors is to pass it repeatedly to the processor immediately to the right or to the left until it reaches its destination, i.e. the last processor in the linear array. This arrangement of processors is illustrated in Figure 1.

Figure 1 Processors are connected by Linear Array topology 
    All the processes are involved in the communication. Each process with rank k (k greater than 0) receives the accumulated partial sum from the previous process with rank k-1, adds its own value, and passes the new partial sum to the process with rank k+1. Finally, process with rank p-1 prints the final sum; a minimal sketch is given after this example. 
  • Input
  • For input data, let each process use its identifying number, i.e. the value of its rank. For example, process with rank 0 uses the number 0, process with rank 1 uses the number 1, etc. 

  • Output
  • Process with rank p-1 prints the final sum. 
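
A minimal C sketch of the linear-array accumulation follows. The MPI_COMM_WORLD ranks are used directly as positions in the linear array, so process k communicates only with processes k-1 and k+1; the tag value is an illustrative choice.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, partial = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank > 0) {
        /* receive the partial sum accumulated by the previous process */
        MPI_Recv(&partial, 1, MPI_INT, rank - 1, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    partial += rank;                       /* add this process's own value */

    if (rank < size - 1) {
        /* forward the new partial sum to the next process in the array */
        MPI_Send(&partial, 1, MPI_INT, rank + 1, 0, MPI_COMM_WORLD);
    } else {
        /* the last process holds the final sum */
        printf("Final sum = %d\n", partial);
    }

    MPI_Finalize();
    return 0;
}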

Example 4 : Description for implementation of MPI program to find sum of n integers on
                            parallel computer in which processors are arranged in ring topology using MPI
                            point-to-point blocking communication library calls
  • Objective
  • Write an MPI program to find the sum of n values using p processors of PARAM 10000. Assume that the p processors are arranged in a ring topology.

  • Description
  • A linear array interconnection network (refer to Example 3 for more details on the linear array topology) with a wraparound connection is called a ring. The wraparound connection joins the processors at the two ends. A simple way of communicating a message between processors is to pass it repeatedly to the processor immediately to the right or to the left, depending on which direction yields the shorter path, until it reaches its destination, i.e. the first processor in the ring.

    All the processes are involved in the communication. Each process with rank k (k greater than 0) receives the accumulated partial sum from the previous process with rank k-1, adds its own value, and passes the result on. Process with rank p-1 sends the final sum to process with rank 0 over the wraparound link. Finally, process with rank 0 prints the final sum; a minimal sketch is given after this example. 
  • Input
  • For input data, let each process use its identifying number, i.e. the value of its rank. For example, process with rank 0 uses the number 0, process with rank 1 uses the number 1, etc. 

  • Output
  • Process with rank 0 prints the final sum. 
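
A minimal C sketch of the ring accumulation follows. It assumes p > 1 and models the wraparound link as (rank + 1) mod p; the tag value is an illustrative choice.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, partial = 0, final_sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* assumed p > 1 */

    if (rank > 0) {
        /* receive the partial sum from the previous process in the ring */
        MPI_Recv(&partial, 1, MPI_INT, rank - 1, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    partial += rank;                         /* add this process's own value */

    /* forward to the next process; rank p-1 wraps around to rank 0 */
    MPI_Send(&partial, 1, MPI_INT, (rank + 1) % size, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        /* receive the final sum coming back over the wraparound link */
        MPI_Recv(&final_sum, 1, MPI_INT, size - 1, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Final sum = %d\n", final_sum);
    }

    MPI_Finalize();
    return 0;
}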

Example 5 : Description for implementation of MPI program to find sum of n integers on
                            parallel computer in which processors are arranged in binary tree topology
                            (associative fan-in rule) using MPI point-to-point blocking communication
                            library calls
  • Objective
  • Write an MPI program to find the sum of n (n = 2^i; i = 3) integers on p (p = n) processors of PARAM 10000 using the associative fan-in rule. 

  • Description 
  • The first step is to group the processors into ordered pairs (P0, P1), (P2, P3), (P4, P5), (P6, P7), ..., (Pp-2, Pp-1). Then compute the partial sums in all pairs of processors using MPI point-to-point blocking communication and accumulate the partial sums on the processors P0, P2, ..., Pp-2. For example, the processor Pi (i even) computes the partial sum for the pair (Pi, Pi+1) using MPI point-to-point blocking communication library calls. 

    Figure 2  Tree Structure for sum of n integers by Associative fan-in rule

     

    In the second step, consider the pairs of processors (P0, P2), (P4, P6), ..., (Pp-4, Pp-2) and obtain new partial sums from the partial sums already accumulated on the processors P0, P2, P4, ..., Pp-2, as explained in the previous step. This procedure is repeated until only two processors are left, and the global sum is finally accumulated on processor P0. In this example, the MPI point-to-point blocking communication library calls MPI_Send and MPI_Recv are used. The associative fan-in rule is illustrated as a tree structure in Figure 2 for n = 16; a minimal sketch of the fan-in loop is given after this example. 

  • Input
  • For input data, let each process use its identifying number, i.e. the value of its rank. For example, process with rank 0 uses the number 0, process with rank 1 uses the number 1, etc. 

  • Output
  • Process with rank 0 prints the final sum. 
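
A minimal C sketch of the blocking fan-in reduction follows. It assumes p is a power of two, as stated in the objective; at the step with stride s, every process whose rank is a multiple of 2s receives from its partner at distance s and accumulates.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, value, recv_val;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* assumed to be a power of two */

    value = rank;   /* each process contributes its own rank */

    /* fan-in: at stride s, ranks that are multiples of 2s receive from
       their partner at distance s; the partner sends and then drops out */
    for (int stride = 1; stride < size; stride *= 2) {
        if (rank % (2 * stride) == 0) {
            MPI_Recv(&recv_val, 1, MPI_INT, rank + stride, 0,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            value += recv_val;
        } else if (rank % (2 * stride) == stride) {
            MPI_Send(&value, 1, MPI_INT, rank - stride, 0, MPI_COMM_WORLD);
            break;   /* this process has handed off its partial sum */
        }
    }

    if (rank == 0)
        printf("Final sum = %d\n", value);

    MPI_Finalize();
    return 0;
}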

Example 6 : Description for implementation of MPI program to find sum of n integers on
                             parallel computer in which processors are arranged in binary tree topology
                            (associative fan-in rule) using MPI point-to-point non-blocking
                             communication library calls.
  • Objective
  • Write an MPI program to find the sum of n (n = 2^i; i = 3) integers on p (p = n) processors of PARAM 10000 using the associative fan-in rule. 

  • Description
  • In this example, the MPI point-to-point non-blocking communication library calls MPI_Isend and MPI_Irecv are used (each request is completed with MPI_Wait). For more details refer to Example 5; a minimal sketch of the non-blocking variant is given after this example.

  • Input
  • For input data, let each process use its identifying number, i.e. the value of its rank. For example, process with rank 0 uses the number 0, process with rank 1 uses the number 1, etc. 

  • Output
  • Process with rank 0 prints the final sum. 
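
A minimal C sketch of the non-blocking variant follows. It differs from Example 5 only in using MPI_Isend and MPI_Irecv, each completed with MPI_Wait before the received value is used.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, value, recv_val;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* assumed to be a power of two */

    value = rank;   /* each process contributes its own rank */

    for (int stride = 1; stride < size; stride *= 2) {
        if (rank % (2 * stride) == 0) {
            /* post a non-blocking receive and wait for the partner's value */
            MPI_Irecv(&recv_val, 1, MPI_INT, rank + stride, 0,
                      MPI_COMM_WORLD, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
            value += recv_val;
        } else if (rank % (2 * stride) == stride) {
            /* post a non-blocking send and wait for it to complete */
            MPI_Isend(&value, 1, MPI_INT, rank - stride, 0,
                      MPI_COMM_WORLD, &req);
            MPI_Wait(&req, MPI_STATUS_IGNORE);
            break;
        }
    }

    if (rank == 0)
        printf("Final sum = %d\n", value);

    MPI_Finalize();
    return 0;
}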

Example 7 : Description for implementation of MPI program to compute the value of PI
                             by Numerical Integration using MPI point-to-point communication library calls
  • Objective
  • Write an MPI program to compute the value of pi by numerical integration of the function f(x) = 4/(1+x^2) between the limits 0 and 1. 

  • Description
  • There are several approaches to parallelizing a serial program. One approach is to partition the data among the processes. That is, we partition the interval of integration [0, 1] among the processes, and each process estimates the local integral over its own subinterval. The local results produced by the individual processes are then combined to produce the final result: each process sends its local integral to process 0, which adds them and prints the result.

    To perform this integration numerically, divide the interval from 0 to 1 into n subintervals and add up the areas of the rectangles, as shown in Figure 3 (n = 5). Larger values of n give more accurate approximations of pi. Use MPI point-to-point communication library calls. 

    Figure 3 Numerical integration of pi function

    We assume that n is the total number of subintervals, p is the number of processes, and p < n. One simple way to distribute the subintervals among the processes is to divide n by p. There are two kinds of mappings that balance the load. One is a block mapping, which partitions the subintervals into blocks of consecutive entries and assigns one block to each process. The other is a cyclic mapping, which assigns the first subinterval to the first process, the second to the second, and so on. When n > p, we come back to the first process and repeat the assignment for the remaining subintervals until all of them are assigned. We use a cyclic mapping to partition the interval [0, 1] onto p processes; a minimal sketch is given after this example.

  • Input
  • Process with rank 0 reads the input parameter n, the number of intervals, from the command line. 

  • Output
  • Process with rank 0 prints the computed value of pi. 
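
A minimal C sketch of the point-to-point version follows. The cyclic mapping of subintervals matches the description above; the midpoint-rectangle rule, the distribution of n from rank 0 to the other processes with explicit MPI_Send / MPI_Recv, and the fallback value of n are assumptions made for illustration.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/* integrand: the integral of 4/(1+x^2) over [0,1] equals pi */
static double f(double x) { return 4.0 / (1.0 + x * x); }

int main(int argc, char *argv[])
{
    int rank, size, n = 0;
    double h, local_sum = 0.0, pi = 0.0, other;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* rank 0 reads n from the command line and forwards it to the others */
        n = (argc > 1) ? atoi(argv[1]) : 100000;   /* assumed fallback value */
        for (int dest = 1; dest < size; dest++)
            MPI_Send(&n, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
    } else {
        MPI_Recv(&n, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    h = 1.0 / (double)n;                    /* width of each subinterval */

    /* cyclic mapping: process 'rank' handles subintervals rank, rank+p, rank+2p, ... */
    for (int i = rank; i < n; i += size)
        local_sum += f((i + 0.5) * h) * h;  /* midpoint-rectangle area */

    if (rank != 0) {
        MPI_Send(&local_sum, 1, MPI_DOUBLE, 0, 1, MPI_COMM_WORLD);
    } else {
        pi = local_sum;
        for (int src = 1; src < size; src++) {
            MPI_Recv(&other, 1, MPI_DOUBLE, src, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            pi += other;
        }
        printf("Computed value of pi = %.12f\n", pi);
    }

    MPI_Finalize();
    return 0;
}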

Example 8 : Description for implementation of MPI program to scatter n integers using
                             MPI collective communication library calls
  • Objective
  • Write an MPI program to scatter an integer array A of dimension n over p processors of PARAM 10000 using the MPI_Scatter collective communication library call. 

  • Description
  • Assume that n is a multiple of p and that the index of the array A starts from 0 (as in a C program). The processor with rank k, k = 0, 1, ..., p-1, gets the n/p values A[i] for i = k*n/p to (k+1)*n/p - 1; a minimal sketch is given after this example.

                                        

              Figure 4   Processor p0 performing MPI_Scatter Collective Communication Primitive 

  • Input 
  • Process with rank 0 reads the input data. You have to adhere strictly to the following format for the input file. 

    #Line 1 : Size of the array 
    #Line 2 : Data in order, i.e. the second entry of the input array A follows the first, and so on. 

    24 
    2  10  3  4  23  14  4  6  8  32  63  86  12  8  3  9  13  4  14  16  18  2  9  86 
     

  • Output 
  • Each process prints its own n/p elements of array A. 
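
A minimal C sketch follows. The block distribution with MPI_Scatter matches the description above; the input file name (input.dat) and the use of MPI_Bcast to make n known on every process are assumptions made for illustration.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, n = 0;
    int *A = NULL, *local;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        /* rank 0 reads the array size and the data from the input file */
        FILE *fp = fopen("input.dat", "r");      /* assumed input file name */
        if (fp == NULL) MPI_Abort(MPI_COMM_WORLD, 1);
        fscanf(fp, "%d", &n);
        A = malloc(n * sizeof(int));
        for (int i = 0; i < n; i++)
            fscanf(fp, "%d", &A[i]);
        fclose(fp);
    }

    /* every process needs n to size its local buffer */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    int chunk = n / size;                        /* n assumed to be a multiple of p */
    local = malloc(chunk * sizeof(int));

    /* distribute consecutive blocks of n/p elements to each process */
    MPI_Scatter(A, chunk, MPI_INT, local, chunk, MPI_INT, 0, MPI_COMM_WORLD);

    printf("Process %d received:", rank);
    for (int i = 0; i < chunk; i++)
        printf(" %d", local[i]);
    printf("\n");

    free(local);
    if (rank == 0) free(A);
    MPI_Finalize();
    return 0;
}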
     

Example 9 : Description for implementation of MPI program to gather n integers from p
                             processors and make the resultant gathered data (of dimension n) available on every processor
                             using collective communication library calls
  • Objective
  • Write an MPI program to gather values from each processor and store them in the global integer array A of dimension n on p processors of PARAM 10000. Use the MPI_Allgather communication library call in the program. 

  • Description
  • Assume that n is the dimension of the output array A and p is the number of processors used. On each processor with rank k (k = 0, 1, ..., p-1), a local integer array of size n/p with known values A[i], i = 0 to n/p - 1, is given as input. Each processor reads its respective input array of size n/p from the given input file. You have to adhere strictly to the following format for the input file. Assume that the index of the array A starts from 0, as in a C program. A minimal sketch is given after this example. 

                                       

                                        Figure 5  MPI_Allgather Collective Communication Primitive

  • Input  
  • Each process k reads its input file. You have to adhere strictly to the following format for the input file. 

    #Line 1 : Size of the local array 
    #Line 2 : Data in order, i.e. the second entry of the input array A follows the first, and so on. 

    A sample input file is given below. Users can create their own input files in this format. 

    Input file for process 0 

    21 18 54 24 33 69 
     

  • Output 
  • Each process prints the output array A. 
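
A minimal C sketch follows. The replicated result array produced by MPI_Allgather matches the description above; the per-rank input file naming convention (input_<rank>.dat) is an assumption made for illustration.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, local_n = 0;
    int *local, *A;
    char fname[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* each process reads its own local array from its own input file;
       the per-rank file name "input_<rank>.dat" is an assumed convention */
    sprintf(fname, "input_%d.dat", rank);
    FILE *fp = fopen(fname, "r");
    if (fp == NULL) MPI_Abort(MPI_COMM_WORLD, 1);
    fscanf(fp, "%d", &local_n);
    local = malloc(local_n * sizeof(int));
    for (int i = 0; i < local_n; i++)
        fscanf(fp, "%d", &local[i]);
    fclose(fp);

    /* the gathered array A has n = p * (n/p) elements and is replicated on every process */
    A = malloc(size * local_n * sizeof(int));
    MPI_Allgather(local, local_n, MPI_INT, A, local_n, MPI_INT, MPI_COMM_WORLD);

    printf("Process %d gathered:", rank);
    for (int i = 0; i < size * local_n; i++)
        printf(" %d", A[i]);
    printf("\n");

    free(local);
    free(A);
    MPI_Finalize();
    return 0;
}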
     

Example 10 : Description for implementation of MPI program to find sum of n integers
                                using MPI collective communication and computation library calls
  • Objective
  • Write an MPI program to find the sum of n integers on p processors of PARAM 10000, using an MPI collective communication and computation library call. 

  • Description
  • In this example, the MPI collective communication and computation library call MPI_Reduce is used; a minimal sketch is given after this example.

  • Input
  • For input data, let each process use its identifying number, i.e. the value of its rank. For example, process with rank 0 uses the number 0, process with rank 1 uses the number 1, etc. 

  • Output
  • Process with rank 0 prints the final sum. 
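
A minimal C sketch follows; as in the earlier examples, each process contributes the value of its own rank, and MPI_Reduce with MPI_SUM accumulates the result on rank 0.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, sum;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* each process contributes its own rank; MPI_Reduce combines the
       contributions with MPI_SUM and leaves the result on rank 0 */
    MPI_Reduce(&rank, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Final sum = %d\n", sum);

    MPI_Finalize();
    return 0;
}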

Example 11 :  Description for implementation of MPI program to compute value of PI
                                 by Numerical Integration using  MPI collective communication library calls

 
  • Objective
  • Write an MPI program to compute the value of pi by numerical integration of the function f(x) = 4/(1+x^2) between the limits 0 and 1. 

  • Description
  • In this example, the MPI collective communication and computation library call MPI_Reduce is used. For more details refer to Example 7; a minimal sketch is given after this example.

  • Input
  • Process with rank 0 reads the input parameter n, the number of intervals, from the command line. 

  • Output
  • Process with rank 0 prints the computed value of pi. 
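
A minimal C sketch follows. The partial sums are combined with MPI_Reduce as stated above; the use of MPI_Bcast to distribute n and the fallback value of n are assumptions made for illustration.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

/* integrand: the integral of 4/(1+x^2) over [0,1] equals pi */
static double f(double x) { return 4.0 / (1.0 + x * x); }

int main(int argc, char *argv[])
{
    int rank, size, n = 0;
    double h, local_sum = 0.0, pi = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0)
        n = (argc > 1) ? atoi(argv[1]) : 100000;  /* assumed fallback value */

    /* make n available on all processes, then reduce the partial sums */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    h = 1.0 / (double)n;
    for (int i = rank; i < n; i += size)          /* cyclic mapping, as in Example 7 */
        local_sum += f((i + 0.5) * h) * h;

    MPI_Reduce(&local_sum, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("Computed value of pi = %.12f\n", pi);

    MPI_Finalize();
    return 0;
}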

Example 12 : Description for implementation of MPI program to construct a communicator
                               consisting of the diagonal processors in a square grid of processors using
                               MPI group library calls
  • Objective
  • Write an MPI program to create a communicator consisting of the diagonal processors in a square grid of processors on PARAM 10000. 

  • Description
  • Assume that the number of processors is a perfect square. The processors in the processor grid are numbered in a row-wise fashion. An example of a square grid of processors (p = 9) is shown in Figure 6. The diagonal communicator consists of the diagonal processors in the square processor grid. 

     
           Figure 6   Communicator consists of diagonal processors in a 3 x 3 square processor grid  
                        
    The processors (P0, P4, P8) along the diagonal form the diagonal communicator, as shown in Figure 6. The ranks of these processors within the diagonal communicator will be (0, 1, 2). Use the MPI group library calls MPI_Comm_group, MPI_Group_incl, and MPI_Comm_create; a minimal sketch is given below. 
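
A minimal C sketch follows. In a row-wise numbered q x q grid, the diagonal processors have world ranks 0, q+1, 2(q+1), ..., (q-1)(q+1); the sketch builds this rank list and creates the diagonal communicator with the group calls named above.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size, q;
    MPI_Group world_group, diag_group;
    MPI_Comm  diag_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* assumed to be a perfect square */

    q = (int)(sqrt((double)size) + 0.5);    /* side of the q x q processor grid */

    /* diagonal processors in a row-wise numbered grid: ranks 0, q+1, 2(q+1), ... */
    int *diag_ranks = malloc(q * sizeof(int));
    for (int i = 0; i < q; i++)
        diag_ranks[i] = i * (q + 1);

    /* build the diagonal group from the world group and create its communicator */
    MPI_Comm_group(MPI_COMM_WORLD, &world_group);
    MPI_Group_incl(world_group, q, diag_ranks, &diag_group);
    MPI_Comm_create(MPI_COMM_WORLD, diag_group, &diag_comm);

    if (diag_comm != MPI_COMM_NULL) {
        /* only the diagonal processors belong to the new communicator */
        int diag_rank;
        MPI_Comm_rank(diag_comm, &diag_rank);
        printf("World rank %d has rank %d in the diagonal communicator\n",
               rank, diag_rank);
        MPI_Comm_free(&diag_comm);
    }

    MPI_Group_free(&diag_group);
    MPI_Group_free(&world_group);
    free(diag_ranks);
    MPI_Finalize();
    return 0;
}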
 
 