1.2 Descriptions for Implementation of MPI Programs
Example 1: Description for implementation of MPI program to print Hello World

Description
Process with rank k (k = 1, 2, ..., p-1) sends a
"Hello World" message to the process with rank 0. Process
with rank 0 receives each message and prints it. MPI point-to-point
communication library calls (MPI_Send and MPI_Recv) are used
in this program.
Input
None

Output
Process with rank 0 prints the following message for each of the
other ranks (i.e. rank = 1, 2, ..., p-1):
Hello World From Process rank
where rank = 1, 2, ..., p-1
Example 2 : Description for implementation of MPI program to find sum of n integers
using MPI point-to-point blocking communication library calls

Description
Each process with rank greater than 0 sends the value of its
rank to the process with rank 0. Process with rank 0 receives
the values from each process and calculates the sum. Finally, process with
rank 0 prints the sum.

Input
For input data, let each process use its identifying number, i.e. the
value of its rank. For example, process with rank 0 uses the number
0, process with rank 1 uses the number 1, etc.
Example 3 : Description for implementation of MPI program to find sum of n integers on
parallel computer in which processors are arranged in linear array topology
using MPI point-to-point blocking communication library calls

Objective
Write an MPI program to find the sum of n values using p processors
of PARAM 10000. Assume that the p processors are arranged in a linear
array topology.

Description
In a linear array interconnection network, each processor (except a
processor at the end) has a direct communication link to the immediately
next processor. A simple way of communicating a message between processors
is to pass it repeatedly to the processor immediately to the right (or
left) until it reaches its destination, i.e., the last processor in
the linear array. A simple way to connect processors is illustrated
in Figure 1.
Figure 1 Processors connected by linear array topology
All the processes are involved in communication. The process with
rank k (k greater than 0) receives the accumulated partial sum from
the previous process with rank k-1, adds its own value, and passes the
new partial sum on to the process with rank k+1. Finally, the
process with rank p-1 prints the final sum.

Input
For input data, let each process use its identifying number, i.e. the
value of its rank. For example, process with rank 0 uses the number
0, process with rank 1 uses the number 1, etc.
Example 4 : Description for implementation of MPI program to find sum of n integers on parallel computer in which processors are arranged in ring topology using MPI point-to-point blocking communication library calls

Objective
Write an MPI program to find the sum of n values using p processors
of PARAM 10000. Assume that the p processors are arranged in a ring
topology.

Description
In linear array interconnection network ( refer Example
3 for more details on linear array topology ) with a wraparound
connection is called as a ring. A wraparound connection is often
provided between the processors at the end. A simple way of communicating
a message between processors is, by repeatedly passing message to the processor
immediately to either right or left depending on which direction
yield a shorter path, until it reaches its destination, i.e., first processor
in the ring.
All the processes are involved in communication. The process with rank
k (k greater than 0) receives the accumulated partial sum from the
previous process with rank k-1, adds its own value, and passes the new
partial sum along the ring. The process with rank p-1 sends the final
sum over the wraparound link to the process with rank 0. Finally,
process with rank 0 prints the final sum.

Input
For input data, let each process use its identifying number, i.e. the
value of its rank. For example, process with rank 0 uses the number
0, process with rank 1 uses the number 1, etc.
Example 5 : Description for implementation of MPI program to find sum of n integers on parallel computer in which processors are arranged in binary tree topology (associative fan-in rule) using MPI point-to-point blocking communication library calls

Objective
Write an MPI program to find the sum of n (n = 2^i;
i = 3) integers on p (p = n) processors of PARAM
10000 using the associative fan-in rule.

Description
The first step is to group the processors into ordered
pairs (P_0, P_1), (P_2, P_3), (P_4, P_5), (P_6, P_7), ...,
(P_{p-2}, P_{p-1}).
Then compute the partial sums in all pairs of processors using MPI point-to-point
blocking communication and accumulate the partial sums on the processors
P_0, P_2, ..., P_{p-2}. For example,
the processor P_i (i even) computes
the partial sum for the pair (P_i, P_{i+1})
by performing MPI point-to-point blocking communication library calls.
Figure 2 Tree structure for sum of n integers
by the associative fan-in rule
In the second step, consider the pairs of processors (P_0, P_2),
(P_4, P_6), ..., (P_{p-4}, P_{p-2})
and obtain new partial sums from the accumulated partial
sums already held on the processors P_0, P_2, P_4, ...,
P_{p-2}, as explained in the previous step. This procedure is repeated
until two processors are left, and finally the global sum is accumulated
on the processor P_0. In this example, MPI point-to-point
blocking communication library calls such as MPI_Send and MPI_Recv
are used. An example of the associative fan-in rule is shown as
a tree structure in Figure 2 for n = 16.

Input
For input data, let each process use its identifying number, i.e. the
value of its rank. For example, process with rank 0 uses the number
0, process with rank 1 uses the number 1, etc.
Example 6 : Description for implementation of MPI program to find sum of n integers on parallel computer in which processors are arranged in binary tree topology (associative fan-in rule) using MPI point-to-point non-blocking communication library calls.

Objective
Write an MPI program to find the sum of n (n = 2^i;
i = 3) integers on p (p = n) processors of PARAM
10000 using the associative fan-in rule.

Description
In this example, MPI point-to-point non-blocking communication
library calls such as MPI_Isend and MPI_Irecv are used. These calls
return immediately, so each request must later be completed with
MPI_Wait (or MPI_Test) before the message buffer is reused.
For more details, refer to Example 5.

Input
For input data, let each process use its identifying number, i.e. the
value of its rank. For example, process with rank 0 uses the number
0, process with rank 1 uses the number 1, etc.
Example 7 : Description for implementation of MPI program to compute the value of PI by Numerical Integration using MPI point-to-point communication library calls

Objective
Write an MPI program to compute the value of pi by numerical
integration of the function f(x) = 4/(1+x^2)
between the limits 0 and 1.

Description
There are several approaches to parallelizing a serial program. One approach is to partition the data among the processes. That is, we partition the interval of integration [0,1] among the processes, and each process estimates the local integral over its own subinterval. The local results produced by the individual processes are then combined to produce the final result: each process sends its local integral to process 0, which adds them up and prints the result.
To perform this integration numerically, divide the interval from 0
to 1 into n subintervals and add up the areas of the resulting
rectangles, as shown in Figure 3 (n = 5). Larger values of n give
more accurate approximations of pi. Use MPI point-to-point communication
library calls.
Figure 3 Numerical integration for computing pi
We assume that n is the total number of subintervals, p is the number of processes, and p < n. One simple way to distribute the subintervals among the processes is to divide n by p. There are two kinds of mappings that balance the load. One is a block mapping, which partitions the elements into blocks of consecutive entries and assigns one block to each process. The other is a cyclic mapping. It assigns the first element to the first process, the second element to the second, and so on. If n > p, we return to the first process and repeat the assignment for the remaining elements; this continues until all the elements are assigned. We have used a cyclic mapping to partition the interval [0,1] onto p processes.

Input
Process with rank 0 reads the input parameter n, the number
of intervals, on the command line.

Output
Process with rank 0 prints the computed value of pi function.
Example 8 : Description for implementation of MPI program to scatter n integers using MPI collective communication library calls

Objective
Write an MPI program to scatter an integer array A of dimension
n on p processors of PARAM 10000 using the MPI_Scatter
communication library call.

Description
Assume that n is a multiple of p and that the index of the array
A starts from 0 (as in a C program). The processor with rank k,
k = 0, 1, ..., p-1, gets the n/p values A[i],
i = k*n/p to (k+1)*n/p - 1.
Figure 4 Processor P_0 performing the MPI_Scatter collective
communication primitive

Input
Process with rank 0 reads the input data. You have to adhere
strictly to the following format for the input file.
#Line 1 : Size of the array
#Line 2 : data in order, this means that the data of second
entry of the input array A follows the
first and so on.
24
2 10 3 4 23 14 4 6
8 32 63 86 12 8 3 9 13
4 14 16 18 2 9 86
Example 9 : Description for implementation of MPI program to gather n integers from p
processors and make the resultant gathered data (n values) available on
every processor using collective communication library calls

Objective
Write an MPI program to gather values from each processor and store
them in the global integer array A of dimension n
on p processors of PARAM 10000. Use the MPI_Allgather
communication library call in the program.

Description
Assume n be the dimension of an output array A and p
be the processors used. Also, the integer array A [i]
of size n/p with known n/p values where i varies from
0 to n/p  1 is given as additional input on processor with rank k ^{
}(k = 0,1, ... p1). Each processor reads respective
input array of size n/p from the given input file. You have
to adhere strictly the following format for the input file. Assume
that the index of the array A starts from 0 as in CPrograme.
Figure 5 MPI_Allgather Collective Communication Primitive

Input
Each process k reads its input file. You have to adhere strictly
to the following format for the input file.
#Line 1 : Size of the local array
#Line 2 : data in order, this means that the data of second
entry of the input array A follows the
first and so on.
A sample input file is given below. Users can create their own input
files in this format.
Input file for process 0
6
21 18 54 24 33 69
Example 10 : Description for implementation of MPI program to find sum of n integers
using MPI collective communication and computation library calls

Objective
Write an MPI program to find the sum of n integers on p processors
of PARAM 10000 using an MPI collective communication and computation
library call.

Description
In this example, the MPI collective communication and computation
library call MPI_Reduce is used. MPI_Reduce combines one value
contributed by every process in the communicator (here with the
MPI_SUM operation) and delivers the result to a designated root process.

Input
For input data, let each process use its identifying number, i.e. the
value of its rank. For example, process with rank 0 uses the number
0, process with rank 1 uses the number 1, etc.
Example 11 : Description for implementation of MPI program to compute the value of PI
by Numerical Integration using MPI collective communication library calls

Objective
Write an MPI program to compute the value of pi by numerical
integration of the function f(x) = 4/(1+x^2)
between the limits 0 and 1.

Description
In this example, the MPI collective communication and computation library
call MPI_Reduce is used. For more details, refer to Example 7.

Input
Process with rank 0 reads the input parameter n, the number
of intervals, on the command line.
Example 12 : Description for implementation of MPI program to construct a communicator
consisting of the group of diagonal processors in a square grid of
processors using MPI group library calls

Objective
Write an MPI program to create a diagonal communicator group of processors
in a square grid of processors on PARAM 10000.

Description
Assume that the number of processors is a perfect square. The processors
in the processor grid are numbered in row-wise fashion. An example of a
square grid of processors (p = 9) is described in Figure 6. The diagonal
communicator consists of the diagonal processors in the square processor grid.
Figure 6 Communicator consisting of diagonal processors
in a 3 x 3 square processor grid
The processors (P_0, P_4, P_8) along the
diagonal form the diagonal communicator, as shown in Figure 6. The ranks
of the processors in the diagonal communicator will be (0, 1, 2). Use the
special MPI group library calls MPI_Comm_group, MPI_Group_incl, and
MPI_Comm_create.
