CS300 : Supercomputing for Engineering Applications 

________________________________________________________

OpenMP Information

________________________________________________________

 

2.1. An overview of OpenMP

2.1.1. Introduction  

Two commonly used parallel programming models are implicit and explicit. The most popular approach to implicit parallelism is the automatic parallelization of sequential programs by compilers: the compiler performs dependence analysis on the sequential source code and then uses a suite of program transformation techniques to convert it into parallel code. Such compilers reduce the burden on the programmer to explicitly parallelize the program. Explicit parallelism means that the programmer specifies parallelism in the source code using special language constructs, compiler directives, or library function calls; if the programmer instead lets the compiler and the run-time support system exploit parallelism automatically, we have implicit parallelism. Out of the many explicit parallel programming models, the dominant ones are the data-parallel, message-passing, and shared-variable models.

Shared Memory Programming 

Shared-memory systems typically provide both static and dynamic process creation: processes can be created at the beginning of program execution by a directive to the operating system, or they can be created during execution of the program. The best-known dynamic process creation function is fork. A typical implementation allows a process to start another, or child, process by a fork. Three mechanisms typically coordinate processes in shared-memory programs. The first is process creation and termination: the starting, or parent, process can wait for the termination of the child process by calling join. The second prevents processes from improperly accessing shared resources. The third provides a means for synchronizing the processes.
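For illustration only, the following minimal C sketch uses the POSIX fork and waitpid calls (not OpenMP) to show the fork/join pattern described above: the parent creates a child process and then waits for ("joins") it.

#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t child = fork();              /* parent creates a child process    */
    if (child == 0) {                  /* only the child executes this part */
        printf("child: doing work\n");
        exit(0);
    }
    waitpid(child, NULL, 0);           /* parent "joins": waits for child   */
    printf("parent: child has terminated\n");
    return 0;
}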

The shared-memory model is similar to the data-parallel model in that it has a single address (global naming) space. It is similar to the message-passing model in that it is multithreaded and asynchronous. However, data reside in a single, shared address space and thus do not have to be explicitly allocated. The workload can be either explicitly or implicitly allocated. Communication is done implicitly through shared reads and writes of variables; synchronization, however, is explicit.

Shared-variable programs are multithreaded and asynchronous, and require explicit synchronization to maintain correct execution order among the processes. Parallel programming based on the shared-memory model has not progressed as much as message-passing parallel programming; an indicator is the lack of a widely accepted standard such as MPI or PVM for message passing. The current situation is that shared-memory programs are written in platform-specific languages for multiprocessors (mostly SMPs and PVPs). Such programs are not portable even among multiprocessors, not to mention multicomputers (MPPs and clusters). Three platform-independent shared-memory programming models are X3H5, Pthreads, and OpenMP.

OpenMP is an Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared-memory parallelism. It is a specification for a set of compiler directives, library routines, and environment variables that can be used to specify shared-memory parallelism in Fortran and C/C++ programs. OpenMP is a shared-memory standard supported by a group of hardware and software vendors, such as DEC, Intel, IBM, Kuck & Associates, SGI, the Portland Group, the Numerical Algorithms Group, the U.S. DOE ASCI program, etc. It is comprised of three primary API components:

  • Compiler Directives 
  • Runtime Library Routines 
  • Environment Variables 

OpenMP is portable: the API is specified for C/C++ and Fortran, and it has been implemented on multiple platforms, including most Unix platforms and Windows NT. It is jointly defined and endorsed by a group of major computer hardware and software vendors, although it is not yet an ANSI standard.

Goals

Standardization: Provide a standard among a variety of shared memory architectures/platforms 

Lean and Mean: Establish a simple and limited set of directives for programming shared memory machines. Significant parallelism can be implemented by using just 3 or 4 directives. 

Ease of Use
 

  • Provide the capability to incrementally parallelize a serial program, unlike message-passing libraries, which typically require an all-or-nothing approach 
  • Provide the capability to implement both coarse-grain and fine-grain parallelism

Portability

  • Supports Fortran (77, 90, and 95), C, and C++
  • Public forum for API and membership 
 
2.1.2. History of OpenMP 

In the early 90's, vendors of shared-memory machines supplied similar, directive-based, Fortran programming extensions to make use of the architecture and its advantages:

  • The user would augment a serial Fortran program with directives specifying which loops were to be parallelized 
  • The compiler would be responsible for automatically parallelizing such loops across the SMP processors 

Implementations were all functionally similar, but were diverging (as usual) as different vendors used different methods based on their own architectures and platforms. 

The first attempt at a standard was the draft for ANSI X3H5 in 1994. It was never adopted, largely due to waning interest as distributed-memory machines became popular. The OpenMP standard specification started in the spring of 1997, taking over where ANSI X3H5 had left off, as newer shared-memory machine architectures started to become prevalent. 

The OpenMP group believes that Pthreads is not scalable, since it is targeted at low-end Unix SMPs rather than technical high-performance computing; it does not even have a Fortran binding. Pthreads is also low level, because it uses a library approach rather than compiler directives, and the library approach precludes compiler optimizations. Pthreads supports only thread parallelism, not fine-grain (loop-level) parallelism, whereas OpenMP is flexible enough to support both fine-grain and coarse-grain parallelism.

Pthreads also does not support incremental parallelism well. Given a sequential computing program, it is difficult for the user to parallelize it using Pthreads: the user has to worry about many low-level details, and Pthreads does not naturally support loop-level data parallelism.
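As an illustrative sketch of this point (the array and loop are invented for the example), a serial loop becomes a parallel one by adding a single OpenMP directive, whereas achieving the same effect with Pthreads would require explicit thread creation, index partitioning and joining.

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void)
{
    double a[N], b[N];
    int i;

    for (i = 0; i < N; i++)            /* serial initialization */
        b[i] = (double) i;

    /* the only change from the serial version is the directive below */
    #pragma omp parallel for
    for (i = 0; i < N; i++)
        a[i] = 2.0 * b[i];

    printf("a[N-1] = %f\n", a[N-1]);   /* expected: 2*(N-1) = 1998.0 */
    return 0;
}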

OpenMP is designed to alleviate the shortcomings discussed above. For wide acceptance of OpenMP, the key is to develop good compilers and run-time environments.

 

 
2.1.3. Why OpenMP? 

Shared-memory parallel programming directives have never been standardized in the industry. An earlier standardization effort, ANSI X3H5, was never formally adopted, so vendors have each provided a different set of directives, very similar in syntax and semantics, each using a unique comment or pragma notation for "portability". OpenMP consolidates these directive sets into a single syntax and semantics, and finally delivers the long-awaited promise of single-source portability for shared-memory parallelism.

OpenMP also addresses the inability of previous shared-memory directive sets to deal with coarse grain parallelism. In the past, limited support for coarse grain work has led to developers thinking that shared memory parallel programming was inherently limited to fine-grain parallelism. This is not the case with OpenMP. Orphaned directives in OpenMP offer the features necessary to represent coarse-grained algorithms.

2.1.4. Key Features of OpenMP

OpenMP incorporates the concept of orphan directives: directives that do not appear in the lexical extent of a parallel construct but lie in its dynamic (execution) extent. That is, they are not lexically enclosed in the parallel construct of the main program, but they are in its dynamic execution path.

The user parallelizes the main program using one or more parallel directives, and uses other directives to control execution in the parallel subroutines. In this way, major portions of the program can be executed in parallel with only small modifications. The concept also facilitates the development and reuse of modular parallel programs. X3H5 does not support orphan directives.
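A minimal C sketch of this idea (the routine and array names are illustrative): the work-sharing directive inside do_work is orphaned, since it is not lexically enclosed in a parallel construct, yet it binds to the parallel region opened in main when called from it.

#include <stdio.h>
#include <omp.h>

#define N 8

/* The "for" directive below is orphaned: it appears outside any
   lexically enclosing parallel construct, but lies in the dynamic
   extent of the parallel region opened in main().                 */
void do_work(int *data)
{
    int i;
    #pragma omp for
    for (i = 0; i < N; i++)
        data[i] = i * i;
}

int main(void)
{
    int data[N];
    int i;

    #pragma omp parallel               /* coarse-grain parallel region      */
    {
        do_work(data);                 /* iterations are shared by the team */
    }

    for (i = 0; i < N; i++)
        printf("data[%d] = %d\n", i, data[i]);
    return 0;
}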

Besides compiler directives, OpenMP provides a set of run-time library routines with associated environment variables. They are used to control and query the parallel execution environment, provide general-purpose lock functions, set the execution mode, and so on. For instance, OpenMP allows a throughput mode.

In throughput mode, the system dynamically sets the number of threads used to execute parallel regions. This can maximize the throughput performance of the system, possibly at the expense of prolonging the elapsed wall-clock time of an individual application. X3H5 and OpenMP have similar parallelism directives; only the notations differ slightly. OpenMP also adds a MASTER directive: the enclosed code is executed only by the master thread of the team.
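A short C sketch of the MASTER directive just described: only the thread with number 0 executes the enclosed block, and no implied barrier follows the construct.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        /* every thread executes the parallel region ...                */
        #pragma omp master
        {
            /* ... but only the master thread (thread 0) executes this; */
            /* the other threads skip the block without synchronizing   */
            printf("team size = %d\n", omp_get_num_threads());
        }
    }
    return 0;
}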

OpenMP provides more flexible functionality to control the data environment than X3H5. For instance, OpenMP supports reduction by a REDUCTION(+ : sum) clause in a PARALLEL DO directive. A private copy of the reduction variable sum is created for each thread. The private copy is initialized to 0 according to the reduction operator +. Each thread computes a private result. At the end of PARALLEL DO, the reduction variable sum is updated to equal the result of combining the original value of the reduction variable sum with the private results using the operator +. Reduction operators other than + can be specified.
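The C/C++ form of the same reduction is sketched below (the array name and length are illustrative): each thread accumulates into its own private copy of sum, and the private copies are combined with + at the end of the loop.

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void)
{
    double a[N], sum = 0.0;
    int i;

    for (i = 0; i < N; i++)            /* fill the array serially */
        a[i] = 1.0;

    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++)            /* private partial sums are combined with + */
        sum += a[i];

    printf("sum = %f\n", sum);         /* expected value: 1000.0 */
    return 0;
}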

The clause DEFAULT(PRIVATE) directs all variables in a parallel region to be private, unless overridden by an explicit SHARED clause. There is also a DEFAULT(SHARED) clause to direct all variables to be shared. Default scoping makes it unnecessary to explicitly enumerate every variable, which saves the programmer's time and reduces errors.

OpenMP introduces an ATOMIC directive which allows the compiler to take advantage of the most efficient scheme available for implementing atomic updates to a variable. This can be more efficient than general mutual-exclusion constructs such as critical regions and locks.
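A minimal C sketch of the ATOMIC directive (the counter and loop bound are illustrative): the update of the shared variable is performed atomically, without the cost of a full critical region.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    int hits = 0;
    int i;

    #pragma omp parallel for
    for (i = 0; i < 1000; i++) {
        #pragma omp atomic             /* protects only this single update */
        hits += 1;
    }

    printf("hits = %d\n", hits);       /* always 1000, regardless of thread count */
    return 0;
}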

2.1.5. Basic OpenMP Library calls

The most commonly used OpenMP run-time library calls in Fortran and C are explained below. For each routine, the Fortran syntax is shown first, followed by the C syntax.

OMP_SET_NUM_THREADS

SUBROUTINE OMP_SET_NUM_THREADS (scalar_integer_expression)
void omp_set_num_threads(int num_threads)

sets the number of threads to use in a team 

This subroutine sets the number of threads that will be used in the next parallel region. The dynamic threads mechanism modifies the effect of this routine. If enabled, specifies the maximum number of threads that can be used for any parallel region. If disabled, specifies exact number of threads to use until next call to this routine. This routine can only be called from the serial portions of the code. This call has precedence over the OMP_NUM_THREADS environment variable.
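As a hedged usage sketch (the thread count of 4 is arbitrary), the call is made from the serial part of the program, before the parallel region it is meant to affect.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_set_num_threads(4);            /* request 4 threads for the next region */

    #pragma omp parallel
    {
        printf("thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }
    return 0;
}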

OMP_GET_NUM_THREADS

INTEGER FUNCTION OMP_GET_NUM_THREADS()
int omp_get_num_threads(void)

returns the number of threads in the currently executing parallel region.

This subroutine/function returns the number of threads that are currently in the team executing the parallel region from which it is called. If this call is made from a serial portion of the program, or a nested parallel region that is serialized, it will return 1. The default number of threads is implementation dependent.

OMP_GET_MAX_THREADS

INTEGER FUNCTION OMP_GET_MAX_THREADS()
int omp_get_max_threads(void)

returns the maximum value that can be returned by a call to the OMP_GET_NUM_THREADS function.

Generally reflects the number of threads as set by the OMP_NUM_THREADS environment variable or the OMP_SET_NUM_THREADS() library routine. This function can be called from both serial and parallel regions of code.   

OMP_GET_THREAD_NUM

INTEGER FUNCTION OMP_GET_THREAD_NUM()
int omp_get_thread_num(void)

returns the thread number within the team

This function returns the number of the calling thread within the team. The number will be between 0 and OMP_GET_NUM_THREADS()-1; the master thread of the team is thread 0. If called from a nested parallel region, or from a serial region, this function returns 0.   

OMP_GET_NUM_PROCS

INTEGER FUNCTION OMP_GET_NUM_PROCS()
int omp_get_num_procs(void)

returns the number of processors that are available to the program.  

OMP_IN_PARALLEL

LOGICAL FUNCTION OMP_IN_PARALLEL()
int omp_in_parallel(void)

returns .TRUE. for calls within a parallel region, .FALSE. otherwise.

This function/subroutine is called to determine if the section of code which is executing is parallel or not. For Fortran, this function returns .TRUE. if it is called from the dynamic extent of a region executing in parallel, and .FALSE. otherwise. For C/C++, it will return a non-zero integer if parallel, and zero otherwise.   
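The query routines described above can be combined as in the following short C sketch, which reports the execution environment from both the serial and the parallel parts of the program.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    /* queries from the serial part of the program */
    printf("serial part  : in_parallel = %d, procs = %d, max threads = %d\n",
           omp_in_parallel(), omp_get_num_procs(), omp_get_max_threads());

    #pragma omp parallel
    {
        #pragma omp master
        {
            /* queries from inside a parallel region (printed once)   */
            printf("parallel part: in_parallel = %d, team size = %d\n",
                   omp_in_parallel(), omp_get_num_threads());
        }
    }
    return 0;
}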

OMP_SET_DYNAMIC

SUBROUTINE OMP_SET_DYNAMIC(scalar_logical_expression)
void omp_set_dynamic(int dynamic_threads)

controls the dynamic adjustment of the number of parallel threads.

This subroutine enables or disables dynamic adjustment (by the run time system) of the number of threads available for execution of parallel regions. For Fortran, if called with .TRUE. then the number of threads available for subsequent parallel regions can be adjusted automatically by the run-time environment. If called with .FALSE., dynamic adjustment is disabled. For C/C++, if dynamic_threads evaluates to non-zero, then the mechanism is enabled, otherwise it is disabled. The OMP_SET_DYNAMIC subroutine has precedence over the OMP_DYNAMIC environment variable. The default setting is implementation dependent. Must be called from a serial section of the program.   

OMP_GET_DYNAMIC

LOGICAL FUNCTION OMP_GET_DYNAMIC()
int omp_get_dynamic(void)

returns .TRUE. if dynamic threads is enabled, .FALSE. otherwise.

This function is used to determine if dynamic thread adjustment is enabled or not. For Fortran, this function returns .TRUE. if dynamic thread adjustment is enabled, and .FALSE. otherwise. For C/C++, non-zero will be returned if dynamic thread adjustment is enabled, and zero otherwise.   

OMP_SET_NESTED

SUBROUTINE OMP_SET_NESTED(scalar_logical_expression)
void omp_set_nested(int nested)

enables or disables nested parallelism.

This subroutine is used to enable or disable nested parallelism. For Fortran, calling this function with .FALSE. will disable nested parallelism, and calling with .TRUE. will enable it. For C/C++, if nested evaluates to non-zero, nested parallelism is enabled; otherwise it is disabled. The default is for nested parallelism to be disabled. This call has precedence over the OMP_NESTED environment variable.   

OMP_GET_NESTED

LOGICAL FUNCTION OMP_GET_NESTED()
int omp_get_nested(void)

returns .TRUE. if nested parallelism is enabled, .FALSE. otherwise.

This function/subroutine is used to determine if nested parallelism is enabled or not. For Fortran, this function returns .TRUE. if nested parallelism is enabled, and .FALSE. otherwise. For C/C++, non-zero will be returned if nested parallelism is enabled, and zero otherwise.   

OMP_INIT_LOCK

SUBROUTINE OMP_INIT_LOCK(var)
void omp_init_lock(omp_lock_t *lock)
void omp_init_nest_lock(omp_nest_lock_t *lock)

allocates and initializes the lock

This subroutine / function initializes a lock associated with the lock variable. The initial state is unlocked.   

OMP_DESTROY_LOCK

SUBROUTINE OMP_DESTROY_LOCK(var)
void omp_destroy_lock(omp_lock_t *lock)
void omp_destroy_nest_lock(omp_nest_lock_t *lock)

deallocates and frees the lock

This subroutine/function disassociates the given lock variable from any locks. It is illegal to call this routine with a lock variable that is not initialized.   

OMP_SET_LOCK

SUBROUTINE OMP_SET_LOCK(var)
void omp_set_lock(omp_lock_t *lock)
void omp_set_nest_lock(omp_nest_lock_t *lock)

acquires the lock, waiting until it becomes available if necessary.

This subroutine forces the executing thread to wait until the specified lock is available. A thread is granted ownership of a lock when it becomes available. It is illegal to call this routine with a lock variable that is not initialized.   

OMP_UNSET_LOCK

SUBROUTINE OMP_UNSET_LOCK(var)
void omp_unset_lock(omp_lock_t *lock)
void omp_unset_nest_lock(omp_nest_lock_t *lock)

releases the lock, resuming a waiting thread if any.

This subroutine releases the lock from the executing subroutine. It is illegal to call this routine with a lock variable that is not initialized.   

OMP_TEST_LOCK

LOGICAL FUNCTION OMP_TEST_LOCK(var)
int omp_test_lock(omp_lock_t *lock)
int omp_test_nest_lock(omp_nest_lock_t *lock)

tries to acquire the lock and returns success or failure

This subroutine attempts to set a lock, but does not block if the lock is unavailable. For Fortran, .TRUE. is returned if the lock was set successfully, otherwise .FALSE. is returned. For C/C++, non-zero is returned if the lock was set successfully, otherwise zero is returned. It is illegal to call this routine with a lock variable that is not initialized.
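The lock routines are typically used together, as in the following C sketch (the shared counter is illustrative), which serializes updates to a variable shared by the team.

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_lock_t lock;
    int counter = 0;
    int i;

    omp_init_lock(&lock);              /* allocate and initialize the lock  */

    #pragma omp parallel for
    for (i = 0; i < 1000; i++) {
        omp_set_lock(&lock);           /* wait until the lock is available  */
        counter++;                     /* only one thread updates at a time */
        omp_unset_lock(&lock);         /* release the lock                  */
    }

    omp_destroy_lock(&lock);           /* deallocate the lock               */
    printf("counter = %d\n", counter); /* always 1000 */
    return 0;
}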

2.1.6. Compilation, Linking and Execution of OpenMP Program

Using command line arguments  

The compilation and execution details of an OpenMP program may vary on different computers. Depending upon your language preference, use a command of the following form to compile the program; with a Fortran 90 compiler, for example: 

# f90  <program name>  -openmp  -o  <name of executable> 

For example, to compile a simple Hello World program, the user can type 

# f90 Omp_Hello_World.f -openmp -o HelloWorld (Fortran codes) 

# cc Omp_Hello_World.c -xopenmp -o HelloWorld (C codes) 
 

Using Makefile 

For more control over the process of compiling and linking OpenMP programs, you should use a 'Makefile'. A Makefile is particularly useful for programs spread over a large number of files. The user has to specify the names of the program files and the appropriate paths to the libraries required for OpenMP in the Makefile.

To compile and link an OpenMP program in C or Fortran, you can use the command  make

For the hands-on exercises, the user can use Makefile_Fortran for Fortran programs and Makefile_C for C programs. 

 

Execution of program  

To execute an OpenMP program, simply type the name of the executable on the command line. 

< Name of executable> 

For example, to execute a simple Hello World program, the user can type  HelloWorld 

With three threads, the output should look similar to the following. The actual number of threads and the order of output may vary. 

Hello World from thread = 0 
Number of threads = 3 
Hello World from thread = 2 
Hello World from thread = 1 

Depending upon your shell, set the number of threads using the OMP_NUM_THREADS environment variable: 
  setenv OMP_NUM_THREADS  4      (csh/tcsh)   

   export OMP_NUM_THREADS=4      (bash/ksh) 

   

2.1.7. Compilation, Linking and Execution of OpenMP Programs on PARAM 10000  

OpenMP C and Fortran programs require SUN Workshop 6.2 for compilation and execution on one node, i.e. on the SUN UltraSPARC symmetric multiprocessing processors of a SUNFIRE 6800 node. An OpenMP program cannot be executed across multiple nodes of the SUNFIRE 6800.

To compile an OpenMP program, the user can type the following on the command line.

 For C programs

/usr/local/WS/6.2/SUNWspro/bin/cc -xopenmp -xO3 -o <executable name> <program name>

For FORTRAN programs

/usr/local/WS/6.2/SUNWspro/bin/f90 -openmp -O3 -o <executable name> <program name>

To execute an OpenMP program, simply type the name of the executable on the command line. 

 < Name of executable> 

For example, to execute a simple Hello World program, the user can type  

HelloWorld 

 

2.1.8. Example Program in Fortran/C language

          Simple OpenMP Program "Omp_Hello_World.f"

The simplest OpenMP program is the "Hello World" program, in which each thread simply prints the message "Hello World". In this example, the threads with identifiers 0, 1, 2, ......, p-1 will each print the message "Hello World".

The OpenMP program in Fortran in which each thread prints the "Hello World" message is explained below, fragment by fragment. 

The first few lines of the program contain variable definitions and constants, followed by declarations of the OpenMP library calls used in the program. The library call OMP_GET_THREAD_NUM returns ThreadID, the identifier of each thread, and the library call OMP_GET_NUM_THREADS returns NoofThreads, the total number of threads that the user has started for this program.

The following fragment of the program shows these declarations:

program HelloWorld
integer NoofThreads, ThreadID, OMP_GET_NUM_THREADS
integer OMP_GET_THREAD_NUM

ThreadID is the identifier of each thread and NoofThreads is the total number of threads used in the program. ThreadID and NoofThreads are private to each thread. Each thread obtains its own identifier and then prints the message "Hello World" in parallel.

The OpenMP PARALLEL directive and PRIVATE clause come next. The PARALLEL/END PARALLEL directive pair specifies that the enclosed block of the program, referred to as a parallel region, be executed in parallel by multiple threads. The PRIVATE clause is typically used to identify variables that are used as scratch storage within the parallel region; it provides a list of variables and specifies that each thread have its own private copy of those variables for the duration of the parallel region.

C$OMP  PARALLEL PRIVATE(NoofThreads, ThreadID)

Each thread gets its own copy of these variables, obtains its identifier, and prints it:

ThreadID = OMP_GET_THREAD_NUM()
print *, 'Hello World from thread = ', ThreadID

Only the master thread obtains NoofThreads, the total number of threads used in the program, and prints it:

if (ThreadID .EQ. 0) then
        NoofThreads = OMP_GET_NUM_THREADS()
        print *, 'Number of threads = ', NoofThreads
 end if

The END PARALLEL directive ends the parallel region: all threads join the master thread and disband.

C$OMP  END PARALLEL

stop
end

 

The equivalent OpenMP program in C, in which each thread prints the "Hello World" message, is given below.

#include <stdio.h>
#include <omp.h>

/* Main Program */
int main()
{

int ThreadID, NoofThreads;


/* OpenMP Parallel Directive */

#pragma omp parallel private(ThreadID)
{

ThreadID = omp_get_thread_num();
printf("\nHello World is being printed by the thread id %d\n", ThreadID);

/* Master Thread Has Its ThreadID 0 */

if (ThreadID == 0) {

printf("\nMaster prints Numof threads \n");
NoofThreads = omp_get_num_threads();
printf("Total number of threads are %d\n", NoofThreads);

}

} /* End Of Parallel Region */

return 0;
}

 

2.1.9. List of Tools available in OpenMP

Performance is a critical issue on current clusters of workstations. Performance evaluation and visualization are important and useful techniques that help the user understand the behavior of a parallel program and improve its performance. 

There are many problems associated with debugging an OpenMP program: it is difficult to follow the flow of the program's execution, and accessing variables is complicated by the distinction between private and shared variables in an OpenMP program.

There are many tools available for debugging and performance visualization of OpenMP programs. Some of them are listed below:

KAP/Pro Toolset for OpenMP

The KAP/Pro Toolset consists of

  • Guide: an OpenMP compiler;

  • GuideView: a performance analysis tool for OpenMP; and

  • Assure: an OpenMP analyzer, a tool for verifying the correctness of OpenMP applications.

The Guide OpenMP Compiler provides OpenMP directive-based application development for Fortran, C, and C++. The GuideView Performance Analyzer presents a window into the performance details of a program's parallel execution. The Assure OpenMP Analyzer is the industry's first parallel correctness verifier.

They are available at

http://developer.intel.com/software/products/threadtool.htm

With the KAP/Pro tools and OpenMP, the application developer can easily and quickly develop, debug and tune programs for Windows NT and Unix. The Toolset provides a major advance in usability over alternative systems.


Sun's Compilers and Tools for OpenMP

http://www.cs.uh.edu/wompat2000/SLIDES/itzkowitz.pdf

TotalView

TotalView is the debugger for complex code. TotalView is far and away the best choice for those working with parallelism or large amounts of data because it scales transparently to support the big code and data sets running on anywhere from one to thousands of processes or processors. It's been proven in the world's toughest debugging environments.

It is available at

http://www.etnus.com/Products/TotalView

TotalView's support for OpenMP debugging lets you view the state of your program as if it were a non-parallel code. With TotalView, you can

  • Debug threaded codes whether OpenMP directives are present or not.

  • Understand OpenMP code execution.

  • Access private and shared variables as well as threadprivate variables.


On SGI systems, OpenMP is supported by the standard development tools: ProDev Workshop (including ProMPF) for IRIX systems.

2.1.10. OpenMP Information on the Web

A large number of resources are available on the Internet. The following is just a pointer to a few of them.  

The official site for OpenMP - Simple, Portable, Scalable SMP Programming 

  http://www.openmp.org

The OpenMP Tutorial

  http://hpcf.nersc.gov/training/tutorials/openmp

Parallel Programming with OpenMP

  http://oscinfo.osc.edu/training/openmp/big/fsld.002.html

The Usenet Newsgroup for OpenMP from Google Web Site

  comp.parallel   comp.lang.fortran

2.1.11. Reference Books on OpenMP

[1]  Rohit Chandra, Leonardo Dagum, et al., Parallel Programming in OpenMP, Morgan Kaufmann Publishers, San Francisco, CA. 
            http://www.mkp.com 

[2] "OpenMP C and C++ Application Program Interface, Version 1.0". OpenMP Architecture Review Board. October 1998. 

[3] "OpenMP Fortran Application Program Interface, Version 1.0". OpenMP   Architecture Review Board. October 1997. 

[4]  "OpenMP". Workshop presentation. John Engle, Lawrence Livermore National   Laboratory. October, 1998. 

[5] "OpenMP". Alliance 98 Tutorial. Faisel Saied, NCSA 

[6] "Introduction to OpenMP Using the KAP/PRO Toolset". Kuck & Associates, Inc. 

[7] "Guide Reference Manual (C/C++ Edition, Version 3.6". Kuck & Associates, Inc. 

[8] "Guide Reference Manual (Fortran Edition, Version 3.6". Kuck & Associates,   Inc. 

2.1.12. OpenMP FAQ 

The OpenMP FAQ (frequently asked questions) list is available at  http://www.openmp.org

Q1: What is OpenMP?
Q2: What does the MP in OpenMP stand for?
Q3: How does OpenMP compare with ... ?
Q4: What languages does OpenMP work with?
Q5: Is OpenMP scalable?
Q6: Can I execute OpenMP program on 2 nodes of PARAM 10000?

Q1:What is OpenMP?
A1:OpenMP is a specification for a set of compiler directives, library routines, and environment variables that can be used to specify shared memory parallelism in Fortran and C/C++ programs.

Q2: What does the MP in OpenMP stand for?
A2: The MP in OpenMP stands for Multi Processing. The OpenMP partners provide open specifications for multiprocessing via collaborative work with interested parties from the hardware and software industry, government, and academia.

Q3:How does OpenMP compare with ... ?
A3: MPI? Message-passing has become accepted as a portable style of parallel programming, but has several significant weaknesses that limit its effectiveness and scalability. Message-passing in general is difficult to program and doesn't support incremental parallelization of an existing sequential program. Message-passing was initially defined for client/server applications running across a network, and so includes costly semantics (including message queuing and selection and the assumption of wholly separate memories) that are often not required by tightly-coded scientific applications running on modern scalable systems with globally addressable and cache coherent distributed memories.

HPF? HPF has never really gained wide acceptance among parallel application developers or hardware vendors. Some applications written in HPF perform well, but others find that limitations resulting from the HPF language itself or the compiler implementations lead to disappointing performance. HPF's focus on data parallelism has also limited its appeal.

Pthreads? Pthreads have never been targeted toward the technical/HPC market. This is reflected in the minimal Fortran support, and its lack of support for data parallelism. Even for C applications, pthreads requires programming at a level lower than most technical developers would prefer.

FORALL loops? FORALL loops are not rich or general enough to use as a complete parallel programming model. Their focus on loops and the rule that subroutines called by those loops can't have side effects effectively limit their scalability. FORALL loops are useful for providing information to automatic parallelizing compilers and preprocessors.

BSP or LINDA or SISAL or...? There are lots of parallel programming languages being researched or prototyped in the industry. These may be targeted towards a specific architecture, or focused on exploring one key requirement. If you have a question about how OpenMP compares with a specific language or model, we can help you figure this out.

Q4: What languages does OpenMP work with?
A4: OpenMP is designed for Fortran, C and C++, supporting whichever of these languages the underlying compiler supports. The OpenMP specification does not introduce any constructs that require specific Fortran 90 or C++ features. OpenMP cannot be supported by compilers that do not support one of Fortran 77, Fortran 90, ANSI C (C89) or ANSI C++.

Q5: Is OpenMP scalable?
A5: OpenMP can deliver scalability for applications using shared-memory parallel programming. Significant effort was spent to ensure that OpenMP can be used for scalable applications. Ultimately, scalability is a property of the application and the algorithms used. The parallel programming language can only support scalability by providing constructs that simplify the specification of the parallelism and that can be implemented with low overhead by compiler vendors. OpenMP certainly delivers these kinds of constructs.

Q6: Can I execute OpenMP program on 2 nodes of PARAM 10000?
A6: No, you cannot execute OpenMP programs on 2 nodes of PARAM 10000. OpenMP is an Application Program Interface (API) that may be used to explicitly direct multi-threaded, shared-memory parallelism within a single node; it is a specification for a set of compiler directives, library routines and environment variables that can be used to specify shared-memory parallelism.

 