Paralogic Beowulf Performance Suite V 1.3-1
December 6, 2002
Doug Eadline deadline@plogic.com
www.plogic.com/bps

Purpose: 
========

This package is a collection of performance analysis programs for use 
with Beowulf clusters. The suite itself provides a graphical user 
interface for running the programs as well as html file generation of output.


Quick Start:
============


1) Install the rpm - "rpm -ivh <rpmfile>" 
   If the rpm fails dependencies, use the source rpm.
   (See below for more information.)
2) Either use the Paralogic module facility or make sure
   your MPI and compiler paths are set correctly.
   You will need MPICH_HOME set to your MPICH path,
   LAM_HOME set to your LAM-MPI path, and MPIPRO_HOME set to
   your MPI-PRO path. Also, if you wish to 
   use LAM-MPI, you will need the LAM's bin path in your PATH 
   so that LAM can start on the nodes.
3) Run xbps  -  xbps &
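
The environment setup in step 2 might look like the following. The install
prefixes shown here are hypothetical -- adjust them to wherever your MPI
packages actually live:

```shell
# Hypothetical install prefixes -- adjust to your actual MPI locations.
export MPICH_HOME=/usr/local/mpich
export LAM_HOME=/usr/local/lam
export MPIPRO_HOME=/usr/local/mpipro

# LAM needs its bin directory in PATH so it can start on the nodes.
export PATH=$LAM_HOME/bin:$PATH
```

Place these in your shell startup file (e.g. ~/.bashrc) so they are set on
every login.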


Important Notes:
================

The bps suite is best run as a regular user. Some of the tests (e.g. the NAS
parallel benchmarks) will not run as root.

Not all features of the command line interface are possible with the GUI.

When using Netpipe/Netperf Benchmarks, rsh with no password must be 
permitted between the nodes upon which the benchmark is to be run.
This behavior is typical of most clusters. 
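
A quick way to verify password-less rsh is to run a trivial command on each
node. The node names below are examples -- substitute your own:

```shell
# Check password-less rsh to each node ("node1 node2" are example names).
for n in node1 node2; do
    if rsh "$n" true >/dev/null 2>&1; then
        echo "$n: rsh OK"
    else
        echo "$n: rsh FAILED -- check ~/.rhosts or /etc/hosts.equiv"
    fi
done
```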

Under normal operation, xbps will always preserve the existing log directory.
This feature is to ensure previous results will not be overwritten. You can
copy previous log files (from log directories) into the current log directory
for bps-html conversion.
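
For example, to carry an earlier run's results into the current log directory
before generating HTML (the directory names here are hypothetical, and the
setup lines exist only to make the example self-contained):

```shell
# "log.old" and "log" stand in for your previous and current log directories.
mkdir -p log.old log        # setup only, so this example runs on its own
touch log.old/stream.log    # stand-in for a previous result file
cp log.old/*.log log/       # merge old results into the current log directory
```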

Also, the tests have been designed so that the bps rpm only needs to be
installed on the head node. For this to work, the bps log directory must be
mounted on all nodes (e.g. under /home).
 
When using the NAS Parallel Benchmarks, it is advisable to use the MPI
implementations that Paralogic uses for its own benchmarking. However, rather
than limit potential BPS users, these are not made part of the required
packages list. The benchmark scripts rely on the environment variables
described above (for LAM-MPI and MPICH). If you are having problems with the
NAS benchmarks, extract the npb.tar.gz archive in the /usr/bps/src directory
and try running the scripts by hand. Consult the README.plogic file for 
more information. Also, if you wish to use the Portland Group or
the Intel Compilers make sure you have these properly configured.

Any suggestions for improving the tests are welcome.
Please email the BPS mailing list:  bps@plogic.com


Install Procedure:
==================

Using the rpm file:  
(version numbers may vary)

  rpm -i bps-1.2-7.i386.rpm

Using the source rpm file:
(Do this only if the rpm does not install on your system)

  rpm -i bps-1.2-7.src.rpm  (install src rpm)

  rpm -bb bps.spec  (build the rpm)

  rpm -i /usr/src/redhat/RPMS/i386/bps-1.2-7.i386.rpm  (install the rpm)


Using the tarball:

 tar -xvzf <bps tarball>.tar.gz
 cd <bps dir>
 sh build-all

This will put all important files in ~bps/bin and ~bps/src. 


Usage:
======

xbps 
	run bps in graphical mode. This mode is a bit easier to use than
        the command line mode.

bps
	run benchmarks included in bps from command line 

  Options:
    -b                            bonnie++
    -s                            stream
    -f <send node>,<receive node> netperf to remote node
    -p <send node>,<receive node> netpipe to remote node
    -n <compiler>,<#processors>,  NAS parallel benchmarks
     <test size>,<MPI>,           compiler={gnu,pgi,intel}
     <machine1,machine2,...>      test size={A,B,C,dummy}
                   		  MPI={mpich,lam,mpipro}
    -k                            keep NAS directory when finished
    -u                            unixbench
    -m                            lmbench
    -l <log_dir>                  benchmark log directory
    -w                            preserve existing log directory
    -i <mboard manufacturer>,     machine information
       <mboard model>,<memory>
       <interconnect>,<linux ver>
    -v                            show version
    -h                            show this help

bps-html <log directory>

	generate html output files based on files in <log directory> 		
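
Putting the two commands together, a hypothetical session might look like the
following (the node names, compiler choice, and log directory name are
examples only):

  bps -b -s -u -m -l mylogs                               (single-machine tests)

  bps -n gnu,4,A,mpich,node1,node2,node3,node4 -l mylogs  (NAS class A, 4 nodes)

  bps-html mylogs                                         (generate HTML report)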


In Case of Problems:
====================

The BPS suite is a collection of many tests. You should have minimal or
no problems with the single-machine tests. As more machines become involved
in the tests, there is more room for configuration errors to arise.

If a test does not run, the best thing to do is to check the "test_name.log"
file in the log directory. In the case of the NAS tests, the results are
named in the form npb.COMPILER.MPI.CLASS.PROCESSORS. In general, if you are
having problems with a test, it may be best to run it from the command line.
In the case of the NAS suite, the "-k" option will keep the npb directory
in the log directory so you can run the tests more directly by using
the "run_suite" script in the npb directory. Also, the README.plogic
file in the npb directory should provide more information on how the tests
are run and how to resolve possible problems.
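
The by-hand procedure mentioned above might look like the following (the npb
directory name and the exact run_suite invocation are assumptions -- consult
README.plogic for the details):

  cd /usr/bps/src

  tar -xvzf npb.tar.gz

  cd npb          (directory name assumed from the archive name)

  sh run_suite    (exact arguments may differ; see README.plogic)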


Background:
===========

General:
http://www.plogic.com/bps

bonnie++ - hard drive performance
Reference: http://www.coker.com.au/bonnie++/

stream - memory performance
Reference: http://www.cs.virginia.edu/stream/

netperf - general network performance
Reference: http://www.netperf.org/netperf/NetperfPage.html

netpipe - detailed network performance
Reference: http://www.scl.ameslab.gov/Projects/ClusterCookbook/nprun.html

unixbench - general Unix benchmarks
Reference: http://www.linuxdoc.org/HOWTO/Benchmarking-HOWTO.html#toc3

LMbench - low level benchmarks
Reference: http://www.bitmover.com/lmbench/

NAS - parallel tests
Reference: http://www.nas.nasa.gov/Software/NPB/

The following is a description of the NAS tests.

BT is a simulated CFD application that uses an implicit
  algorithm to solve the three-dimensional (3D) compressible Navier-Stokes
  equations. The finite-difference solution to the problem
  is based on an Alternating Direction Implicit (ADI) approximate
  factorization that decouples the x, y, and z dimensions.
  The resulting systems are block-tridiagonal with 5x5 blocks
  and are solved sequentially along each dimension.

SP is a simulated CFD application that has a similar structure
  to BT. The finite-difference solution to the problem
  is based on a Beam-Warming approximate factorization that
  decouples the x, y, and z dimensions. The resulting system
  has scalar pentadiagonal bands of linear equations that
  are solved sequentially along each dimension.

LU is a simulated CFD application that uses the symmetric successive
  over-relaxation (SSOR) method to solve a seven-block-diagonal
  system resulting from finite-difference discretization
  of the Navier-Stokes equations in 3D by splitting it into
  block lower and upper triangular systems.

FT contains the computational kernel of a 3D fast Fourier
  Transform (FFT) based spectral method. FT performs three
  one-dimensional (1D) FFTs, one for each dimension.

MG uses a V-cycle multigrid method to compute the solution
  of the 3D scalar Poisson equation. The algorithm operates
  on a sequence of grids ranging from coarse to fine.
  It tests both short- and long-distance data movement.

CG uses a Conjugate Gradient method to compute an approximation
  to the smallest eigenvalue of a large, sparse, unstructured
  matrix. This kernel tests unstructured grid computations
  and communications by using a matrix with randomly generated
  locations of entries.

EP is an Embarrassingly Parallel benchmark. It generates
  pairs of Gaussian random deviates according to a specific
  scheme. The goal is to establish the reference point for
  peak performance of a given platform.

