These instructions are for Fedora 13. InfiniBand is a relatively complicated network stack, and the people who maintain the drivers (the OpenFabrics Alliance) are slow to release drivers for newer Linux versions. This effectively limits us to either an ancient, explicitly supported distribution like RHEL, or a slightly less ancient, unsupported distribution like Fedora 13. Upgrade the Linux distribution on the InfiniBand nodes (cluster001 through cluster016) only if it is known that the InfiniBand drivers can be compiled and installed on that distribution. At a bare minimum we need the QIB driver, but the network and other utilities also depend on other parts of the stack (e.g., IP over IB).
Note: these steps are consolidated in the file /shared/clientconfig/infiniband_install/preinstall.sh.
Before installing, it is necessary to remove the existing drivers (which don’t work properly), set some compiler flags, and get rid of a spurious autoconf.h that breaks the build:
rm -f /lib/modules/`uname -r`/build/include/linux/autoconf.h
export CFLAGS="-lpthread -ldl"
mv /lib/modules/`uname -r`/kernel/drivers/infiniband/hw/ipath/ib_ipath.ko \
   /lib/modules/`uname -r`/kernel/drivers/infiniband/hw/ipath/ib_ipath.ko.disabled
rmmod ib_ipath
rmmod ib_core
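The rmmod steps above report an error when a module is not loaded, for example when the script is re-run on a node that was already prepared. A minimal sketch of a guard, assuming the standard /proc/modules format (module name in the first field); module_loaded and unload_if_loaded are hypothetical helper names, not part of preinstall.sh:

```shell
# Sketch: unload a kernel module only if it is currently loaded.
# module_loaded and unload_if_loaded are hypothetical helper names.

# Return 0 if module $1 appears in the module list file $2
# (defaults to /proc/modules, whose first field is the module name).
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Unload module $1 if it is loaded; otherwise report and continue.
# The optional $2 is passed through to module_loaded.
unload_if_loaded() {
    if module_loaded "$1" "${2:-/proc/modules}"; then
        rmmod "$1"
    else
        echo "$1 not loaded, skipping"
    fi
}

# Usage (as root), in the same order as the steps above:
#   unload_if_loaded ib_ipath
#   unload_if_loaded ib_core
```

Unloading ib_ipath before ib_core keeps the original order, since ib_core cannot be removed while a driver that depends on it is still loaded.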
Build (do this only once to generate the RPM files)
Untar the OFED package:
tar xvfz OFED-126.96.36.199.tgz
Run the installation script:
cd OFED-188.8.131.52
./install.pl
Choose option 2) Install OFED Software. On the next menu, choose 4) Customize.
Select yes on each of the following modules:
kernel-ib core mthca qib ipoib sdp srp rds kernel-ib-devel libibverbs libibverbs-devel libibverbs-devel-static libibverbs-utils libibverbs-debuginfo libipathverbs libipathverbs-devel libipathverbs-debuginfo libibcm libibcm-devel libibcm-debuginfo libibumad libibumad-devel libibumad-static libibumad-debuginfo libibmad libibmad-devel libibmad-static libibmad-debuginfo ibsim ibsim-debuginfo ibacm librdmacm librdmacm-utils librdmacm-devel librdmacm-debuginfo libsdp libsdp-devel libsdp-debuginfo opensm opensm-libs opensm-devel opensm-debuginfo opensm-static compat-dapl compat-dapl-devel dapl dapl-devel dapl-devel-static dapl-utils dapl-debuginfo perftest mstflint sdpnetstat srptools rds-tools rds-devel ibutils infiniband-diags qperf qperf-debuginfo ofed-docs infinipath-psm infinipath-psm-devel
Be sure to select no on these modules:
libmthca libmthca-devel-static libmthca-debuginfo libmlx4 libmlx4-devel libmlx4-debuginfo libcxgb3 libcxgb3-devel libcxgb3-debuginfo libcxgb4 libcxgb4-devel libcxgb4-debuginfo libnes libnes-devel-static libnes-debuginfo ofed-scripts mpi-selector mvapich_gcc mvapich2_gcc openmpi_gcc mpitests_mvapich_gcc mpitests_mvapich2_gcc mpitests_openmpi_gcc build32
Install (do this once per machine after building)
Be sure to use the -c ofed.conf option to re-use the options selected when building in the first place. This step will use the RPMs just built in the previous step (they’re in the RPMS/fedora-release-13-1/x86_64/ directory):
cd OFED-184.108.40.206
./install.pl -c ofed.conf
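Before running the install on a given machine, it may help to confirm that the build outputs are actually present. The following is a sketch only: preflight_check is a hypothetical helper, and it assumes ofed.conf sits in the OFED directory (as the relative path in the command above implies) and that the RPMs are in the RPMS/fedora-release-13-1/x86_64/ directory mentioned earlier.

```shell
# Sketch: verify that the build outputs exist before a per-machine install.
# preflight_check is a hypothetical helper; $1 is the unpacked OFED directory.
preflight_check() {
    ofed_dir=$1
    rpm_dir="$ofed_dir/RPMS/fedora-release-13-1/x86_64"

    if [ ! -f "$ofed_dir/ofed.conf" ]; then
        echo "missing $ofed_dir/ofed.conf" >&2
        return 1
    fi
    # ls fails when the glob matches nothing, i.e. no RPMs were built.
    if ! ls "$rpm_dir"/*.rpm >/dev/null 2>&1; then
        echo "no RPMs found in $rpm_dir" >&2
        return 1
    fi
    echo "ok: $(ls "$rpm_dir"/*.rpm | wc -l | tr -d ' ') RPMs ready"
}

# Usage, from the directory that contains the unpacked OFED tree:
#   preflight_check <OFED directory> && (cd <OFED directory> && ./install.pl -c ofed.conf)
```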
Note: these steps are consolidated in the file /shared/clientconfig/infiniband_install/postinstall.sh.
Disable the network manager:
chkconfig network on
chkconfig NetworkManager off
/etc/init.d/NetworkManager stop
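Add driver options that force the driver to negotiate the fastest data rate or fail. This is preferable to silently coming up at the wrong InfiniBand bit rate. The ibmtu=4 option additionally sets the maximum IB MTU (4 is the InfiniBand encoding for 2048 bytes):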
echo "options ib_qib compat_ddr_negotiate=0 ibmtu=4" > /etc/modprobe.d/ib_qib.conf
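Options in /etc/modprobe.d only take effect the next time the module is loaded, so it can be worth checking what the running driver actually uses. A minimal sketch, assuming the driver exposes its parameters under /sys/module/ib_qib/parameters (the usual location for readable module parameters, but not confirmed here for every parameter); show_module_params is a hypothetical helper name:

```shell
# Sketch: print "name=value" for every readable parameter file in a directory.
# show_module_params is a hypothetical helper; $1 is a parameters directory,
# e.g. /sys/module/ib_qib/parameters on a node with the driver loaded.
show_module_params() {
    for f in "$1"/*; do
        # Skip anything that is not a readable regular file.
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s=%s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

# Usage:
#   show_module_params /sys/module/ib_qib/parameters | grep -E '^(compat_ddr_negotiate|ibmtu)='
```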
Raise the locked-memory (memlock) ulimit so MPI works properly:
echo "* soft memlock unlimited" >> /etc/security/limits.conf
echo "* hard memlock unlimited" >> /etc/security/limits.conf
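Because the commands above append with >>, re-running them adds duplicate lines to limits.conf. A minimal sketch of an idempotent variant; append_once is a hypothetical helper name, not something postinstall.sh is known to contain:

```shell
# Sketch: append a line to a file only if that exact line is not already there.
# append_once is a hypothetical helper name.
append_once() {
    line=$1
    file=$2
    # -q: quiet, -x: match the whole line, -F: fixed string (so "*" is literal).
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage (as root):
#   append_once "* soft memlock unlimited" /etc/security/limits.conf
#   append_once "* hard memlock unlimited" /etc/security/limits.conf
```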
Configure the network:
cd /shared/clientconfig/infiniband_install
cp ifcfg-ib0 /etc/sysconfig/network-scripts/ifcfg-ib0
cp ifcfg-eth0 /etc/sysconfig/network-scripts/ifcfg-eth0
Edit the network configuration files just copied in the step above. Change the IPADDR line to the host-specific IP address (you choose this).
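The IPADDR edit can also be scripted rather than done by hand. A minimal sketch, assuming each ifcfg file already contains a line beginning with IPADDR= (if it does not, the file is left unchanged); set_ipaddr is a hypothetical helper name and the addresses in the usage comment are placeholders only:

```shell
# Sketch: replace the IPADDR line in an ifcfg-style file.
# set_ipaddr is a hypothetical helper; $1 is the file, $2 the new address.
set_ipaddr() {
    file=$1
    ip=$2
    tmp=$(mktemp) || return 1
    # Rewrite any existing IPADDR line; all other lines pass through unchanged.
    # Writing back with cat keeps the original file's owner and permissions.
    sed "s/^IPADDR=.*/IPADDR=$ip/" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Usage (as root; the addresses are placeholders, pick the host's real ones):
#   set_ipaddr /etc/sysconfig/network-scripts/ifcfg-ib0 192.0.2.11
#   set_ipaddr /etc/sysconfig/network-scripts/ifcfg-eth0 198.51.100.11
```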