Storage-v0.0.1
Basic System Installation
The basic requirement of this procedure is to have RHEL ES 4.3 installed and running on the hardware. Both the 32-bit x86 and the 64-bit x86_64 versions should work. The RHEL4 clones CentOS 4.3 (http://www.centos.org/) and Scientific Linux 4.3 (https://www.scientificlinux.org/) are also valid for the benchmark. Installation instructions can be found on these sites or are provided with the copy.
All benchmarks should be run in the same mode, either 32-bit or 64-bit. All further examples assume that a 32-bit version is used.
Partition setup
The partition setup matters mainly for the disk performance benchmark. For this benchmark it is advised to create 4 partitions on the logical volume(s) that are not used for the system disks. Four (almost) equal partitions can be created using /sbin/fdisk <device>. For example, if the volumes for the data disks are seen by the system as 2 devices, /dev/sda and /dev/sdb, then one will end up with the 4 partitions /dev/sda1, /dev/sda2, /dev/sdb1 and /dev/sdb2. File systems will now be created on these 4 partitions.
Filesystem setup
It is advised to use the XFS filesystem for the benchmarks. The official RHEL4 releases do not support XFS, but the necessary kernel modules for the clones can be found at, e.g., http://dev.centos.org/centos/4/testing/i386/RPMS/ for the 32-bit release of CentOS 4.3. The xfsprogs RPM needed to create the XFS partitions can be found there as well. The following command should be used to create the filesystem on each partition:
mkfs.xfs -d agsize=4g -l version=1 -i size=512 <device>
Network Performance
Hardware requirements
Apart from the system to be tested, two more systems with an equivalent network interface configuration are required. The helper systems and the complete network infrastructure should offer at least the network bandwidth of the system to be tested. The network infrastructure should be idle apart from the tests.
In the example commands shown below, it is assumed that the system to be tested is named server, and the two helper systems are named client01 and client02, respectively.
Software requirements
On all machines taking part in the test, version 2.0.2 of the Iperf tool needs to be installed. Its original Web site is http://dast.nlanr.net/Projects/Iperf/; version 2.0.2, packaged as an RPM, can be downloaded from http://dag.wieers.com/packages/iperf/iperf-2.0.2-1.2.el4.rf.i386.rpm.
Test details
The tests can be run from a non-privileged account if port numbers higher than 1024 are used; however, the example commands below assume the tests are run as root.
Each test will be run for 15 minutes, with a TCP window size of 256 kB if possible. After 15 minutes, the server displays lines similar to the following:
[ ID] Interval       Transfer     Bandwidth
[  6]  0.0-900.0 sec  98.7 GBytes  112 MBytes/sec
The number displayed as 'Bandwidth' is the one referred to in the test descriptions below. After each test, the 'iperf -s' instance needs to be killed (for example by hitting CTRL-C).
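Reading the 'Bandwidth' figure off the screen can also be scripted. The following is a minimal sketch, assuming the output of the 'iperf -s' instance was saved to a log file (for example by piping it through tee); the helper name iperf_bandwidth and the log file are hypothetical and not part of iperf.

```shell
# Hypothetical helper: print the bandwidth number and unit from the last
# report line of a saved iperf log. Report lines end in "<number> <unit>/sec".
iperf_bandwidth() {
    awk '/\/sec[ \t]*$/ { value = $(NF - 1); unit = $NF }
         END { if (unit != "") print value, unit }' "$1"
}

# Example (file name is an assumption):
#   iperf_bandwidth server-1234.log
```

Because the last matching line wins, interval reports written with -i are skipped in favour of the final summary line.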
Single TCP stream receiving data
On the server, run a process ready to receive data:
[root@server] iperf -s -p 1234 -w 256k -fA
Then, on client01, run a process sending data to the server:
[root@client01] iperf -c server -p 1234 -w 256k -fA -t 900 -i 15
(The -s option indicates that the iperf process is to function as an iperf server; -c server means it is to function as a client connecting to an iperf instance on server. -p and -w specify the port number and the TCP window size, respectively; -fA means that bandwidth numbers are to be printed in multiples of Bytes/s, with k, M, and G denoting the first, second, and third power of 1024. -t 900 specifies the duration of the test in seconds; -i 15 causes a regular report to be written to the screen every 15 seconds.)
The test is to be repeated with client02 replacing client01; the result is the arithmetic average of the bandwidths measured with client01 and client02.
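The averaging step can be sketched as a small shell function. The name mean_bandwidth is hypothetical, and the sketch assumes that both bandwidths were reported in the same unit (with -fA, iperf picks the unit itself, so this should be checked first).

```shell
# Hypothetical helper: arithmetic mean of all numeric arguments,
# printed with two decimals. All values must be in the same unit.
mean_bandwidth() {
    printf '%s\n' "$@" |
        awk '{ sum += $1; n++ } END { if (n > 0) printf "%.2f\n", sum / n }'
}

# Example with made-up values for client01 and client02:
#   mean_bandwidth 112 108
```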
Single TCP stream sending data
On the client, run a process ready to receive data:
[root@client01] iperf -s -p 1234 -w 256k -fA -i 15
Then, on the server, run a process sending data to the client:
[root@server] iperf -c client01 -p 1234 -w 256k -fA -t 900
The test is to be repeated with client02 replacing client01; the result is the arithmetic average of the bandwidths measured with client01 and client02.
Two TCP streams receiving data
On the server, run (in two different terminal windows) two processes ready to receive data:
[root@server] iperf -s -p 1234 -w 256k -fA
[root@server] iperf -s -p 1235 -w 256k -fA
Then, on both clients, run processes sending data to the server (start these processes at the same time):
[root@client01] iperf -c server -p 1234 -w 256k -fA -t 900 -i 15
[root@client02] iperf -c server -p 1235 -w 256k -fA -t 900 -i 15
The result is the sum of the bandwidths shown for the two processes on the server.
Two TCP streams sending data
On the clients, run one process each ready to receive data:
[root@client01] iperf -s -p 1234 -w 256k -fA -i 15
[root@client02] iperf -s -p 1235 -w 256k -fA -i 15
Then, on the server, run (in two different terminal windows) two processes sending data to the clients (start these processes at the same time):
[root@server] iperf -c client01 -p 1234 -w 256k -fA -t 900
[root@server] iperf -c client02 -p 1235 -w 256k -fA -t 900
The result is the sum of the bandwidths shown for the two processes on the server.
Two TCP streams receiving data, two TCP streams sending data
On the server, run (in two different terminal windows) two processes ready to receive (and send) data:
[root@server] iperf -s -p 1234 -w 256k -fA
[root@server] iperf -s -p 1235 -w 256k -fA
Then, on both clients, run processes sending data to the server, and receiving data from the server (start these processes at the same time):
[root@client01] iperf -c server -p 1234 -w 256k -fA -t 900 -i 15 -d
[root@client02] iperf -c server -p 1235 -w 256k -fA -t 900 -i 15 -d
(The -d option causes iperf to simultaneously send data to the server, and request data from the server.)
The result is the sum of the four bandwidths shown for the two processes on the server (two bandwidths per server process).
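Summing the bandwidths by hand is error-prone when several report lines are involved. The sketch below assumes that the output of each 'iperf -s' instance on the server was saved to its own log file and that each log contains only the final report lines (no -i interval reports). The helper name sum_bandwidths and the log files are hypothetical. Since -fA lets iperf choose the unit, the helper refuses to add values whose units differ.

```shell
# Hypothetical helper: add up the bandwidth of every report line found in
# the given log files and print "<lines> <sum> <unit>".
sum_bandwidths() {
    awk '$NF ~ /\/sec$/ {
             if (unit == "") { unit = $NF }
             else if ($NF != unit) { mixed = 1 }
             sum += $(NF - 1); n++
         }
         END {
             if (mixed) {
                 print "mixed units, convert by hand first" > "/dev/stderr"
                 exit 1
             }
             printf "%d %.1f %s\n", n, sum, unit
         }' "$@"
}

# Example (file names are assumptions):
#   sum_bandwidths server-1234.log server-1235.log
```

The first number in the output is the count of report lines that were added, which should be 4 for this test.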
Disk Performance
Hardware requirements
The disk performance tests are confined to the system to be tested. They require that the logical volumes are configured correctly. On each logical volume, an xfs file system must be created using the command
mkfs.xfs -d agsize=4g -l version=1 -i size=512 <device>
These file systems must be mounted. In the remainder of this benchmark description, the mount points are called
/data<nn>
Here <nn> counts the file systems for data storage in two-digit format starting from 01.
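The naming scheme can be sketched as a dry run that only prints the commands for a list of devices instead of executing them. The function name print_mount_plan is hypothetical, and the use of 'mkdir -p' and 'mount -t xfs' is an assumption about how the mount points are set up; the mkfs.xfs line is the one given above.

```shell
# Hypothetical dry-run helper: print (do not execute) the commands that
# create and mount one xfs file system per device under /data01, /data02, ...
print_mount_plan() {
    n=1
    for dev in "$@"; do
        mp=$(printf '/data%02d' "$n")   # two-digit counter starting at 01
        echo "mkfs.xfs -d agsize=4g -l version=1 -i size=512 $dev"
        echo "mkdir -p $mp"
        echo "mount -t xfs $dev $mp"
        n=$((n + 1))
    done
}

# Example with the partitions from the partition setup:
#   print_mount_plan /dev/sda1 /dev/sda2 /dev/sdb1 /dev/sdb2
```

Printing the commands first makes it possible to review them before running anything that destroys data on the devices.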
Software requirements
For the disk performance tests, version 3.263 of the iozone test suite will be used. Its original web site is http://www.iozone.org; this version can be downloaded from http://dag.wieers.com/packages/iozone/iozone-3.263-1.el4.rf.i386.rpm.
Test details
Full details about iozone are available from the file /usr/share/doc/iozone-3.263/Iozone_ps.gz, and from the iozone(1) man page, both installed with the RPM (see above). For the purposes of this test, only block I/O (write/rewrite, read/reread) will be considered, with a record size of 256 KB and a file size of 16 GB.
(KB, MB and GB denote 1024 bytes, 1024*1024 bytes, and 1024*1024*1024 bytes, respectively, throughout this document.)
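Since iozone reports speeds in KB/s while iperf reports MBytes/sec, a conversion between the two is occasionally handy when comparing results. A minimal sketch using the 1024-based definition above (the helper name kb_to_mb is hypothetical):

```shell
# Hypothetical helper: convert a value in KB (1024 bytes) to MB (1024*1024 bytes).
kb_to_mb() {
    awk -v kb="$1" 'BEGIN { printf "%.2f\n", kb / 1024 }'
}

# Example:
#   kb_to_mb 2048
```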
Single logical volume, one data stream
iozone will be run with the following command line:
iozone -eM -s16g -r256k -i0 -i1 -f /data<nn>/iozone >> single-result
/data<nn> is the mount point of one of the file systems created on the logical volumes for data.
(The options mean the following: -e causes flushing operations (fsync, fflush) to be included in the timing; -M causes the output of 'uname -a' to be included in the output; -s 16g and -r 256k specify the file size and the record size, respectively; -i0 and -i1 select the write/rewrite test, and the read/reread test from the variety of tests that iozone offers.)
This command is to be run at least 10 times in total, and the same number of times on every file system on the data disks. For example, for a system offering four logical volumes on data disks, the test would need to be run three times on each such file system.
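The number of runs per file system is the smallest count that brings the total to at least 10. The following sketch computes it and prints (without executing) the resulting sequence of commands; both function names are hypothetical, and the file systems are assumed to be mounted as /data01, /data02 and so on.

```shell
# Hypothetical helper: smallest number of runs per file system such that
# runs * <number of file systems> >= 10 (integer ceiling division).
runs_per_filesystem() {
    echo $(( (9 + $1) / $1 ))
}

# Hypothetical dry-run helper: print the iozone command lines for all runs.
print_single_stream_runs() {
    count=$1
    runs=$(runs_per_filesystem "$count")
    r=1
    while [ "$r" -le "$runs" ]; do
        v=1
        while [ "$v" -le "$count" ]; do
            printf 'iozone -eM -s16g -r256k -i0 -i1 -f /data%02d/iozone >> single-result\n' "$v"
            v=$((v + 1))
        done
        r=$((r + 1))
    done
}

# Example for four file systems (prints 12 command lines):
#   print_single_stream_runs 4
```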
Each invocation will write, on successful completion of iozone, a record into single-result. The record contains lines, among others, similar to the following ones:
                                                          random   random     bkwd   record   stride
        KB   reclen    write  rewrite     read   reread     read    write     read  rewrite     read   fwrite frewrite    fread  freread
  16777216      256    50124    48572    43751    42401
The third field (count starting at 1) of the number line contains the write speed in KB/s (50124 KB/s in this case); the fifth field contains the read speed in KB/s (43751 KB/s in this case). The single-stream transfer speeds for reading and writing are the arithmetic averages of all individual read speeds and write speeds, respectively.
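The field extraction and averaging described above can be sketched with awk. The function name single_stream_averages is hypothetical; the sketch identifies the number lines by their first two fields (file size 16777216 KB and record size 256 KB), which is an assumption that holds only for the iozone options used in this test.

```shell
# Hypothetical helper: average the write speed (field 3) and read speed
# (field 5) over all number lines in the result file given as argument.
single_stream_averages() {
    awk '$1 == 16777216 && $2 == 256 && NF >= 6 { w += $3; r += $5; n++ }
         END {
             if (n > 0)
                 printf "write %.1f KB/s, read %.1f KB/s over %d runs\n", w / n, r / n, n
         }' "$1"
}

# Example:
#   single_stream_averages single-result
```

The run count in the output gives a quick check that the result file really contains the intended number of records.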
Aggregate transfer rate across multiple logical volumes
For this test, multiple (at least 10) synchronised iozone threads will be run. It is expected that maximum performance will be obtained by using all logical volumes on data disks equally. For example, for a system with four logical volumes (with xfs file systems created and mounted as described above), three iozone threads will be run on each logical volume. The command for this case is shown below. For different numbers of logical volumes the command needs to be adapted, the only requirement being that the number of iozone threads is at least 10:
iozone -eM -s16g -r256k -i0 -i1 -t12 -F /data01/iozone1 /data02/iozone1 /data03/iozone1 /data04/iozone1 /data01/iozone2 /data02/iozone2 /data03/iozone2 /data04/iozone2 /data01/iozone3 /data02/iozone3 /data03/iozone3 /data04/iozone3 >> multi-result
(The -t option specifies that 12 threads are to be created; the -F option is expecting 12 unique file names.)
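When adapting the command to a different number of logical volumes, the list of file names can be generated rather than typed. The sketch below reproduces the ordering used in the command above; the function name iozone_file_list is hypothetical.

```shell
# Hypothetical helper: print the file names for the -F option, given the
# number of logical volumes and the number of threads per volume.
iozone_file_list() {
    volumes=$1
    per_volume=$2
    list=""
    i=1
    while [ "$i" -le "$per_volume" ]; do
        v=1
        while [ "$v" -le "$volumes" ]; do
            list="$list $(printf '/data%02d/iozone%d' "$v" "$i")"
            v=$((v + 1))
        done
        i=$((i + 1))
    done
    echo "${list# }"   # drop the leading blank
}

# Example for four volumes with three threads each (12 names, matching -t12):
#   iozone -eM -s16g -r256k -i0 -i1 -t12 -F $(iozone_file_list 4 3) >> multi-result
```

The value passed to -t must equal the product of the two arguments, since -F expects exactly one file name per thread.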
The aggregate performance of one run for writing and reading is the number given in the lines 'Parent sees throughput for 12 initial writers = 147035.71 KB/sec' and 'Parent sees throughput for 12 readers = 235826.84 KB/sec', respectively. The test is to be run three times; the aggregate performance for reading and writing is the arithmetic average of the aggregate performance of the three runs for reading and writing, respectively.
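The two 'Parent sees throughput' figures can be collected from the result file and averaged over the runs with awk. This is a sketch only: the function name aggregate_averages is hypothetical, and the patterns assume that the lines look like the ones quoted above, with the number following the '=' sign.

```shell
# Hypothetical helper: average the "initial writers" and "readers" throughput
# lines over all runs recorded in the result file given as argument.
# Lines for rewriters and re-readers do not match the patterns and are skipped.
aggregate_averages() {
    awk -F'=' '/Parent sees throughput for +[0-9]+ initial writers/ { w += $2; nw++ }
               /Parent sees throughput for +[0-9]+ readers/         { r += $2; nr++ }
               END {
                   if (nw > 0) printf "write %.2f KB/sec\n", w / nw
                   if (nr > 0) printf "read %.2f KB/sec\n", r / nr
               }' "$1"
}

# Example:
#   aggregate_averages multi-result
```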