UD Community Cluster Update: 9/6/2011
Planning for the UD Community Cluster passed several milestones this summer. Faculty interest continued to grow, doubling the cluster's initial target size. The cluster will now have 200 compute nodes and over 5,000 processor cores. Most nodes will be dual-socket, with 24 processor cores and 64 GB of memory. These are complemented by larger nodes, the largest being quad-socket with 48 cores and 256 GB of memory. About five nodes are still available for faculty purchase at $3K each. You can view a list of the current Community Cluster stakeholders and their departments online.
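As a rough check on the core count above, the arithmetic can be sketched in a few lines of Python. The exact mix of node types is not stated here, so this sketch assumes, purely for illustration, that every node is either a 24-core or a 48-core node; the function names are hypothetical.

```python
def total_cores(total_nodes, large_nodes, std_cores=24, large_cores=48):
    """Total cores when `large_nodes` of the `total_nodes` are 48-core nodes.

    Assumption: only two node types exist (24-core and 48-core). The real
    cluster may include other node sizes in between.
    """
    if not 0 <= large_nodes <= total_nodes:
        raise ValueError("large_nodes must be between 0 and total_nodes")
    standard_nodes = total_nodes - large_nodes
    return standard_nodes * std_cores + large_nodes * large_cores


def min_large_nodes_to_exceed(target_cores, total_nodes):
    """Smallest number of 48-core nodes that pushes the total above target_cores."""
    for large_nodes in range(total_nodes + 1):
        if total_cores(total_nodes, large_nodes) > target_cores:
            return large_nodes
    return None  # target cannot be exceeded even with all large nodes


if __name__ == "__main__":
    print(total_cores(200, 0))                   # all nodes 24-core
    print(min_large_nodes_to_exceed(5000, 200))  # large nodes needed to pass 5,000
```

Under this simplified two-type assumption, 200 nodes of 24 cores give 4,800 cores, so the "over 5,000" figure implies that at least a handful of the nodes are the larger type.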
We decided to go with AMD's most recent Opteron processor chip, codenamed Interlagos. The Interlagos processor is AMD's first completely new micro-architecture design since 2003 and is based on their new Bulldozer architecture. UD will be among the first in the world to get the Interlagos processors.
Nodes will be connected using a Quad Data Rate (40 Gb/s) InfiniBand network, rather than the slower 10 Gb/s Ethernet network that was planned initially. A major driver for this change was faculty feedback about their intended software applications. Another was that the timing and the bidding process helped make InfiniBand financially feasible.
We held a competitive bidding process early in the summer. The winning bid for the servers came from Penguin Computing, a respected producer of high performance clusters.
A high-performance, Lustre-based storage system will have about 200 TB of usable space, mostly for scratch and work files. The Lustre filesystems will allow direct access to the entire storage system by all processor cores in the cluster. RAID-6 (double parity) will be used to increase system reliability. The storage is based on Penguin's Relion 2751 servers. Reputation, cost-effectiveness and reasonable expansion costs figured into the choice of the storage system.
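To illustrate what double parity costs in capacity, here is a minimal Python sketch of the usual RAID-6 relationship: in a group of n disks, two disks' worth of space holds parity, so n - 2 hold data. The group size used below (10 disks) is a hypothetical value chosen for illustration, not the actual layout of this storage system, and the sketch ignores hot spares and filesystem overhead.

```python
def raid6_usable_tb(raw_tb, disks_per_group):
    """Usable capacity of RAID-6 groups: two disks per group hold parity."""
    if disks_per_group < 4:
        raise ValueError("RAID-6 needs at least 4 disks per group")
    return raw_tb * (disks_per_group - 2) / disks_per_group


def raid6_raw_tb_needed(usable_tb, disks_per_group):
    """Raw capacity required to end up with `usable_tb` of usable space."""
    if disks_per_group < 4:
        raise ValueError("RAID-6 needs at least 4 disks per group")
    return usable_tb * disks_per_group / (disks_per_group - 2)


if __name__ == "__main__":
    # Hypothetical layout: 10-disk groups (8 data + 2 parity).
    print(raid6_raw_tb_needed(200, 10))
```

With the assumed 10-disk groups, about 250 TB of raw disk would be needed for 200 TB of usable space; smaller groups raise the overhead, larger groups lower it.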
Penguin will build the entire system at its main facility in Sunnyvale, California; install the open-source CentOS operating system components; test it; ship it to UD; and rebuild it in the UD Computing Center.
Penguin plans to deliver the system to the Computing Center in mid-to-late October. After its arrival, IT staff will install additional operating system components, a batch scheduling system, application software and programming development environments; then test and tune the cluster and develop user guides and training. Our target for releasing the system to research users is the end of the semester.
As part of our preparation, IT is engaged in a major construction project in the Chapel St Computing Center. A 2-megawatt power generator is replacing the current 800-kilowatt generator. The effective cooling capacity will be doubled to 180 tons, and we will have a redundantly configured UPS system with an effective capacity of 1,000 kVA. These environmental improvements will be completed by mid-October, providing an even more robust site for the cluster.
If you’re interested in becoming a stakeholder in this research cluster, call or send email to Dick Sacher (firstname.lastname@example.org, 831-1466) in IT Client Support & Services (IT-CS&S).