METHODS OF CONSTRUCTION AND MONITORING OF "DUBNA-GRID" META-CLUSTER

Mr.   Antonov

D.V. Belyakov *, D.V. Chkhaberidze *, E.N. Cheremisina **, D.C. Golub *, S.N. Dobromyslov ***, A.G. Dolbilov *, V.V. Ivanov *, Val.V. Ivanov *, L.A. Kalmykova *, V.V. Korenkov *, Yu.A. Kryukov **, V.V. Mitsyn *, L.A. Popov *, A.A. Rats ***, E.B. Ryabov ***, Yu.S. Smirnov *,****, O.G. Smirnova *****, T.A. Strizh *, P.V. Zrelov *

* Laboratory of Information Technologies,
Joint Institute for Nuclear Research, 141980, Dubna, Russia
** University “Dubna”, 141980, Dubna, Russia
*** Administration of Dubna, 141980, Dubna, Russia
**** Chicago University, USA
***** Lund University, Sweden

The “Dubna-Grid” project [1] aims to create a distributed meta-computing environment based on the idle computing resources of office computers. In early 2004, the project participants began building a unified information and computing environment for the city, the “Dubna-Grid” meta-cluster, from the resources of secondary schools, Dubna University and the Laboratory of Information Technologies (LIT, JINR). The project foresees a common pool of more than 1500 accessible nodes.

Various approaches to deploying a computational infrastructure of this scale were discussed at LIT, and the available technologies were studied. Since the Microsoft Windows OS, which is used almost everywhere on office computers, does not support solving complicated and resource-consuming computing tasks in a distributed environment, it was decided to build the meta-cluster on a Linux-based technology for virtualization of computing and network resources [2]. To reach these goals, several technologies and all the potential resources have to be integrated into the computing infrastructure of the “Dubna-Grid” meta-cluster, which is controlled from a single center (LIT JINR).

Creation of the system became possible after a high-speed data transmission link was built in the city. It is based on a single-mode fiber-optic channel with a total length of about 50 km. The channel has multiple redundant links, which make the system reliable and allow parallel data transmission over a virtual local network.

The “Dubna-Grid” meta-cluster includes a management server located at the University “Dubna”, a bridge, and the so-called clients, i.e. computational nodes (worker nodes in LCG terminology). The worker nodes are virtual PCs emulated with virtual machine technology (VMware).

The following software tools and technologies are used: software for running virtual machines (VMware) [3], a virtual network (VLAN), virtual access to the software and data (AFS) [4], and integrated installation and booting of the whole meta-cluster (Warewulf package) [5]. The following software is installed on the central server: the Scientific Linux CERN OS; the Warewulf package for cluster architecture support [5]; the Ganglia monitoring system [7]; OpenAFS [4]; and the Torque batch system with the Maui scheduler [6].

To keep the meta-cluster operating effectively, both its individual elements and the complex as a whole are monitored. With a very large number of nodes distributed over the whole city, such information is especially important. The cluster is monitored with the Ganglia Monitoring System [7,8].
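As an illustration of how per-node status can be read from Ganglia, the following minimal sketch parses a report in the XML shape that the Ganglia daemon (gmond) publishes and summarizes each host. The sample data, the host names, the function name `summarize_hosts`, and the staleness rule (a host is treated as silent when the time since its last report, `TN`, exceeds four times `TMAX`) are assumptions made for this example and are not taken from the project.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the shape of a gmond XML report; names and values are made up.
SAMPLE_XML = """<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
  <CLUSTER NAME="example-cluster" LOCALTIME="1100000000">
    <HOST NAME="node001" IP="192.0.2.101" REPORTED="1099999990" TN="10" TMAX="20">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float"/>
    </HOST>
    <HOST NAME="node002" IP="192.0.2.102" REPORTED="1099999000" TN="1000" TMAX="20">
      <METRIC NAME="load_one" VAL="0.00" TYPE="float"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def summarize_hosts(xml_text, stale_factor=4):
    """Map each host name to its liveness flag and one-minute load.

    Assumption: a host counts as alive while TN <= stale_factor * TMAX.
    """
    root = ET.fromstring(xml_text)
    summary = {}
    for host in root.iter("HOST"):
        tn = int(host.get("TN", "0"))       # seconds since the last report
        tmax = int(host.get("TMAX", "20"))  # expected maximum reporting interval
        load = None
        for metric in host.iter("METRIC"):
            if metric.get("NAME") == "load_one":
                load = float(metric.get("VAL"))
        summary[host.get("NAME")] = {
            "alive": tn <= stale_factor * tmax,
            "load_one": load,
        }
    return summary


if __name__ == "__main__":
    for name, info in sorted(summarize_hosts(SAMPLE_XML).items()):
        print(name, info)
```

In a real deployment the XML would be read from the running daemon rather than from a string constant; the parsing step would stay the same.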

In some cases, a computer is switched on but Linux is not loaded in its virtual PC. This may be caused by errors in the VMware program, or by the kernel boot process getting stuck, after which the virtual PC must be restarted. A node can therefore be in one of several states, which the monitoring has to distinguish.

To implement the monitoring system, a specialized script has been written which uses the correspondence between the addresses of the Windows hosts and their virtual machines to start a parallel query every five minutes. The query results are recorded in a file and are available via the GET ERRORS link at http://dgrsrv.jinr.ru/ganglia/.
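The idea of such a check can be sketched as follows. This is not the project's script: the state names, the function names, the example addresses, and the rule that a virtual machine is not probed when its Windows host does not answer are all assumptions made for illustration. The probes are passed in as functions, so the same logic works with any reachability test. The `tcp_probe` helper shows one possible test, and the ports one would give it are likewise an assumption.

```python
import socket
from concurrent.futures import ThreadPoolExecutor


def tcp_probe(address, port, timeout=2.0):
    """Return True if a TCP connection to address:port can be opened."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify(host_up, vm_up):
    """Hypothetical node states derived from the two probe results."""
    if not host_up:
        return "HOST_OFF"          # the Windows computer does not answer
    if not vm_up:
        return "VM_NOT_RUNNING"    # computer is on, Linux in the virtual PC is not
    return "OK"


def check_nodes(address_map, probe_host, probe_vm, max_workers=32):
    """Probe all (Windows address -> VM address) pairs in parallel.

    Returns a dict mapping each VM address to its state.
    """
    def check(pair):
        host_addr, vm_addr = pair
        host_up = probe_host(host_addr)
        # Assumption: the VM cannot be up if its host is down, so skip that probe.
        vm_up = probe_vm(vm_addr) if host_up else False
        return vm_addr, classify(host_up, vm_up)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(check, address_map.items()))


def format_errors(states):
    """Lines for the error file: one per node that is not in the OK state."""
    return [f"{vm} {state}" for vm, state in sorted(states.items()) if state != "OK"]


if __name__ == "__main__":
    # Made-up addresses and simulated probe results, so the example runs offline.
    address_map = {"192.0.2.1": "192.0.2.101", "192.0.2.2": "192.0.2.102"}
    states = check_nodes(
        address_map,
        probe_host=lambda a: True,
        probe_vm=lambda a: a == "192.0.2.101",
    )
    print("\n".join(format_errors(states)))
```

To follow the five-minute cycle described above, a function like `check_nodes` could be run from a periodic scheduler, with the output of `format_errors` written to the file that the monitoring page links to.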

The developed logical schemes of the meta-cluster and the technology of its construction provide:

  1. homogeneity of the environment and compactness of the OS,
  2. simplicity of administration and possibility of a dynamic extension of the meta-cluster.

The main project site and the monitoring site are available at http://dubna-grid.jinr.ru and http://dgrsrv.jinr.ru/ganglia/, respectively.

[1] P.V. Zrelov, V.V. Ivanov, Val.V. Ivanov, V.V. Korenkov, Yu.A. Kryukov, A.A. Rats, E.B. Ryabov, Yu.S. Smirnov, O.G. Smirnova, T.A. Strizh: Project "Dubna-Grid". In: Proc. of Int. Conference "Distributed Computing and Grid-Technologies in Science and Education", June 29 – July 2, 2004, Dubna, Russia, pp. 48-53 (in Russian).
[2] A. Lacis. How to construct and use supercomputer. Moscow, Bestseller, 2003.
[3] http://www.vmware.com
[4] http://www.openafs.org
[5] http://www.warewulf-cluster.org
[6] http://www.clusterresources.com
[7] http://ganglia.sourceforge.net
[8] Matthew L. Massie, Brent N. Chun, David E. Culler. The Ganglia Distributed Monitoring System: Design, Implementation, and Experience. Parallel Computing 30 (2004) 817-840.

