.. q-doc/getting_started.rst
   author: weiher
   date: Mon, 04 Jun 2018 16:38:36 +0200
   changeset: 18:57431f642e82 (parent: 14:7877020262a9)
   Minor additions for comprehension and changes in the document structure

.. _getting_started:


***************
Getting started
***************

.. _installing-rocks:

This chapter describes installing Rocks on the frontend node (q.rz-berlin.mpg.de, 141.14.128.18).

Access the iDrac service interface and boot the kernel ISO
============================================================

Access the iDrac via http at 141.14.128.17 (q-sp.rz-berlin.mpg.de).
The address of q-sp was set manually; the first assignment was via DHCP.

The initial password can be found on the extendable label.

The root password was changed to the PP&B default remote-access password.

Launch the console (on macOS, remember to allow Java to run).

Map kernel.iso (from http://www.rocksclusters.org/downloads.html) as a DVD
and boot from it (warm boot, booting from the DVD).


Configuring the system
======================

For instructions see http://central-7-0-x86-64.rocksclusters.org/roll-documentation/base/7.0/install-frontend-7.html


Name: q.fhi-berlin.mpg.de

Public net (FHI) on em2 (10 Gbit/s) with IP 141.14.128.18/20.
Gateway: 141.14.128.128, nameserver: 141.14.128.1, search domains:
fhi-berlin.mpg.de, rz-berlin.mpg.de.

Private net on p7p1 (the IP 10.1.1.1 is chosen by the system).

The disk setup should include a RAID system with 10 TByte as home. Two RAIDs
should be present: one RAID1 and one RAID6 (the big one).

Rocks rolls: all except fingerprint (and possibly htcondor).

RAID1 with 2 x 150 GByte SSDs as boot disk.

RAID6 -> 11 TiB: 10 TiB for /home, 1 TiB for /export (used for /share/apps).
This must be configured manually.

Start the installation; it will take 2-3 hours.

Update rocks
============

Updating leads to trouble with unresolved dependencies::

  baseurl=http://ftp.fau.de/centos
  osversion=7.4.1708
  version=`date +%F`
  rocks create mirror ${baseurl}/${osversion}/updates/x86_64/Packages/ rollname=Updates-CentOS-${osversion} version=${version}
  rocks add roll Updates-CentOS-${osversion}-${version}*iso
  rocks enable roll Updates-CentOS-${osversion} version=${version}
  (cd /export/rocks/install; rocks create distro)
  yum clean all; yum update
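
The commands above can be wrapped in a small helper. The sketch below is an assumption of mine, not part of the original setup: it builds the roll name/version the same way, and with ``DRY_RUN=1`` (the default here) it only echoes the ``rocks`` commands instead of running them, so it can be inspected before use on the frontend.

```shell
#!/bin/bash
# Sketch (hypothetical helper): assemble the roll name/version used above.
# DRY_RUN=1 (the default here) only echoes the rocks commands; set DRY_RUN=0
# on the frontend to actually run them.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi; }

baseurl=http://ftp.fau.de/centos   # mirror base, path as fixed above
osversion=7.4.1708
version=$(date +%F)
rollname="Updates-CentOS-${osversion}"

run rocks create mirror "${baseurl}/${osversion}/updates/x86_64/Packages/" \
    rollname="${rollname}" version="${version}"
run rocks add roll "${rollname}-${version}"*iso
run rocks enable roll "${rollname}" version="${version}"
```

Set ``DRY_RUN=0`` only after the echoed commands look right.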

Prepare the iDracs for the compute nodes
========================================

.. note::

   All the scripts can be found on http://hg.rz-berlin.mpg.de/qSetup
   (yum install mercurial)

To assign IPs to the iDrac interfaces of the compute nodes, DHCP must be
set up for the management net.

To add a management net, the management switches (1 Gbit/s, q-msw-01, q-msw-02) must be
configured (link?) and connected to the iDracs of all nodes including the frontend.

On the frontend a network must be created for this mgmt net and an interface
must be dedicated to it::

 rocks add network mgmt subnet=10.0.12.0 netmask=255.255.255.0
 rocks set host interface ip q iface=em3 ip=10.0.12.1
 rocks set host interface subnet q iface=em3 subnet=mgmt
 rocks set host interface name q iface=em3 name=q-mgmt 
 rocks sync config
 rocks sync host network q
 rocks list network

Now DHCP entries for these hosts must be included. As the Rocks distro creates
the DHCP entries via kickstart, the Rocks Python system file must be altered::

 vi /opt/rocks/lib/python2.7/site-packages/rocks/commands/report/host/dhcpd/__init__.py

In that file, add "em3" to DHCPDARGS::

 self.addOutput('', 'DHCPDARGS="%s em3"' % device)

and, just before ``self.addOutput('', '</file>')``, add::

 self.addOutput('', 'include "/root/FHI/mgmt.dhcp";')
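
The two edits can also be applied non-interactively. The sed/awk sketch below is my own suggestion (test it on a copy first) and is demonstrated on a small stand-in file; on the frontend the target is the ``__init__.py`` named above. The awk part reuses the leading whitespace of the ``</file>`` line so the inserted Python line keeps the correct indentation.

```shell
# Demonstrated on a tiny stand-in file with the two relevant lines.
f=dhcpd_init.sample
cat > "$f" <<'EOF'
        self.addOutput('', 'DHCPDARGS="%s"' % device)
        self.addOutput('', '</file>')
EOF

# 1) add em3 to DHCPDARGS
sed -i 's/DHCPDARGS="%s"/DHCPDARGS="%s em3"/' "$f"

# 2) insert the include line just before the </file> output line,
#    reusing that line's leading whitespace
awk 'BEGIN { q = sprintf("%c", 39) }     # q = single-quote character
/<\/file>/ {
  ind = $0; sub(/[^ \t].*$/, "", ind)    # keep only leading whitespace
  print ind "self.addOutput(" q q ", " q "include \"/root/FHI/mgmt.dhcp\";" q ")"
}
{ print }' "$f" > "$f.new" && mv "$f.new" "$f"
```

Afterwards the stand-in file contains both changes, indented like the surrounding code.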

Now this file has to be created with the MAC addresses of the iDracs. The
addresses can be found on the extendable label on the front of each node::

 [root@q FHI]# cat mgmt.dhcp
 subnet 10.0.12.0 netmask 255.255.255.0 {
   default-lease-time 1200;
   max-lease-time 1200;
   option routers 10.0.12.1;
   option subnet-mask 255.255.255.0;
   option domain-name "mgmt";
   option domain-name-servers 10.0.12.1;
   option broadcast-address 10.0.12.255;
   option interface-mtu 1500;
   group "mgmt" {
     host mgmt-q { # Frontend
       hardware ethernet 24:6e:96:79:7c:46;
       fixed-address 10.0.12.1;
     }
     host sp-compute-0-0 { # iDRAC-BDWKGM2
       hardware ethernet d0:94:66:27:7b:5e;
       fixed-address 10.0.12.10;
     }
     host sp-compute-0-1 { # iDRAC-BDWHGM2
       hardware ethernet d0:94:66:28:47:cd;
       fixed-address 10.0.12.11;
     }
     host sp-compute-0-2 { # iDRAC-BDW9GM2
       hardware ethernet d0:94:66:2c:0d:e2;
       fixed-address 10.0.12.12;
     }
     host sp-compute-0-3 { # iDRAC-BDWGGM2
       hardware ethernet d0:94:66:20:2a:34;
       fixed-address 10.0.12.13;
     }
     host sp-compute-0-4 { # iDRAC-BDRGGM2
       hardware ethernet d0:94:66:1f:3f:cc;
       fixed-address 10.0.12.14;
     }
     host sp-compute-0-5 { # iDRAC-BDTKGM2
       hardware ethernet d0:94:66:28:61:99;
       fixed-address 10.0.12.15;
     }
     host sp-compute-0-6 { # iDRAC-BDXBGM2
       hardware ethernet d0:94:66:27:62:39;
       fixed-address 10.0.12.16;
     }
     host sp-compute-0-7 { # iDRAC-BDVCGM2
       hardware ethernet d0:94:66:2c:0c:4a;
       fixed-address 10.0.12.17;
     }
     host sp-compute-0-8 { # iDRAC-BDT9GM2
       hardware ethernet d0:94:66:2b:ff:4f;
       fixed-address 10.0.12.18;
     }
     host sp-compute-0-9 { # iDRAC-BDVDGM2
       hardware ethernet d0:94:66:27:52:a2;
       fixed-address 10.0.12.19;
     }
     host sp-compute-0-10 { # iDRAC-BDVJGM2
       hardware ethernet d0:94:66:27:48:c2;
       fixed-address 10.0.12.20;
     }
     host sp-compute-0-11 { # iDRAC-BDRDGM2
       hardware ethernet d0:94:66:1f:42:46;
       fixed-address 10.0.12.21;
     }
     host sp-compute-0-12 { # iDRAC-BDWFGM2
       hardware ethernet d0:94:66:27:5b:d6;
       fixed-address 10.0.12.22;
     }
     host sp-compute-0-13 { # iDRAC-BDSBGM2
       hardware ethernet d0:94:66:20:29:54;
       fixed-address 10.0.12.23;
     }
     host sp-compute-0-14 { # iDRAC-BDSDGM2
       hardware ethernet d0:94:66:20:28:3a;
       fixed-address 10.0.12.24;
     }
     host sp-compute-0-15 { # iDRAC-BDRJGM2
       hardware ethernet d0:94:66:1f:53:f7;
       fixed-address 10.0.12.25;
     }
     host sp-compute-0-16 { # iDRAC-BDWDGM2
       hardware ethernet d0:94:66:2c:0e:0a;
       fixed-address 10.0.12.26;
     }
     host sp-compute-0-17 { # iDRAC-BDWBGM2
       hardware ethernet d0:94:66:28:45:62;
       fixed-address 10.0.12.27;
     }
     host sp-compute-0-18 { # iDRAC-BDWJGM2
       hardware ethernet d0:94:66:20:2b:be;
       fixed-address 10.0.12.28;
     }
     host sp-compute-0-19 { # iDRAC-BDWGGM2
       hardware ethernet d0:94:66:20:2a:34;
       fixed-address 10.0.12.29;
     }
   }
 }
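
To avoid hand-editing the block above for every node, the host stanzas can be generated from a plain "name serial mac ip" list. This is a hypothetical helper of my own (the function name ``hosts_to_stanzas`` and the list format are assumptions, not part of the original setup):

```shell
#!/bin/bash
# Hypothetical helper: emit mgmt.dhcp host stanzas from "name serial mac ip"
# lines read on stdin.
hosts_to_stanzas() {
  while read -r name serial mac ip; do
    [ -z "$name" ] && continue          # skip empty lines
    printf '   host %s { # %s\n' "$name" "$serial"
    printf '      hardware ethernet %s;\n' "$mac"
    printf '      fixed-address %s;\n' "$ip"
    printf '   }\n'
  done
}

# Example with the first two compute nodes from the file above:
hosts_to_stanzas <<'EOF'
sp-compute-0-0 iDRAC-BDWKGM2 d0:94:66:27:7b:5e 10.0.12.10
sp-compute-0-1 iDRAC-BDWHGM2 d0:94:66:28:47:cd 10.0.12.11
EOF
```

New nodes then only need one line in the list instead of a hand-edited stanza.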

Restart the dhcpd service::

 service dhcpd restart
 -> Redirecting to /bin/systemctl restart dhcpd.service

Check (default password for iDrac = calvin)::

 [root@q log]# ssh root@10.0.12.10
 root@10.0.12.10's password:

The network cable is connected to port B (left, near the PCI-X bus).

We want to do PXE boot from the 10 Gbit/s interface on the PCIx X710 card::
 /admin1-> racadm get NIC.NICConfig
 NIC.NICConfig.1 [Key=NIC.Slot.2-1-1#NICConfig]
 NIC.NICConfig.2 [Key=NIC.Slot.2-2-1#NICConfig]
 NIC.NICConfig.3 [Key=NIC.Embedded.1-1-1#NICConfig]
 NIC.NICConfig.4 [Key=NIC.Embedded.2-1-1#NICConfig]

 /admin1-> racadm set NIC.NICConfig.2.LegacyBootProto PXE
 [Key=NIC.Slot.2-2-1#LegacyBootProto]
 RAC1017: Successfully modified the object value and the change is in 
 pending state.
 To apply modified value, create a configuration job and reboot 
 the system. To create the commit and reboot jobs, use "jobqueue" 
 command. For more information about the "jobqueue" command, 
 see RACADM help.

 /admin1-> racadm jobqueue create NIC.Slot.2-2-1
 RAC1024: Successfully scheduled a job.
 Verify the job status using "racadm jobqueue view -i JID_xxxxx" command.
 Commit JID = JID_168281383887
 /admin1-> racadm serveraction powercycle

 /admin1-> racadm set BIOS.BiosBootSettings.BootSeq NIC.Slot.2-2-1
 [Key=BIOS.Setup.1-1#BiosBootSettings]
 RAC1017: Successfully modified the object value and the change is in 
 pending state.
 To apply modified value, create a configuration job and reboot 
 the system. To create the commit and reboot jobs, use "jobqueue" 
 command. For more information about the "jobqueue" command, see RACADM 
 help.
 /admin1-> racadm get BIOS.BiosBootSettings.BootSeq
 [Key=BIOS.Setup.1-1#BiosBootSettings]
 BootSeq=NIC.Embedded.1-1-1,NIC.Slot.2-1-1 
 (Pending Value=NIC.Slot.2-1-1,NIC.Embedded.1-1-1)
 /admin1-> racadm jobqueue create BIOS.Setup.1-1
 RAC1024: Successfully scheduled a job.
 Verify the job status using "racadm
 jobqueue view -i JID_xxxxx" command.
 Commit JID = JID_168368767313


 /admin1-> racadm jobqueue view -i JID_168368767313
 ---------------------------- JOB -------------------------
 [Job ID=JID_168368767313]
 Job Name=Configure: BIOS.Setup.1-1
 Status=Scheduled
 Start Time=[Now]
 Expiration Time=[Not Applicable]
 Message=[JCP001: Task successfully scheduled.]
 Percent Complete=[0]
 ----------------------------------------------------------
 /admin1-> racadm serveraction powercycle
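
Once root's ssh key is on the iDracs, the racadm steps above can be pushed to all compute nodes in one loop. This is a sketch of my own, not from the original setup; with ``DRY_RUN=1`` (the default here) it only prints the commands it would run.

```shell
#!/bin/bash
# Sketch: apply the PXE boot settings above to every compute-node iDrac.
# DRY_RUN=1 (the default here) only prints the commands; set DRY_RUN=0 to run.
DRY_RUN=${DRY_RUN:-1}
run() { if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi; }

for ip in 10.0.12.{10..29}; do
  # NIC: legacy boot protocol -> PXE, then commit and reboot
  run ssh root@"$ip" racadm set NIC.NICConfig.2.LegacyBootProto PXE
  run ssh root@"$ip" racadm jobqueue create NIC.Slot.2-2-1
  run ssh root@"$ip" racadm serveraction powercycle
  # BIOS: boot sequence -> the X710 port, then commit and reboot
  run ssh root@"$ip" racadm set BIOS.BiosBootSettings.BootSeq NIC.Slot.2-2-1
  run ssh root@"$ip" racadm jobqueue create BIOS.Setup.1-1
  run ssh root@"$ip" racadm serveraction powercycle
done
```

In practice one would wait for each job to finish (``racadm jobqueue view``) before the next powercycle.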

To get rid of opensm log entries::

 /bin/systemctl disable opensm


insert-ethers could not be started because httpd was not running. /run/httpd
had to be created with owner apache:apache.

Then httpd could be started with "service httpd start".

Now start insert-ethers (this was just a test of how to deal with the iDrac).

Put root's ssh key on all iDracs.
Create FHI/idracSSHKey::

 racadm sshpkauth -i 2 -k 1 -t "ssh-rsa AAAA...root@q.fhi-berlin.mpg.de"

Key taken from /root/.ssh/id_rsa.pub.

Create FHI/setInitSSHKeyToIdracs::

 #!/bin/bash

 for ip in 10.0.12.{10..29}
 do
   echo "connecting to $ip; you will be asked for a password (if it is a new key) -> calvin"
   ssh $ip < idracSSHKey
 done
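
The idracSSHKey file itself can be generated from the frontend's public key instead of being pasted by hand. A small sketch (the function name ``make_idrac_key_cmd`` is hypothetical; the slot arguments ``-i 2 -k 1`` are as used above):

```shell
#!/bin/bash
# Hypothetical helper: print the racadm command that installs the given
# public key in iDrac user slot 2, key index 1 (as above).
make_idrac_key_cmd() {
  # $1 = path to the public key file
  printf 'racadm sshpkauth -i 2 -k 1 -t "%s"\n' "$(cat "$1")"
}

# On the frontend:
#   make_idrac_key_cmd /root/.ssh/id_rsa.pub > idracSSHKey
```

This keeps the key text in one place and avoids copy-paste errors with the long base64 string.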



.. _update-cluster-software:

Update Cluster software
=======================

Must be done with yum::

 yum clean all
 rm -rf /var/cache/yum
 yum --enablerepo=updates check-update
 yum --enablerepo=updates update

Now the new packages should be copied to the Rocks install contrib directory,
but the source directory does not seem to exist::

 cp /var/cache/yum/x86_64/7/updates/packages/* /export/rocks/install/contrib/7.0/x86_64/RPMS/

This fails; waiting for information from the mailing list.



Activate ldap authentication
----------------------------


Senseless, as gid and uid must be offset by 1000...

Activate sssd (should be better than nscd)::

 yum install -y sssd
 yum downgrade sssd-client
 yum downgrade libsss_idmap
 yum install -y sssd
 authconfig --enableldap --enableldapauth --ldapserver="ldap.rz-berlin.mpg.de" --ldapbasedn="ou=people,dc=ppb,dc=rz-berlin,dc=mpg,dc=de" --update --enablemkhomedir
 yum install c-ares-devel
 authconfig --enableldap --enableldapauth \
            --ldapserver="ldap.rz-berlin.mpg.de" \
            --ldapbasedn="ou=people,dc=ppb,dc=rz-berlin,dc=mpg,dc=de" \
            --update --enablemkhomedir
 systemctl stop sssd.service
 systemctl start sssd.service
 systemctl status sssd.service

Software Install
================

Intel compiler 2016.4 (per Gert) and
Intel compiler 2018.1.

Download the Intel License Manager.

Intel 2018.1 with PGI support!
It needs 32-bit libs::

  yum install libstdc++-devel.i686
  yum install glibc-devel.i686
  yum install libgcc.i686   # already installed


