Don't know why I have to do this
author weiher
Tue, 27 Mar 2018 16:10:03 +0200
changeset 16 4afdd7d61fe2
parent 15 523ca1dfd077 (current diff)
parent 14 7877020262a9 (diff)
child 17 02a3741242b2
Don't know why I have to do this
--- a/q-doc/getting_started.rst	Tue Mar 27 16:07:14 2018 +0200
+++ b/q-doc/getting_started.rst	Tue Mar 27 16:10:03 2018 +0200
@@ -9,52 +9,304 @@
 
 Installing Rocks on the frontend node (q.rz-berlin.mpg.de, 141.14.128.18)
 
-ToDo: rz->fhi
-
-Access to iDrac service interface
-=================================
+Access the iDrac service interface and boot the kernel ISO
+============================================================
 
 Access via http to 141.14.128.17 (q-sp.rz-berlin.mpg.de)
+(q-sp was set to this address manually; the first assignment was via DHCP)
 
 The initial password can be found on the extendable label.
 
-Root password changed to PP&B default remote access.
+The root password was changed to the PP&B default remote-access password.
+
+Launch the console (on a Mac, don't forget to allow Java to run).
+
+Map kernel.iso (from http://www.rocksclusters.org/downloads.html) as a virtual DVD.
+Boot the machine (warm boot, booting from the DVD).
+
+
+Configuring the system
+======================
+
+For instructions see http://central-7-0-x86-64.rocksclusters.org/roll-documentation/base/7.0/install-frontend-7.html
+
+
+Name: q.fhi-berlin.mpg.de
+Public net (FHI) on em2 (10 Gbit/s) with IP 141.14.128.18/20.
+Gateway: 141.14.128.128, nameserver: 141.14.128.1, search domains:
+fhi-berlin.mpg.de, rz-berlin.mpg.de
+
+Private net on p7p1 (the IP 10.1.1.1 is chosen by the system).
+
+The disk setup should include a RAID system with 10 TByte for /home. Two RAID
+sets should be present: one RAID 1 and one RAID 6 (the big one).
+
+Rocks rolls: select all except fingerprint (and htcondor?).
+
+RAID 1 with 2 x 150 GByte SSDs as boot disk.
+
+RAID 6 -> 11 TiB: 10 TiB for /home, 1 TiB for /export (used for /share/apps).
+This must be configured manually.
+
+Install; this will take 2-3 hours.
+
+Update Rocks
+============
 
-You may already have sphinx `sphinx <http://sphinx.pocoo.org/>`_
-installed -- you can check by doing::
+Updating leads to trouble with unresolved packages::
+
+  # mirror the CentOS updates repo into a Rocks roll
+  baseurl=http://ftp.fau.de/centos
+  osversion=7.4.1708
+  version=`date +%F`
+  rocks create mirror ${baseurl}/${osversion}/updates/x86_64/Packages/ rollname=Updates-CentOS-${osversion} version=${version}
+  rocks add roll Updates-CentOS-${osversion}-${version}*iso
+  rocks enable roll Updates-CentOS-${osversion} version=${version}
+  # rebuild the distribution and update the frontend
+  (cd /export/rocks/install; rocks create distro)
+  yum clean all; yum update
+
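+To check that the new roll is enabled and that yum sees the rebuilt
+distribution (a minimal check; the exact roll name contains the date used
+above)::
+
+ rocks list roll
+ yum repolist
+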
+Prepare the iDracs for the compute nodes
+========================================
+
+.. note::
 
-  python -c 'import sphinx'
+   All the scripts can be found at http://hg.rz-berlin.mpg.de/qSetup
+   (install Mercurial with "yum install mercurial").
+
+To give IPs to the iDrac interfaces of the compute nodes, DHCP must be
+set up for the management net.
+
+To add a management net, the management switches (1 Gbit/s, q-msw-01, q-msw-02) must be
+configured (link?) and connected to the iDracs of all nodes, including the frontend.
 
-If that fails grab the latest version of and install it with::
+On the frontend a network must be created for this mgmt net and an interface
+must be dedicated to it::
+
+ rocks add network mgmt subnet=10.0.12.0 netmask=255.255.255.0
+ rocks set host interface ip q iface=em3 ip=10.0.12.1
+ rocks set host interface subnet q iface=em3 subnet=mgmt
+ rocks set host interface name q iface=em3 name=q-mgmt 
+ rocks sync config
+ rocks sync host network q
+ rocks list network
 
-  > sudo easy_install -U Sphinx
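+A quick look at the result (both commands only read the current
+configuration)::
+
+ rocks list host interface q
+ ip addr show em3
+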
+Now DHCP entries for these hosts must be included. As the Rocks distro creates
+the DHCP configuration via kickstart, the Rocks Python command file must be altered::
+
+ vi /opt/rocks/lib/python2.7/site-packages/rocks/commands/report/host/dhcpd/__init__.py
 
-Now you are ready to build a template for your docs, using
-sphinx-quickstart::
+ add "em3" to the DHCPARGS : self.addOutput('', 'DHCPDARGS="%s em3"' % device)
+ add self.addOutput('', 'include "/root/FHI/mgmt.dhcp";')
+ just before self.addOutput('', '</file>')
+
+Now one has to create this file with the MAC addresses of the iDracs. The
+addresses can be found on the extendable label on the front of each node::
 
-  > sphinx-quickstart
+ [root@q FHI]# cat mgmt.dhcp 
+ subnet 10.0.12.0 netmask 255.255.255.0 {
+  default-lease-time 1200;
+  max-lease-time 1200;
+  option routers 10.0.12.1;
+  option subnet-mask 255.255.255.0;
+  option domain-name "mgmt";
+  option domain-name-servers 10.0.12.1;
+  option broadcast-address 10.0.12.255;
+  option interface-mtu 1500;
+  group "mgmt" {
+    host mgmt-q { # Frontend
+      hardware ethernet 24:6e:96:79:7c:46;
+      fixed-address 10.0.12.1;
+    }
+    host sp-compute-0-0 { # iDRAC-BDWKGM2 
+      hardware ethernet d0:94:66:27:7b:5e; 
+      fixed-address 10.0.12.10; 
+    }
+    host sp-compute-0-1 { # iDRAC-BDWHGM2
+      hardware ethernet d0:94:66:28:47:cd;
+      fixed-address 10.0.12.11;
+    }
+    host sp-compute-0-2 { # iDRAC-BDW9GM2
+      hardware ethernet d0:94:66:2c:0d:e2;
+      fixed-address 10.0.12.12;
+    }
+    host sp-compute-0-3 { # iDRAC-BDWGGM2
+      hardware ethernet d0:94:66:20:2a:34;
+      fixed-address 10.0.12.13;
+    }
+    host sp-compute-0-4  { hardware ethernet d0:94:66:1f:3f:cc; fixed-address 10.0.12.14; } # iDRAC-BDRGGM2
+    host sp-compute-0-5  { hardware ethernet d0:94:66:28:61:99; fixed-address 10.0.12.15; } # iDRAC-BDTKGM2
+    host sp-compute-0-6  { hardware ethernet d0:94:66:27:62:39; fixed-address 10.0.12.16; } # iDRAC-BDXBGM2
+    host sp-compute-0-7  { hardware ethernet d0:94:66:2c:0c:4a; fixed-address 10.0.12.17; } # iDRAC-BDVCGM2
+    host sp-compute-0-8  { hardware ethernet d0:94:66:2b:ff:4f; fixed-address 10.0.12.18; } # iDRAC-BDT9GM2
+    host sp-compute-0-9  { hardware ethernet d0:94:66:27:52:a2; fixed-address 10.0.12.19; } # iDRAC-BDVDGM2
+    host sp-compute-0-10 { hardware ethernet d0:94:66:27:48:c2; fixed-address 10.0.12.20; } # iDRAC-BDVJGM2
+    host sp-compute-0-11 { hardware ethernet d0:94:66:1f:42:46; fixed-address 10.0.12.21; } # iDRAC-BDRDGM2
+    host sp-compute-0-12 { hardware ethernet d0:94:66:27:5b:d6; fixed-address 10.0.12.22; } # iDRAC-BDWFGM2
+    host sp-compute-0-13 { hardware ethernet d0:94:66:20:29:54; fixed-address 10.0.12.23; } # iDRAC-BDSBGM2
+    host sp-compute-0-14 { hardware ethernet d0:94:66:20:28:3a; fixed-address 10.0.12.24; } # iDRAC-BDSDGM2
+    host sp-compute-0-15 { hardware ethernet d0:94:66:1f:53:f7; fixed-address 10.0.12.25; } # iDRAC-BDRJGM2
+    host sp-compute-0-16 { hardware ethernet d0:94:66:2c:0e:0a; fixed-address 10.0.12.26; } # iDRAC-BDWDGM2
+    host sp-compute-0-17 { hardware ethernet d0:94:66:28:45:62; fixed-address 10.0.12.27; } # iDRAC-BDWBGM2
+    host sp-compute-0-18 { hardware ethernet d0:94:66:20:2b:be; fixed-address 10.0.12.28; } # iDRAC-BDWJGM2
+    host sp-compute-0-19 { hardware ethernet d0:94:66:20:2a:34; fixed-address 10.0.12.29; } # iDRAC-BDWGGM2
+  }
+ }
 
-accepting most of the defaults.  I choose "sampledoc" as the name of my
-project.  cd into your new directory and check the contents::
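+Before restarting, the generated configuration plus the included file can be
+syntax-checked (assuming the main config is the usual /etc/dhcp/dhcpd.conf on
+CentOS 7)::
+
+ dhcpd -t -cf /etc/dhcp/dhcpd.conf
+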
+Restart the dhcpd service::
+ 
+ service dhcpd restart
+ -> Redirecting to /bin/systemctl restart dhcpd.service
+ 
+Check (default password for iDrac = calvin)::
+
+ [root@q log]# ssh root@10.0.12.10
+ root@10.0.12.10's password: 
+
+The network cable is connected to port B (left, near the PCI-X bus).
 
-  home:~/tmp/sampledoc> ls
-  Makefile	_static		conf.py
-  _build		_templates	index.rst
+We want to PXE boot from the 10 Gbit/s interface on the X710 PCI card::
+
+ /admin1-> racadm get NIC.NICConfig
+ NIC.NICConfig.1 [Key=NIC.Slot.2-1-1#NICConfig]
+ NIC.NICConfig.2 [Key=NIC.Slot.2-2-1#NICConfig]
+ NIC.NICConfig.3 [Key=NIC.Embedded.1-1-1#NICConfig]
+ NIC.NICConfig.4 [Key=NIC.Embedded.2-1-1#NICConfig]
+
+ /admin1-> racadm set NIC.NICConfig.2.LegacyBootProto PXE
+ [Key=NIC.Slot.2-2-1#LegacyBootProto]
+ RAC1017: Successfully modified the object value and the change is in 
+ pending state.
+ To apply modified value, create a configuration job and reboot 
+ the system. To create the commit and reboot jobs, use "jobqueue" 
+ command. For more information about the "jobqueue" command, 
+ see RACADM help.
+
+ /admin1-> racadm jobqueue create NIC.Slot.2-2-1
+ RAC1024: Successfully scheduled a job.
+ Verify the job status using "racadm jobqueue view -i JID_xxxxx" command.
+ Commit JID = JID_168281383887
+ /admin1-> racadm serveraction powercycle
 
-The index.rst is the master ReST for your project, but before adding
-anything, let's see if we can build some html::
+ /admin1-> racadm set BIOS.BiosBootSettings.BootSeq NIC.Slot.2-2-1
+ [Key=BIOS.Setup.1-1#BiosBootSettings]
+ RAC1017: Successfully modified the object value and the change is in 
+ pending state.
+ To apply modified value, create a configuration job and reboot 
+ the system. To create the commit and reboot jobs, use "jobqueue" 
+ command. For more information about the "jobqueue" command, see RACADM 
+ help.
+ /admin1-> racadm get BIOS.BiosBootSettings.BootSeq
+ [Key=BIOS.Setup.1-1#BiosBootSettings]
+ BootSeq=NIC.Embedded.1-1-1,NIC.Slot.2-1-1 
+ (Pending Value=NIC.Slot.2-1-1,NIC.Embedded.1-1-1)
+ /admin1-> racadm jobqueue create BIOS.Setup.1-1
+ RAC1024: Successfully scheduled a job.
+ Verify the job status using "racadm
+ jobqueue view -i JID_xxxxx" command.
+ Commit JID = JID_168368767313
+
 
-  make html
+ /admin1-> racadm jobqueue view -i JID_168368767313
+ ---------------------------- JOB -------------------------
+ [Job ID=JID_168368767313]
+ Job Name=Configure: BIOS.Setup.1-1
+ Status=Scheduled
+ Start Time=[Now]
+ Expiration Time=[Not Applicable]
+ Message=[JCP001: Task successfully scheduled.]
+ Percent Complete=[0]
+ ----------------------------------------------------------
+ /admin1-> racadm serveraction powercycle
+
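+A sketch for applying the same PXE and boot-order settings to the remaining
+nodes in a loop, assuming the iDracs accept racadm commands passed directly on
+the ssh command line (you will be asked for each password unless the SSH key
+from the step below is already installed)::
+
+ #!/bin/bash
+ # first pass: enable PXE on the 10 Gbit/s port and schedule the NIC job
+ for ip in 10.0.12.{11..29}
+ do
+   ssh root@$ip "racadm set NIC.NICConfig.2.LegacyBootProto PXE"
+   ssh root@$ip "racadm jobqueue create NIC.Slot.2-2-1"
+   ssh root@$ip "racadm serveraction powercycle"
+ done
+ # second pass, after the nodes are back: put the 10 Gbit/s NIC first in the
+ # boot order and schedule the BIOS job, exactly as shown above for one node
+ for ip in 10.0.12.{11..29}
+ do
+   ssh root@$ip "racadm set BIOS.BiosBootSettings.BootSeq NIC.Slot.2-2-1"
+   ssh root@$ip "racadm jobqueue create BIOS.Setup.1-1"
+   ssh root@$ip "racadm serveraction powercycle"
+ done
+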
+To get rid of opensm log entries::
+
+ /bin/systemctl disable opensm
+
+
+Could not start insert-ethers because httpd was not running. Had to create
+/run/httpd owned by apache:apache.
+
+Then httpd could be started with "service httpd start".
+
+Now start insert-ethers, but this was just a test of how to deal with the iDrac.
+
+Put root's SSH key on all iDracs.
+Create FHI/idracSSHKey::
+
+ racadm sshpkauth -i 2 -k 1 -t "ssh-rsa AAAA...root@q.fhi-berlin.mpg.de"
+
+Key taken from /root/.ssh/id_rsa.pub.
 
-If you now point your browser to :file:`_build/html/index.html`, you
-should see a basic sphinx site.
+Create FHI/setInitSSHKeyToIdracs::
+
+ #!/bin/bash
+
+ # push the racadm sshpkauth command in idracSSHKey to every iDrac
+ for ip in 10.0.12.{10..29}
+ do
+   echo "connecting to $ip; you will be asked for a password (if it's a new key) -> calvin"
+   ssh $ip < idracSSHKey
+ done
+
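+A quick check that key-based login now works everywhere (a minimal sketch; it
+feeds "exit" on stdin the same way the script above feeds idracSSHKey)::
+
+ for ip in 10.0.12.{10..29}
+ do
+   echo exit | ssh -o BatchMode=yes root@$ip > /dev/null \
+     && echo "$ip: key login ok" \
+     || echo "$ip: key login FAILED"
+ done
+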
+
+
+.. _update-cluster-software:
+
+Update Cluster software
+=======================
 
-.. image:: _static/basic_screenshot.png
+This must be done with yum::
+
+ yum clean all
+ rm -rf /var/cache/yum
+ yum --enablerepo=updates check-update
+ yum --enablerepo=updates update
+
+Now the new packages should be copied to the Rocks install contrib area, but
+the source directory does not seem to exist::
+
+ cp /var/cache/yum/x86_64/7/updates/packages/* /export/rocks/install/contrib/7.0/x86_64/RPMS/
+
+This fails. Waiting for info from the mailing list.
+
+
+
+Activate LDAP authentication
+----------------------------
 
-.. _fetching-the-data:
+
+Senseless, as the GIDs and UIDs must be offset by 1000...
+
+Activate sssd (should be better than nscd)::
+
+ yum install -y sssd
+ yum downgrade sssd-client
+ yum downgrade libsss_idmap
+ yum install -y sssd
+ authconfig --enableldap --enableldapauth --ldapserver="ldap.rz-berlin.mpg.de" --ldapbasedn="ou=people,dc=ppb,dc=rz-berlin,dc=mpg,dc=de" --update --enablemkhomedir
+ yum install c-ares-devel
+ authconfig --enableldap --enableldapauth \
+            --ldapserver="ldap.rz-berlin.mpg.de" \
+            --ldapbasedn="ou=people,dc=ppb,dc=rz-berlin,dc=mpg,dc=de" \
+            --update --enablemkhomedir
+ systemctl stop sssd.service
+ systemctl start sssd.service
+ systemctl status sssd.service
 
-Fetching the data
------------------
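+To check that LDAP users resolve through sssd (replace <login> with an
+existing LDAP account; the name is just a placeholder)::
+
+ getent passwd <login>
+ id <login>
+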
+Software Install
+================
+
+Intel compiler 2016.4 (per Gert) and
+Intel compiler 2018.1.
+
+Download the Intel License Manager.
+
+Intel 2018.1 with PGI support!
+Needs 32-bit libs::
+
+  yum install libstdc++-devel.i686
+  yum install glibc-devel.i686
+  yum install libgcc.i686   # already installed
+
+
+
 
 Now we will start to customize out docs.  Grab a couple of files from
 the `web site <https://github.com/matplotlib/sampledoc>`_
@@ -114,3 +366,5 @@
 the `sphinx <http://sphinx.pocoo.org/>`_ site itself -- see
 :ref:`custom_look`.
 
+
--- a/q-doc/index.rst	Tue Mar 27 16:07:14 2018 +0200
+++ b/q-doc/index.rst	Tue Mar 27 16:10:03 2018 +0200
@@ -6,6 +6,17 @@
 
    This documentation **only** applies to **new Q-Cluster** (successor of yfhix), work in progress
 
+Welcome to Q-Cluster's documentation!
+=====================================
+
+At the end of 2017, we purchased 38 more nodes and 2 GPU nodes to expand our xfhix compute cluster. Everything was delivered before Christmas except for the two GPU nodes.
+Since the frontend node is already 5 years old, we have also purchased a new frontend computer, on which Rocks 7 is now installed. The new system can be accessed at q.fhi-berlin.mpg.de.
+
+.. toctree::
+   :maxdepth: 2
+
+   getting_started.rst
+
 Introducing Q-Cluster 
 =======================
 
@@ -79,22 +90,3 @@
 * :ref:`modindex`
 * :ref:`search`
 * :ref:`glossary`
-
-Welcome to Q-Cluster's documentation!
-=====================================
-
-At the end of 2017, we purchased 38 more nodes and 2 GPU nodes to expand our xfhix compute cluster. Everything was delivered before Christmas except for the two GPU nodes.
-Since the frontend node is already 5 years old, we have also purchased a new frontend computer. On this Rocks 7 is now installed. The new system can be accessed at q.fhi-berlin.mpg.de.
-
-.. toctree::
-   :maxdepth: 2
-
-   getting_started.rst
-
-
-Indices and tables
-==================
-
-* :ref:`genindex`
-* :ref:`modindex`
-* :ref:`search`