AdminPage: Difference between revisions
m (Created page with " === Admin section ===
*elog
* How to properly switch off the cluster
* How to properly put the cluster on
==== CMS Services ====
...") |
(71 intermediate revisions by 8 users not shown) | |||
Line 1: | Line 1: | ||
{{DISPLAYTITLE:Page for Administrators}} | |||
=== | ==== Management of the whole cluster ==== | ||
*[[elog]] | *[[elog]] | ||
*[[ShutDownCluster| How to properly switch off the cluster]] | *[[ShutDownCluster| How to properly switch off the cluster]] | ||
*[[PutClusterOn| How to properly put the cluster on]] | *[[PutClusterOn| How to properly put the cluster on]] | ||
==== CMS Services ==== | ==== CMS Services ==== | ||
*[[OpenID | How to use tokens and openID in the grid]] | |||
*[[Phedex]] | *[[Phedex]] | ||
*[[Heartbeat]] | *[[Heartbeat]] | ||
Line 10: | Line 11: | ||
*[[FroNTier]] | *[[FroNTier]] | ||
*[[ProdAgent]] | *[[ProdAgent]] | ||
*[[GitForSiteConf| | *[[GitForSiteConf| Instructions to commit siteconf to git]] | ||
==== Grid Configuration Issues ==== | ==== Grid Configuration Issues ==== | ||
*[[UpdateCertificates| Update the certificates of all our machines]] | *[[UpdateCertificates| Update the certificates of all our machines]] | ||
*[[CreamIssues| Issues with cream and how to solve them]] | *[[CreamIssues| Issues with cream and how to solve them]] | ||
*[[PBS_TMPDIR| PBS TMPDIR]] | *[[PBS_TMPDIR| PBS TMPDIR]] | ||
*[[APEL]] | *[[APEL| <strike>APEL</strike>(OBSOLETE)]] | ||
*[[BDII]] | *[[BDII]] | ||
*[[FTS]] | *[[FTS]] | ||
*[[SL4_x86_64_WNs| SL4 x86_64 WNs]] | *[[SL4_x86_64_WNs| <strike>SL4 x86_64 WNs</strike>(OBSOLETE)]] | ||
*[[CE_oveloaded| CE overloaded]] | *[[CE_oveloaded| CE overloaded]] | ||
*[[RB]] | *[[RB]] | ||
*[[IPMI]] | *[[IPMI]] | ||
*[[CA_certificates| Upgrade CA certificates]] | *[[CA_certificates| <strike>Upgrade CA certificates</strike> (OBSOLETE)]] | ||
*[[Shutdown| Shutting down the cluster]] | *[[Shutdown| Shutting down the cluster]] | ||
*[[Software_Area_Switch| Software Area Switch]] | *[[Software_Area_Switch| Software Area Switch]] | ||
Line 28: | Line 30: | ||
*[[Argus| Argus server and glexec on the workernodes]] | *[[Argus| Argus server and glexec on the workernodes]] | ||
*[[ApelGapPublishing| Apel gap publishing]] | *[[ApelGapPublishing| Apel gap publishing]] | ||
*[[UpdateCACertificates| Update IGTF CA certificates]] | |||
==== Files section ==== | ==== Files section ==== | ||
*[[DCache| dCache]] | *[[DCache| dCache]] | ||
Line 39: | Line 43: | ||
*[[GetLostFiles| Retrieve lost files from datasets]] | *[[GetLostFiles| Retrieve lost files from datasets]] | ||
*[[StorageConsistency| Storage Consistency]] | *[[StorageConsistency| Storage Consistency]] | ||
*[[Rucio | rucio commands ]] | |||
==== Status and Monitoring ==== | ==== Status and Monitoring ==== | ||
Line 61: | Line 66: | ||
*[[NetworkSetup| Network Setup]] | *[[NetworkSetup| Network Setup]] | ||
*[[SetupMonitoringControlerSunfireV20z| Setup Monitoring of LSI Disk Controler on Sunfire V20z Server]] | *[[SetupMonitoringControlerSunfireV20z| Setup Monitoring of LSI Disk Controler on Sunfire V20z Server]] | ||
*[[LDAP_UCL_IIHE| LDAP authentication system for the replication between UCL and IIHE sites]] | *[[LDAP_UCL_IIHE| <strike>LDAP authentication system for the replication between UCL and IIHE sites</strike> (OBSOLETE)]] | ||
*[[GridAdminSurvivalGuide| IIHE Grid-admin survival guide]] | *[[GridAdminSurvivalGuide| IIHE Grid-admin survival guide]] | ||
*[[Solaris| Solaris 10]] | *[[Solaris| Solaris 10]] | ||
*[[SolarisSSD| Adding an SSD card and configuring RAID, zpools, filesystems and shares on the new Solaris fileserver]] | *[[SolarisSSD| Adding an SSD card and configuring RAID, zpools, filesystems and shares on the new Solaris fileserver]] | ||
*[[LinuxAdminTricks| Linux tricks for admins]] | *[[LinuxAdminTricks| Linux tricks for admins]] | ||
*[[CrabLocalPbsSubmission| How to implement local PBS submission with CRAB ?]] | *[[CrabLocalPbsSubmission| <strike>How to implement local PBS submission with CRAB ?</strike>(OBSOLETE)]] | ||
*[[AddNewUserFromUCLToLDAP| How to create an account for a CMS user from UCL ?]] | *[[AddNewUserFromUCLToLDAP| <strike>How to create an account for a CMS user from UCL ?</strike>(OBSOLETE)]] | ||
*[[OSErrata| Deploying OS errata]] | *[[OSErrata| <strike>Deploying OS errata</strike>(OBSOLETE)]] | ||
*[[BenchmarkHEPSPEC06| Howto benchmark a node with HEPSPEC06]] | *[[BenchmarkHEPSPEC06| Howto benchmark a node with HEPSPEC06]] | ||
*[[Installing_dcache_pool| Install a new | *[[Installing_dcache_pool| Install a new dCache pool]] | ||
*[[BackupUsersHomeDirs| Backup of the users home dirs on Jefke]] | *[[BackupUsersHomeDirs| Backup of the users home dirs on Jefke]] | ||
*[[MonWebServicesMigration| Migration of mon and its Web services]] | *[[MonWebServicesMigration| Migration of mon and its Web services]] | ||
Line 79: | Line 84: | ||
**[[KernelUpdate| Reboot after a kernel update]] | **[[KernelUpdate| Reboot after a kernel update]] | ||
**[[UpgradeWNstoSL5.5| Reboot after an OS upgrade]] | **[[UpgradeWNstoSL5.5| Reboot after an OS upgrade]] | ||
*[[ | ** Force reboot a WN with hanging nfs: echo 1 > /proc/sys/kernel/sysrq ;echo b > /proc/sysrq-trigger | ||
** Force shutdown a WN with hanging nfs: echo 1 > /proc/sys/kernel/sysrq ;echo o > /proc/sysrq-trigger | |||
*[[ManageAllAdminScriptsWithGit| Central management of all the admin scripts with Git]] | |||
*[[HelpPageForAllScripts|Help page for all iihe scripts]] | |||
*[[ConfigProxyCvmfs| Configuration of a proxy for CVMFS]] | *[[ConfigProxyCvmfs| Configuration of a proxy for CVMFS]] | ||
**[[RecoverCvmfs| How to recover CVMFS]] | **[[RecoverCvmfs| How to recover CVMFS]] | ||
Line 90: | Line 98: | ||
*[[CCMWithKerberos| Experimental : Securing profiles with Kerberos]] | *[[CCMWithKerberos| Experimental : Securing profiles with Kerberos]] | ||
*[[MigrateToMediaWiki| Migration of T2B Wiki from Trac to MediaWiki]] | *[[MigrateToMediaWiki| Migration of T2B Wiki from Trac to MediaWiki]] | ||
*[[motd|Message Of The Day (motd)]] | |||
*[[LToS| Support of Long-tail of Science]] | |||
*[[QueryingBDII| Querying BDII]] | |||
*[[MachinePrivateCertWithEL7| Machine private certificate with EL7]] | |||
*[[ClusterUsageAccountingStatistics| Cluster usage accounting statistics]] | |||
*[[SingularityContainerCreation | Singularity container creation]] | |||
*[[ExplainingApel | Explaining the Apel accounting system]] | |||
==== Quattor ==== | ==== Quattor ==== | ||
*[[ | *[[AideMemoire| FAQ - Aide-mémoire - Howtos]] | ||
*[[ | *[[ManageRepositoriesWithQuattor|Manage repositories with quattor]] | ||
*[[GenerateRPMFromATagInGithub| How to build an RPM from a tag in Github]] | *[[GenerateRPMFromATagInGithub| How to build an RPM from a tag in Github]] | ||
*[[WorkingInCB9| Working in CB9 (Quattor release >= 14.2)]] | *[[WorkingInCB9| Working in CB9 (Quattor release >= 14.2)]] | ||
*[[ | *[[AddNewQuattorVersion|How to add a new version of quattor in our scdb]] | ||
*[[QuattorFreeIPA| Quattor and FreeIPA]] | *[[QuattorFreeIPA| Quattor and FreeIPA]] | ||
*[[ | *[[HardDisksManagement| Hard disks management]] | ||
*[[Metaconfig|How to use metaconfig (with examples)]] | |||
*[[Aquilon| Aquilon]] | |||
*[http://quattor.begrid.be/trac/centralised-begrid-v5/wiki/BEgridAndQuattor <strike>BEgrid wiki</strike>(OBSOLETE)] | |||
*[[Test_things| <strike>Test things</strike>(OBSOLETE)]] | |||
*[[Lemon_installation| <strike>Lemon installation</strike>(OBSOLETE)]] | |||
*[[QuattorPointers| <strike>Pointers</strike>]]<strike> to more in-depth information on quattor</strike>(OBSOLETE) | |||
*[[AddingMachineToCluster| <strike>Adding</strike>]]<strike> a new machine to the cluster</strike>(OBSOLETE) | |||
*[[AutomaticMachineTemplateGeneration|<strike>Automatic generation of hardware and profile templates for new workernodes</strike>]](OBSOLETE: use script create_wn) | |||
*[[InstallationBEgridClient0|<strike>Installation of a Quattor deployment server release 13.1</strike>]](OBSOLETE: see quattor template for aii server) | |||
*[[InstallFilesNewOS|<strike> How to add a new OS to the Quattor Repository</strike>]](OBSOLETE) | |||
*[[HowtoMigrateWNToCB9|<strike>How to migrate workernodes from CB8 to CB9</strike>]](HISTORICAL) | |||
*[[BuildANewPysvnOnAiiServer|<strike>Howto build a new pysvn on a SL63 AII server</strike>]](HISTORICAL) | |||
==== FreeIPA ==== | |||
*[[FixIPAcert|Fix IPA client certificates]] | |||
==== KVM virtualization ==== | ==== KVM virtualization ==== | ||
Line 117: | Line 142: | ||
*[[MigrationToOpenNebula| Transforming the KVM hypervisors farm into an OpenNebula cloud]] | *[[MigrationToOpenNebula| Transforming the KVM hypervisors farm into an OpenNebula cloud]] | ||
*[[WorkingInT2BCloud| Working in the T2B cloud]] | *[[WorkingInT2BCloud| Working in the T2B cloud]] | ||
*[[MigrateDBMySQL| Migrate one DB from sqlite to mysql]] | |||
*[[BackupT2BCloud| Backup of the T2B Cloud]] | |||
*[[DealingWithiPXE| Dealing with iPXE]] | |||
*[[ResizingVMDisk| Resizing the drive of a VM]] | |||
*[[RestoringCloudFrontendFromBackup| Restoring an OpenNebula frontend from a backup]] | |||
==== Clouds for users ==== | |||
*[[VUB-ULB cloud]] | |||
*[[BEgrid cloud (part of FedCloud)]] | |||
==== gUSE/WS-PGRADE portal ==== | ==== gUSE/WS-PGRADE portal ==== | ||
*[[PortalInstall| Portal installation]] | *[[PortalInstall| Portal installation]] | ||
*[[PortalConfig| Portal configuration]] | |||
*[[PortalOperations| Portal operations]] | *[[PortalOperations| Portal operations]] | ||
Line 130: | Line 165: | ||
*[[XenQuattor| Xen and Quattor]] | *[[XenQuattor| Xen and Quattor]] | ||
==== CEPH ==== | |||
SEE PRIVATE WIKI | |||
==== CEPH Old (deprecated) ==== | |||
*[[UnderstandingCeph| Understanding Ceph]] | |||
*[[InstallCephWithQuattor| Installing Ceph with Quattor]] | |||
*[[ExperimentsWithCeph| Experiments with Ceph]] | |||
*[[CephBasics| Operating a Ceph cluster]] | |||
*[[Deploying_a_new_Ceph_Octopus_cluster| Deploying a new Ceph Octopus cluster]] | |||
*[[Mounting_a_RBD_on_a_client_machine | Mounting a RBD on a client machine]] | |||
*[[CephCrushMap | Manage the Crush map]] | |||
*[[CephFS | Manage CephFS]] | |||
==== Logstash / Elasticsearch / Kibana (ELK) ==== | |||
machine: log10 | [http://log10.iihe.ac.be/index.html interface] | [http://log10.iihe.ac.be/HQ index manager] | |||
* [[log_forwarding_with_quattor|Forwarding a log with rsyslog to logstash using quattor]] | |||
* [[log_parsing_with_logstash|Parsing the logs with logstash]] | |||
==== Network ==== | |||
* [[network_bond_and_tag|Bonding of 2 interfaces + tagging of 2 vlans on the bond (PRIV+PUB)|]] | |||
* [[huawei_switch|Managing the Huawei CE8850-32CQ-EI 100G switch]] | |||
==== HTCondor clusters ==== | |||
* [[htc_test_local|Testing local submission]] | |||
* [[htc_test_grid|Testing grid submission]] | |||
* [[htc_cheat_sheet|HTCondor cheat sheet]] | |||
* [[htc_python_binding|HTCondor Python binding]] |
Latest revision as of 12:09, 25 July 2024
Management of the whole cluster
CMS Services
- How to use tokens and openID in the grid
- Phedex
- Heartbeat
- LoadTest
- FroNTier
- ProdAgent
- Instructions to commit siteconf to git
Grid Configuration Issues
- Update the certificates of all our machines
- Issues with cream and how to solve them
- PBS TMPDIR
- APEL (OBSOLETE)
- BDII
- FTS
- SL4 x86_64 WNs (OBSOLETE)
- CE overloaded
- RB
- IPMI
- Upgrade CA certificates (OBSOLETE)
- Shutting down the cluster
- Software Area Switch
- Kernel mandatory updates for critical vulnerabilities
- Argus server and glexec on the workernodes
- Apel gap publishing
- Update IGTF CA certificates
Files section
- dCache
- Procedure for removal of old user files on pnfs
- Retrieve lost files from datasets
- Storage Consistency
- rucio commands
Status and Monitoring
- List of reserved WNs
- Todo-list
- Monitoring
- Plans/Schedule
- Grid Troubleshooting link
- Incident Reports
- How to put the software back
- What to do when a WN sends a "bad_wn.pl" email to grid_admin ?
- Nagios Installation at IIHE
- How to restart DCache
Info
- General info
- Installing CMSSW
- Installing CRAB
- System Benchmarks
- T2B Trac config info
- Hardware information
- Network Setup
- Setup Monitoring of LSI Disk Controller on Sunfire V20z Server
- LDAP authentication system for the replication between UCL and IIHE sites (OBSOLETE)
- IIHE Grid-admin survival guide
- Solaris 10
- Adding an SSD card and configuring RAID, zpools, filesystems and shares on the new Solaris fileserver
- Linux tricks for admins
- How to implement local PBS submission with CRAB ? (OBSOLETE)
- How to create an account for a CMS user from UCL ? (OBSOLETE)
- Deploying OS errata (OBSOLETE)
- Howto benchmark a node with HEPSPEC06
- Install a new dCache pool
- Backup of the users home dirs on Jefke
- Migration of mon and its Web services
- HOWTO restart a nagios test manually
- Compile and install ROOT
- Clean creamdb
- Reboot campaign for the workernodes :
  - Reboot after a kernel update
  - Reboot after an OS upgrade
  - Force reboot a WN with hanging NFS: echo 1 > /proc/sys/kernel/sysrq; echo b > /proc/sysrq-trigger
  - Force shutdown a WN with hanging NFS: echo 1 > /proc/sys/kernel/sysrq; echo o > /proc/sysrq-trigger
- Central management of all the admin scripts with Git
- Help page for all iihe scripts
- Configuration of a proxy for CVMFS
- How to test NFS Performance
- Alternatives to Tetex
- A new easy method to update kernel on the workernodes
- About automatic mail sending from the cluster
- T2B Trac access configuration
- Surviving to RHEL7
- Experimental : Securing profiles with Kerberos
- Migration of T2B Wiki from Trac to MediaWiki
- Message Of The Day (motd)
- Support of Long-tail of Science
- Querying BDII
- Machine private certificate with EL7
- Cluster usage accounting statistics
- Singularity container creation
- Explaining the Apel accounting system
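The two one-liners listed above for a WN with hanging NFS (force reboot and force shutdown) can be wrapped in a small helper. This is only a sketch: the function name `sysrq_action` and the two overridable variables `SYSRQ_ENABLE` and `SYSRQ_TRIGGER` are hypothetical and not part of any existing iihe script. The variables exist only so the function can be tried against ordinary files before pointing it at a real node.

```shell
#!/bin/sh
# Hypothetical helper around the magic SysRq one-liners from this page.
# SYSRQ_ENABLE / SYSRQ_TRIGGER are assumptions added for safe dry runs;
# by default they point at the real kernel interfaces.
SYSRQ_ENABLE="${SYSRQ_ENABLE:-/proc/sys/kernel/sysrq}"
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

sysrq_action() {
    # Map the requested action to the SysRq key used on this page:
    #   b = immediate reboot, o = immediate power off.
    # Both act at once, without syncing or unmounting filesystems.
    case "$1" in
        reboot)   key=b ;;
        shutdown) key=o ;;
        *)
            echo "usage: sysrq_action reboot|shutdown" >&2
            return 2
            ;;
    esac

    # Enable the SysRq functions, then send the key to the trigger file.
    echo 1 > "$SYSRQ_ENABLE" || return 1
    echo "$key" > "$SYSRQ_TRIGGER"
}
```

On a hung workernode this would be called as root with `sysrq_action reboot` or `sysrq_action shutdown`. For a dry run, set `SYSRQ_ENABLE` and `SYSRQ_TRIGGER` to scratch files first and inspect what was written.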
Quattor
- FAQ - Aide-mémoire - Howtos
- Manage repositories with quattor
- How to build an RPM from a tag in Github
- Working in CB9 (Quattor release >= 14.2)
- How to add a new version of quattor in our scdb
- Quattor and FreeIPA
- Hard disks management
- How to use metaconfig (with examples)
- Aquilon
- BEgrid wiki (OBSOLETE)
- Test things (OBSOLETE)
- Lemon installation (OBSOLETE)
- Pointers to more in-depth information on quattor (OBSOLETE)
- Adding a new machine to the cluster (OBSOLETE)
- Automatic generation of hardware and profile templates for new workernodes (OBSOLETE: use script create_wn)
- Installation of a Quattor deployment server release 13.1 (OBSOLETE: see quattor template for aii server)
- How to add a new OS to the Quattor Repository (OBSOLETE)
- How to migrate workernodes from CB8 to CB9 (HISTORICAL)
- Howto build a new pysvn on a SL63 AII server (HISTORICAL)
FreeIPA
- Fix IPA client certificates
KVM virtualization
- Virtualization of the new CREAM-CE on dom02 with KVM
- Installation of the new virtualization server dom04
- Easy creation of virtual machines
- Monitoring the KVM vHosts with Ganglia
T2B Cloud
- Transforming the KVM hypervisors farm into an OpenNebula cloud
- Working in the T2B cloud
- Migrate one DB from sqlite to mysql
- Backup of the T2B Cloud
- Dealing with iPXE
- Resizing the drive of a VM
- Restoring an OpenNebula frontend from a backup
Clouds for users
- VUB-ULB cloud
- BEgrid cloud (part of FedCloud)
gUSE/WS-PGRADE portal
Migration to EMI-3
XEN
CEPH
SEE PRIVATE WIKI
CEPH Old (deprecated)
- Understanding Ceph
- Installing Ceph with Quattor
- Experiments with Ceph
- Operating a Ceph cluster
- Deploying a new Ceph Octopus cluster
- Mounting a RBD on a client machine
- Manage the Crush map
- Manage CephFS
Logstash / Elasticsearch / Kibana (ELK)
machine: log10 | interface | index manager
- Forwarding a log with rsyslog to logstash using quattor
- Parsing the logs with logstash
Network
- Bonding of 2 interfaces + tagging of 2 vlans on the bond (PRIV+PUB)
- Managing the Huawei CE8850-32CQ-EI 100G switch