AdminPage: Difference between revisions
(22 intermediate revisions by 3 users not shown)
Line 5: | Line 5:
  *[[PutClusterOn| How to properly put the cluster on]]
  ==== CMS Services ====
+ *[[OpenID | How to use tokens and openID in the grid]]
  *[[Phedex]]
  *[[Heartbeat]]
Line 10: | Line 11:
  *[[FroNTier]]
  *[[ProdAgent]]
- *[[GitForSiteConf|
+ *[[GitForSiteConf| Instructions to commit siteconf to git]]
  ==== Grid Configuration Issues ====
  *[[UpdateCertificates| Update the certificates of all our machines]]
Line 41: | Line 43:
  *[[GetLostFiles| Retrieve lost files from datasets]]
  *[[StorageConsistency| Storage Consistency]]
+ *[[Rucio | rucio commands ]]
  ==== Status and Monitoring ====
Line 82: | Line 85:
  **[[UpgradeWNstoSL5.5| Reboot after an OS upgrade]]
  ** Force reboot a WN with hanging nfs: echo 1 > /proc/sys/kernel/sysrq ;echo b > /proc/sysrq-trigger
+ ** Force shutdown a WN with hanging nfs: echo 1 > /proc/sys/kernel/sysrq ;echo o > /proc/sysrq-trigger
  *[[ManageAllAdminScriptsWithGit| Central management of all the admin scripts with Git]]
  *[[HelpPageForAllScripts|Help page for all iihe scripts]]
Line 98: | Line 102:
  *[[QueryingBDII| Querying BDII]]
  *[[MachinePrivateCertWithEL7| Machine private certificate with EL7]]
+ *[[ClusterUsageAccountingStatistics| Cluster usage accounting statistics]]
+ *[[SingularityContainerCreation | Singularity container creation]]
+ *[[ExplainingApel | Explaining the Apel accounting system]]
  ==== Quattor ====
Line 138: | Line 145:
  *[[BackupT2BCloud| Backup of the T2B Cloud]]
  *[[DealingWithiPXE| Dealing with iPXE]]
+ *[[ResizingVMDisk| Resizing the drive of a VM]]
+ *[[RestoringCloudFrontendFromBackup| Restoring an OpenNebula frontend from a backup]]
  ==== Clouds for users ====
Line 157: | Line 166:
  ==== CEPH ====
+ SEE PRIVATE WIKI
+ ==== CEPH Old (deprecated) ====
  *[[UnderstandingCeph| Understanding Ceph]]
  *[[InstallCephWithQuattor| Installing Ceph with Quattor]]
  *[[ExperimentsWithCeph| Experiments with Ceph]]
  *[[CephBasics| Operating a Ceph cluster]]
+ *[[Deploying_a_new_Ceph_Octopus_cluster| Deploying a new Ceph Octopus cluster]]
+ *[[Mounting_a_RBD_on_a_client_machine | Mounting a RBD on a client machine]]
+ *[[CephCrushMap | Manage the Crush map]]
+ *[[CephFS | Manage CephFS]]
  ==== Logstash / Elasticsearch / Kibana (ELK) ====
Line 171: | Line 187:
  * [[network_bond_and_tag|Bonding of 2 interfaces + tagging of 2 vlans on the bond (PRIV+PUB)|]]
  * [[huawei_switch|Managing the Huawei CE8850-32CQ-EI 100G switch]]
+ ==== HTCondor clusters ====
+ * [[htc_test_local|Testing local submission]]
+ * [[htc_test_grid|Testing grid submission]]
+ * [[htc_cheat_sheet|HTCondor cheat sheet]]
+ * [[htc_python_binding|HTCondor Python binding]]
Latest revision as of 12:09, 25 July 2024
Management of the whole cluster
CMS Services
- How to use tokens and openID in the grid
- Phedex
- Heartbeat
- LoadTest
- FroNTier
- ProdAgent
- Instructions to commit siteconf to git
Grid Configuration Issues
- Update the certificates of all our machines
- Issues with cream and how to solve them
- PBS TMPDIR
- APEL (OBSOLETE)
- BDII
- FTS
- SL4 x86_64 WNs (OBSOLETE)
- CE overloaded
- RB
- IPMI
- Upgrade CA certificates (OBSOLETE)
- Shutting down the cluster
- Software Area Switch
- Kernel mandatory updates for critical vulnerabilities
- Argus server and glexec on the workernodes
- Apel gap publishing
- Update IGTF CA certificates
Files section
- dCache
- Procedure for removal of old user files on pnfs
- Retrieve lost files from datasets
- Storage Consistency
- rucio commands
Status and Monitoring
- List of reserved WNs
- Todo-list
- Monitoring
- Plans/Schedule
- Grid Troubleshooting link
- Incident Reports
- How to put the software back
- What to do when a WN sends a "bad_wn.pl" email to grid_admin ?
- Nagios Installation at IIHE
- How to restart DCache
Info
- General info
- Installing CMSSW
- Installing CRAB
- System Benchmarks
- T2B Trac config info
- Hardware information
- Network Setup
- Setup Monitoring of LSI Disk Controller on Sunfire V20z Server
- LDAP authentication system for the replication between UCL and IIHE sites (OBSOLETE)
- IIHE Grid-admin survival guide
- Solaris 10
- Adding an SSD card and configuring RAID, zpools, filesystems and shares on the new Solaris fileserver
- Linux tricks for admins
- How to implement local PBS submission with CRAB ? (OBSOLETE)
- How to create an account for a CMS user from UCL ? (OBSOLETE)
- Deploying OS errata (OBSOLETE)
- Howto benchmark a node with HEPSPEC06
- Install a new dCache pool
- Backup of the users home dirs on Jefke
- Migration of mon and its Web services
- HOWTO restart a nagios test manually
- Compile and install ROOT
- Clean creamdb
- Reboot campaign for the workernodes:
  - Reboot after a kernel update
  - Reboot after an OS upgrade
  - Force reboot a WN with hanging nfs: echo 1 > /proc/sys/kernel/sysrq ;echo b > /proc/sysrq-trigger
  - Force shutdown a WN with hanging nfs: echo 1 > /proc/sys/kernel/sysrq ;echo o > /proc/sysrq-trigger
- Central management of all the admin scripts with Git
- Help page for all iihe scripts
- Configuration of a proxy for CVMFS
- How to test NFS Performance
- Alternatives to Tetex
- A new easy method to update kernel on the workernodes
- About automatic mail sending from the cluster
- T2B Trac access configuration
- Surviving to RHEL7
- Experimental : Securing profiles with Kerberos
- Migration of T2B Wiki from Trac to MediaWiki
- Message Of The Day (motd)
- Support of Long-tail of Science
- Querying BDII
- Machine private certificate with EL7
- Cluster usage accounting statistics
- Singularity container creation
- Explaining the Apel accounting system
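
The two forced reboot and shutdown one-liners listed above under the reboot campaign both go through the kernel's magic SysRq interface. As a minimal sketch, they could be wrapped in a single shell function. The function name `sysrq_force` and its optional second argument (an alternate proc root, useful only for dry runs) are assumptions made for illustration and are not part of the IIHE admin scripts. The `b` trigger reboots immediately without syncing or unmounting filesystems, so this is a last resort for a node that no longer responds to a normal reboot.

```shell
#!/bin/sh
# sysrq_force: hypothetical wrapper around the two one-liners above.
#   $1  action: "reboot" (SysRq b) or "shutdown" (SysRq o)
#   $2  optional proc root, defaults to /proc (assumption: only for dry runs)
sysrq_force() {
    action=$1
    procroot=${2:-/proc}
    case "$action" in
        reboot)   key=b ;;   # immediate reboot, no sync, no unmount
        shutdown) key=o ;;   # immediate power off
        *)
            echo "usage: sysrq_force reboot|shutdown [procroot]" >&2
            return 2
            ;;
    esac
    # Enable all SysRq functions, then fire the trigger (needs root on a real node).
    echo 1 > "$procroot/sys/kernel/sysrq" || return 1
    echo "$key" > "$procroot/sysrq-trigger"
}
```

Run as root on a hung worker node, `sysrq_force reboot` would do the same as the first one-liner and `sysrq_force shutdown` the same as the second.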
Quattor
- FAQ - Aide-mémoire - Howtos
- Manage repositories with quattor
- How to build an RPM from a tag in Github
- Working in CB9 (Quattor release >= 14.2)
- How to add a new version of quattor in our scdb
- Quattor and FreeIPA
- Hard disks management
- How to use metaconfig (with examples)
- Aquilon
- BEgrid wiki (OBSOLETE)
- Test things (OBSOLETE)
- Lemon installation (OBSOLETE)
- Pointers to more in-depth information on quattor (OBSOLETE)
- Adding a new machine to the cluster (OBSOLETE)
- Automatic generation of hardware and profile templates for new workernodes (OBSOLETE: use script create_wn)
- Installation of a Quattor deployment server release 13.1 (OBSOLETE: see quattor template for aii server)
- How to add a new OS to the Quattor Repository (OBSOLETE)
- How to migrate workernodes from CB8 to CB9 (HISTORICAL)
- Howto build a new pysvn on a SL63 AII server (HISTORICAL)
FreeIPA
KVM virtualization
- Virtualization of the new CREAM-CE on dom02 with KVM
- Installation of the new virtualization server dom04
- Easy creation of virtual machines
- Monitoring the KVM vHosts with Ganglia
T2B Cloud
- Transforming the KVM hypervisors farm into an OpenNebula cloud
- Working in the T2B cloud
- Migrate one DB from sqlite to mysql
- Backup of the T2B Cloud
- Dealing with iPXE
- Resizing the drive of a VM
- Restoring an OpenNebula frontend from a backup
Clouds for users
gUSE/WS-PGRADE portal
Migration to EMI-3
XEN
CEPH
SEE PRIVATE WIKI
CEPH Old (deprecated)
- Understanding Ceph
- Installing Ceph with Quattor
- Experiments with Ceph
- Operating a Ceph cluster
- Deploying a new Ceph Octopus cluster
- Mounting a RBD on a client machine
- Manage the Crush map
- Manage CephFS
Logstash / Elasticsearch / Kibana (ELK)
machine: log10 | interface | index manager
Network
- Bonding of 2 interfaces + tagging of 2 vlans on the bond (PRIV+PUB)
- Managing the Huawei CE8850-32CQ-EI 100G switch