ExplainingApel

From T2B Wiki

Revision as of 15:08, 16 July 2024

Context

Our CE is an HTCondor CE, and the underlying batch system is HTCondor.

These CEs are equipped with an Apel software stack. The role of this machinery is to extract information about jobs from log files and feed a local database with it. Every day, the database is read, and the extracted records are sent to a remote Apel accounting server.

From HTCondor job history files to batch and blah files

Each time a job finishes, a job record is created in the directory /var/lib/condor/history.

Thanks to a systemd timer (see /usr/lib/systemd/system/condor-ce-apel.timer), every hour a script (/usr/share/condor-ce/condor_ce_apel.sh) is run to parse these history records and generate the blah and batch files from them via the script /usr/share/condor-ce/condor_batch_blah.py. The blah and batch files are created in the directory /var/lib/condor-ce/apel. If, for some reason, it fails to parse a history file, that file is moved to the quarantine subdirectory; otherwise, it is removed.
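The hourly pass described above boils down to: try to parse each history file, quarantine it on failure, and delete it on success. Here is a minimal sketch of that loop in Python; the `parse_history` stub is a hypothetical stand-in for the real parsing done by condor_batch_blah.py:

```python
import shutil
from pathlib import Path

def parse_history(path: Path) -> None:
    # Stand-in for the real record parsing: here we simply reject empty files.
    if path.stat().st_size == 0:
        raise ValueError(f"unparsable history file: {path}")

def process_history(history_dir: str) -> tuple[int, int]:
    """Parse every history file; quarantine failures, delete successes.

    Returns (parsed, quarantined) counts. The parsing itself is a stub here;
    the real script extracts job attributes from each record.
    """
    base = Path(history_dir)
    quarantine = base / "quarantine"
    quarantine.mkdir(exist_ok=True)
    parsed = quarantined = 0
    for history_file in sorted(base.glob("history.*")):
        try:
            parse_history(history_file)
        except Exception:
            # Unparsable file: set it aside for later inspection.
            shutil.move(str(history_file), str(quarantine / history_file.name))
            quarantined += 1
        else:
            history_file.unlink()  # successfully parsed: remove it
            parsed += 1
    return parsed, quarantined
```

This mirrors the behaviour described above: nothing is lost silently, since a file that cannot be parsed survives in quarantine for a sysadmin to examine.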

The blah files contain information provided by the CE layer (such as the user DN), while the batch files contain low-level information coming from the underlying batch system (which can be SLURM, LSF, PBS, HTCondor, ...).

After the blah and batch files have been generated, the script condor_ce_apel.sh calls /usr/bin/apelparser, a Python script whose role is to update the local Apel MySQL database with the content of the blah and batch files. One of the main tasks of this script is to join the blah and batch records.

The Apel database

In our case, it's a MySQL database whose name is apelclient. It's created and initialized automatically when the CE is deployed, thanks to Puppet. It contains tables such as BlahdRecords to store the records from blah files, EventRecords to store the records from batch files, and JobRecords to store the records generated by the join performed by the apelparser script. In this database, you'll also find views (their names begin with a 'V') that give fast access to the data. Many operations on these tables are actually performed through MySQL procedures.
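To give a feel for the blah/batch join that fills JobRecords, here is an illustrative sketch using Python's sqlite3 in place of MySQL. The column names and the join key (the batch-system job id) are deliberately simplified assumptions for the example; the real APEL schema is richer and the join is done by stored procedures:

```python
import sqlite3

# In-memory toy versions of the three tables, with simplified columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE BlahdRecords (LrmsId TEXT, UserDN TEXT);
CREATE TABLE EventRecords (JobName TEXT, WallDuration INTEGER);
CREATE TABLE JobRecords  (LrmsId TEXT, UserDN TEXT, WallDuration INTEGER);
""")
conn.execute("INSERT INTO BlahdRecords VALUES ('1234.0', '/DC=org/CN=alice')")
conn.execute("INSERT INTO EventRecords VALUES ('1234.0', 3600)")

# Join CE-layer data (user DN) with batch-system data (wall time)
# into accounting records.
conn.execute("""
INSERT INTO JobRecords
SELECT b.LrmsId, b.UserDN, e.WallDuration
FROM BlahdRecords b JOIN EventRecords e ON b.LrmsId = e.JobName
""")
print(conn.execute("SELECT * FROM JobRecords").fetchall())
# -> [('1234.0', '/DC=org/CN=alice', 3600)]
```

The point of the join is visible here: neither file alone identifies both who ran the job and what it consumed; only the combined JobRecords row does.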

A very important table for sysadmins is ProcessedFiles. It is used by the apelparser script to check whether a file has already been processed. When the processing of a file has failed, it is recorded with its Parsed field set to 0; otherwise, this field holds the number of records processed from the file.
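A hedged sketch of how a parser can use a ProcessedFiles-style record to decide what to (re)process, following the convention above (Parsed == 0 means the last parse failed; any other value is the record count). The table is mocked as a plain dict and the filenames are made up:

```python
# Mock of the ProcessedFiles table: filename -> value of the Parsed field.
processed_files = {
    "batch-20240715": 120,  # parsed successfully: 120 records
    "batch-20240716": 0,    # parsing failed last time
}

def needs_processing(filename: str) -> bool:
    """A file needs (re)processing if it is unknown or its last parse failed."""
    return processed_files.get(filename, 0) == 0

print(needs_processing("batch-20240716"))  # failed before -> True
print(needs_processing("batch-20240715"))  # already done  -> False
print(needs_processing("batch-20240717"))  # never seen    -> True
```

This is why the Parsed field matters operationally: resetting it to 0 (or deleting the row) is the natural way to force a file to be picked up again.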