ExplainingApel

From T2B Wiki

Revision as of 15:34, 16 July 2024

Context

Our CE is an HTCondor CE, and the underlying batch system is HTCondor.

These CEs are equipped with an Apel software stack. The role of this machinery is to extract information about jobs from logfiles and feed a local database with it. Every day, the database is read, records are extracted from it, and they are sent to a remote Apel accounting server.

From HTCondor job history files to batch and blah files

Each time a job finishes, a job record is created in the directory /var/lib/condor/history.

Thanks to a systemd timer (see /usr/lib/systemd/system/condor-ce-apel.timer), a script (/usr/share/condor-ce/condor_ce_apel.sh) is run every hour to parse these history records and to generate the blah and batch files from them via the script /usr/share/condor-ce/condor_batch_blah.py. The blah and batch files are created in the directory /var/lib/condor-ce/apel. If for some reason a history file cannot be parsed, it is moved to the subdirectory quarantine; otherwise, it is removed.
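The hourly pass can be sketched as follows. This is a minimal illustration, not the actual condor_ce_apel.sh logic: parse_history is a hypothetical stand-in for the real parsing done by condor_batch_blah.py.

```python
import shutil
from pathlib import Path

def parse_history(path):
    # Placeholder (assumed) for the real parsing in condor_batch_blah.py:
    # extract CE-level (blah) and batch-level fields from one history file.
    text = path.read_text()
    if not text.strip():
        raise ValueError(f"cannot parse {path}")
    return {"blah": text}, {"batch": text}

def process_history_files(history_dir):
    """Hourly pass (sketch): parse each history file, quarantine on failure."""
    history = Path(history_dir)
    quarantine = history / "quarantine"
    quarantine.mkdir(exist_ok=True)
    blah_records, batch_records = [], []
    for hist_file in sorted(history.glob("history.*")):
        try:
            blah, batch = parse_history(hist_file)
            blah_records.append(blah)
            batch_records.append(batch)
            hist_file.unlink()                            # parsed OK: remove
        except ValueError:
            shutil.move(str(hist_file), str(quarantine))  # failed: quarantine
    return blah_records, batch_records
```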

The blah files contain information provided by the CE layer (like the user DN, for example), while the batch files contain low-level pieces of information coming from the underlying batch system (which can be SLURM, LSF, PBS, HTCondor, ...).

After the blah and batch files have been generated, the script condor_ce_apel.sh calls /usr/bin/apelparser, a Python script whose role is to update the local Apel MySQL database with the content of the blah and batch files.
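The deduplication that apelparser performs against the ProcessedFiles table (described in the next section) can be sketched like this. SQLite stands in for the real MySQL database, and the column names are simplified assumptions rather than the exact APEL schema:

```python
import sqlite3

def make_db():
    # SQLite replaces the MySQL apelclient database for a self-contained
    # illustration; column names are simplified assumptions.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ProcessedFiles (FileName TEXT PRIMARY KEY, Parsed INTEGER)")
    conn.execute("CREATE TABLE BlahdRecords (GlobalUserName TEXT)")
    return conn

def load_blah_file(conn, filename, user_dns):
    """Insert records from one blah file unless it was already processed."""
    cur = conn.cursor()
    cur.execute("SELECT Parsed FROM ProcessedFiles WHERE FileName = ?", (filename,))
    row = cur.fetchone()
    if row is not None and row[0] > 0:
        return 0  # already processed successfully: skip the file
    cur.executemany("INSERT INTO BlahdRecords (GlobalUserName) VALUES (?)",
                    [(dn,) for dn in user_dns])
    # Record how many records were processed from this file.
    cur.execute("INSERT OR REPLACE INTO ProcessedFiles VALUES (?, ?)",
                (filename, len(user_dns)))
    conn.commit()
    return len(user_dns)
```

Running the same file twice inserts its records only once, which is why re-running apelparser is safe.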

The Apel database

In our case, it's a MySQL database whose name is apelclient. It's created and initialized automatically by Puppet when the CE is deployed. It contains tables such as BlahdRecords to store the records from blah files, EventRecords to store the records from batch files, and JobRecords to store the records generated by the join performed by the apelclient script (see next section). In this database, you'll also find views (their names begin with a 'V') that give fast access to data. Many operations on these tables are actually performed through MySQL procedures.

A very important table for sysadmins is ProcessedFiles. It is used by the apelparser script to check whether a file has already been processed. When the processing of a file has failed, it is recorded with its Parsed field set to 0. Otherwise, this field indicates the number of records processed from the file.
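This makes it easy to spot files that failed to parse. The sketch below uses SQLite so it is self-contained; on a real CE you would run the equivalent SELECT in the mysql client against the apelclient database:

```python
import sqlite3

# SQLite stands in for the MySQL apelclient database here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ProcessedFiles (FileName TEXT, Parsed INTEGER)")
conn.executemany("INSERT INTO ProcessedFiles VALUES (?, ?)",
                 [("batch-20240715", 120), ("batch-20240716", 0)])
# Files whose processing failed are the ones with Parsed = 0.
failed = [name for (name,) in conn.execute(
    "SELECT FileName FROM ProcessedFiles WHERE Parsed = 0")]
print(failed)  # ['batch-20240716']
```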

Generation of job and summary records

Once every day, job records are generated by the apelclient Python script. To be more precise, this script accomplishes the following tasks:

  • fetch benchmark information from the LDAP database;
  • join EventRecords and BlahdRecords into JobRecords;
  • summarise jobs;
  • unload JobRecords or SummaryRecords to the filesystem.
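The central join step can be sketched as follows. SQLite replaces MySQL, the schemas are reduced to a few columns, and the join key (the batch-system job id on the EventRecords side matched against LrmsId on the BlahdRecords side) is an assumption about the real stored procedure:

```python
import sqlite3

# Reduced, assumed schemas: the real tables have many more columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE EventRecords (JobName TEXT, WallDuration INTEGER);
CREATE TABLE BlahdRecords (LrmsId TEXT, GlobalUserName TEXT);
CREATE TABLE JobRecords  (LocalJobId TEXT, GlobalUserName TEXT, WallDuration INTEGER);
""")
conn.execute("INSERT INTO EventRecords VALUES ('1234.0', 3600)")
conn.execute("INSERT INTO BlahdRecords VALUES ('1234.0', '/DC=org/CN=alice')")
# Join batch-side and CE-side records on the local batch job id.
conn.execute("""
    INSERT INTO JobRecords
    SELECT e.JobName, b.GlobalUserName, e.WallDuration
    FROM EventRecords e JOIN BlahdRecords b ON e.JobName = b.LrmsId
""")
rows = conn.execute("SELECT * FROM JobRecords").fetchall()
print(rows)  # [('1234.0', '/DC=org/CN=alice', 3600)]
```

A batch record with no matching blah record (or vice versa) simply produces no JobRecord, which is why unmatched records are worth checking when accounting numbers look low.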

Sending the job and summary records to the remote accounting server

During this step, the records that were previously unloaded to the filesystem are sent by SSM to a remote accounting server. This step can be performed by the apelclient script if the variable enabled in section SSM of the config file /etc/apel/client.cfg is set to true.

The script that is run to send these records is /usr/bin/ssmsend. It is configured by the file /etc/apel/sender.cfg. The protocol we use to send the records is AMS (rather than STOMP).
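A minimal sketch of the two settings mentioned above (option names assumed; check the actual files on the CE before relying on them):

```ini
# /etc/apel/client.cfg -- assumed fragment: let apelclient trigger SSM itself
[ssm]
enabled = true

# /etc/apel/sender.cfg -- assumed fragment: send via the AMS protocol
[sender]
protocol = AMS
```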

Configuration files

They are located in /etc/apel. Here are the config files relevant to us:

  • /etc/apel/client.cfg
  • /etc/apel/parser.cfg
  • /etc/apel/sender.cfg