ExplainingApel

From T2B Wiki
Revision as of 15:08, 16 July 2024 by Admin (talk | contribs)

Context

Our CE is an HTCondor CE, and the underlying batch system is HTCondor.

These CEs are equipped with an Apel software stack. Its role is to extract information about jobs from log files and to store it in a local database. Every day, records are read from this database and sent to a remote Apel accounting server.

From HTCondor job history files to batch and blah files

Each time a job is finished, a job record is created in the directory /var/lib/condor/history.

A systemd timer (see /usr/lib/systemd/system/condor-ce-apel.timer) runs the script /usr/share/condor-ce/condor_ce_apel.sh every hour. This script parses the history records and generates the blah and batch files from them via the script /usr/share/condor-ce/condor_batch_blah.py. The blah and batch files are created in the directory /var/lib/condor-ce/apel. If a history file cannot be parsed for some reason, it is moved to the subdirectory quarantine. Otherwise, it is removed.
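The "parse, then remove or quarantine" behaviour described above can be sketched in Python as follows. This is only an illustration, not the code of condor_batch_blah.py: the function names, the "Key = Value" file format and the required ClusterId key are assumptions made for the example, and the quarantine directory is passed in explicitly.

```python
import shutil
from pathlib import Path


def parse_history_file(path: Path) -> dict:
    """Hypothetical parser: expects 'Key = Value' lines and a ClusterId key."""
    record = {}
    for line in path.read_text().splitlines():
        if not line.strip():
            continue  # ignore blank lines
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line in {path.name}: {line!r}")
        record[key.strip()] = value.strip()
    if "ClusterId" not in record:
        raise ValueError(f"no ClusterId in {path.name}")
    return record


def process_history_dir(history_dir: Path, quarantine_dir: Path) -> list:
    """Parse every history file; quarantine failures, delete successes."""
    quarantine_dir.mkdir(parents=True, exist_ok=True)
    records = []
    for path in sorted(history_dir.iterdir()):
        if not path.is_file():
            continue  # skip subdirectories
        try:
            records.append(parse_history_file(path))
        except ValueError:
            # Parsing failed: keep the file aside for later inspection.
            shutil.move(str(path), str(quarantine_dir / path.name))
        else:
            # Parsing succeeded: the history file is no longer needed.
            path.unlink()
    return records
```

With this layout, any file left in the quarantine directory is one that could not be parsed and needs a manual look.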

The blah files contain information provided by the CE layer (the user DN, for example), while the batch files contain low-level information coming from the underlying batch system (which can be SLURM, LSF, PBS, HTCondor, ...).

After the blah and batch files have been generated, the script condor_ce_apel.sh calls /usr/bin/apelparser, a Python script whose role is to update the local Apel MySQL database with the content of the blah and batch files. One of the main tasks of this script is to join the blah and batch records.
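To illustrate what joining blah and batch records means, here is a minimal sketch. It is not the apelparser implementation: the field names (job_id, user_dn, wall_seconds) are hypothetical, and the example simply assumes that both kinds of record carry the same batch job identifier.

```python
def join_records(blah_records, batch_records):
    """Inner-join the two record lists on the shared batch job ID."""
    # Index the batch records by job ID for constant-time lookups.
    batch_by_id = {rec["job_id"]: rec for rec in batch_records}
    joined = []
    for blah in blah_records:
        batch = batch_by_id.get(blah["job_id"])
        if batch is None:
            continue  # no batch-side match: this record is not joined
        # Merge CE-level fields (blah) with batch-level fields.
        joined.append({**blah, **batch})
    return joined


# Hypothetical sample data: job 102 has no batch record.
blah = [
    {"job_id": "101", "user_dn": "/DC=org/CN=alice"},
    {"job_id": "102", "user_dn": "/DC=org/CN=bob"},
]
batch = [{"job_id": "101", "wall_seconds": 3600}]
job_records = join_records(blah, batch)
```

In this sketch, only jobs known to both the CE layer and the batch system end up as job records.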

The Apel database

In our case, it's a MySQL database named apelclient. It is created and initialized automatically by Puppet when the CE is deployed. It contains tables such as BlahdRecords to store the records from blah files, EventRecords to store the records from batch files, and JobRecords to store the records generated by the join performed by the apelparser script. In this database, you'll also find views (their names begin with a 'V') that give fast access to the data. Many operations on these tables are actually performed through MySQL stored procedures.

A very important table for sysadmins is ProcessedFiles. It is used by the apelparser script to check whether a file has already been processed. When the processing of a file has failed, the file is recorded with its Parsed field set to 0. Otherwise, this field holds the number of records processed in the file.
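As an illustration of how this table can be used, the sketch below lists the files whose processing failed (Parsed set to 0). To stay runnable anywhere, it uses an in-memory SQLite database as a stand-in for the apelclient MySQL database. The FileName column and the sample file names are assumptions made for the example; only the Parsed field is taken from the description above, so check the real schema before reusing the query.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Build a simplified, hypothetical stand-in for ProcessedFiles."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ProcessedFiles (FileName TEXT, Parsed INTEGER)")
    conn.executemany(
        "INSERT INTO ProcessedFiles (FileName, Parsed) VALUES (?, ?)",
        [
            ("batch-20240715", 42),  # 42 records parsed
            ("blah-20240715", 0),    # processing failed
            ("blah-20240716", 17),   # 17 records parsed
        ],
    )
    return conn


def failed_files(conn: sqlite3.Connection) -> list:
    """Return the names of the files recorded with Parsed = 0."""
    cur = conn.execute(
        "SELECT FileName FROM ProcessedFiles WHERE Parsed = 0 ORDER BY FileName"
    )
    return [row[0] for row in cur.fetchall()]
```

The same SELECT statement should work from a MySQL client connected to apelclient, provided the column names match.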