Using htmap

From T2B Wiki
Revision as of 10:41, 16 February 2022 by Admin

What is htmap?

It's a Python library that maps function calls out to an HTCondor cluster: each call runs as a separate job, and the results come back to your Python session. It brings the computing power of our cluster to your Python code with very little extra work.

You'll find a more detailed presentation here.

How to use htmap with our HTCondor cluster?

There are many ways to use htmap, and we haven't tested all of them. This short tutorial demonstrates how to use htmap in a "shared" mode, meaning that it exploits the fact that your user home directory is also available on the workernodes. Let's start!

1. Log in to an m-machine.

2. Set up a Python 3 virtual environment:

$ mkdir python-envs
$ cd python-envs
$ python3 -m venv htmap_env
$ source htmap_env/bin/activate

3. In this new environment, you can install the htmap library:

$ python -m pip install htmap
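Before going further, it can help to confirm that the virtual environment is really active and that htmap is visible from it. The following is only a minimal sketch using the Python standard library; the helper names in_virtualenv and is_installed are ours and are not part of htmap.

```python
import importlib.util
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points to the environment,
    # while sys.base_prefix still points to the base installation.
    return sys.prefix != sys.base_prefix


def is_installed(module_name: str) -> bool:
    """Return True when a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    print("virtual environment active:", in_virtualenv())
    print("htmap installed:", is_installed("htmap"))
```

If the second or third line reports False, activate the environment again and repeat the installation step.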

4. Now write a simple script to test htmap. Save the following as test_htmap.py and make it executable (chmod +x test_htmap.py):

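#!/usr/bin/env python

import sys
from pathlib import Path

import htmap
from htmap import names


def _get_base_descriptors_for_shared(tag: str, map_dir: Path) -> dict:
    """Return the HTCondor submit descriptors for the 'shared' delivery method."""
    return {
        'universe': 'vanilla',
        # Run the jobs with the interpreter of the active virtual environment.
        # It is reachable from the workernodes because the home directory is shared.
        'executable': sys.executable,
        # The interpreter is already visible on the workernodes, so don't transfer it.
        'transfer_executable': 'False',
        # Each job runs htmap's run script with its own component number.
        'arguments': f'{names.RUN_SCRIPT} $(component)',
        'transfer_input_files': [
            (map_dir / names.RUN_SCRIPT).as_posix(),
        ],
    }


# Make the custom delivery method known to htmap under the name 'shared'...
htmap.register_delivery_method(
    'shared',
    descriptors_func=_get_base_descriptors_for_shared,
)

# ...and select it for the maps created below.
htmap.settings["DELIVERY_METHOD"] = 'shared'

# Apply str() to 0..4 on the cluster; iterating over the map waits for the results.
m = htmap.map(str, range(5))
print(list(m))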

5. Now run the script and wait for the jobs to finish. The results are printed once they are all available:

$ ./test_htmap.py
['0', '1', '2', '3', '4']
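To check a result like the one above, or to try your own function before sending it to the cluster, you can compute the same thing locally with Python's built-in map. The sketch below is only an illustration: square, run_locally and run_on_cluster are hypothetical names of ours, and run_on_cluster assumes that the 'shared' delivery method has already been registered and selected as in the test script.

```python
def square(x: int) -> int:
    """Hypothetical example function taking one argument."""
    return x * x


def run_locally(func, inputs):
    """Compute the reference result in this process with the built-in map."""
    return list(map(func, inputs))


def run_on_cluster(func, inputs):
    """Compute the same result with htmap; every call becomes a cluster job."""
    # Imported here so that the local check also works where htmap is missing.
    import htmap

    # Iterating over the map waits until the outputs are available.
    return list(htmap.map(func, inputs))


if __name__ == "__main__":
    # Same call as in the test script, but without the cluster.
    print(run_locally(str, range(5)))
    print(run_locally(square, range(4)))
```

Once the local result looks right, calling run_on_cluster(square, range(4)) from a script set up like test_htmap.py should return the same list.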

6. To get out of the Python virtual environment when you are done:

$ deactivate

Tips and tricks

In a second shell session with the same virtual environment activated, you can monitor the progress of your htmap script with the following command:

$ htmap status --live