Faq t2b

From T2B Wiki

Latest revision as of 14:19, 19 November 2024



List of the UIs / mX machines:

- mshort: m2, m3 => 20 minutes of CPU time per process
- mlong: m0, m1 => 5 hours of CPU time per process

Keep ssh connection to UI open:

Add the option '-o ServerAliveInterval=100' to your ssh command.
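
As an illustration, the option can be wrapped in a small shell function so that it is not forgotten. This is only a sketch: the function name ssh_keepalive is hypothetical, and the host in the usage comment is taken from the debugging section of this page.

```shell
# Hypothetical wrapper: forwards all arguments to ssh with the keep-alive option set.
ssh_keepalive() {
    ssh -o ServerAliveInterval=100 "$@"
}

# Usage:
#   ssh_keepalive MYUSERNAME@m3.iihe.ac.be
```

The same setting can also be made permanent by adding a line 'ServerAliveInterval 100' to your .ssh/config file.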


Debugging SSH connection to mX machines:

1. Check the permissions of the ssh keys on your laptop:
> ll $HOME/.ssh
-rw------- 1 rougny rougny    411 avr 29  2019 id_ed25519
-rw-r--r-- 1 rougny rougny    102 avr 29  2019 id_ed25519.pub
To set the correct permissions:
chmod 600 $HOME/.ssh/id_ed25519
chmod 644 $HOME/.ssh/id_ed25519.pub
2a. If that does not fix it, send us the output of the following commands via chat/email, as well as the content of your public key so that we can cross-check it with what is in our system:
> ll $HOME/.ssh
> date && ssh -vvv MYUSERNAME@m3.iihe.ac.be     <-- it needs to be a specific machine (not mshort/mlong) so that we can read the logs!
2b. Also include your public IPv4 address so that we can track your connection in the logs. You can find it by visiting, for instance, https://www.whatismyip.com/
2c. Just in case something went wrong: send us your public ssh key (the one ending in .pub!)
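
The steps above can be sketched as a small script. This is a minimal sketch, not an official tool: the helper names (perm_of, check_key_perms, collect_ssh_debug) and the report file name ssh_debug.txt are hypothetical, and the remote command "exit" is an addition so that the ssh call returns on its own.

```shell
#!/bin/sh
# Sketch only: helper names and the report file name are hypothetical.

# Octal permission bits of a file (GNU stat first, BSD/macOS stat as fallback).
perm_of() {
    stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1"
}

# Step 1: return 0 if every key pair in the directory has 600 (private) / 644 (public).
check_key_perms() {
    dir="${1:-$HOME/.ssh}"
    status=0
    for pub in "$dir"/*.pub; do
        [ -e "$pub" ] || continue
        priv="${pub%.pub}"
        if [ "$(perm_of "$pub")" != "644" ]; then
            echo "fix with: chmod 644 $pub"
            status=1
        fi
        if [ -e "$priv" ] && [ "$(perm_of "$priv")" != "600" ]; then
            echo "fix with: chmod 600 $priv"
            status=1
        fi
    done
    return "$status"
}

# Steps 2a and 2c: write the listing, timestamp, verbose ssh log and public keys to one file.
collect_ssh_debug() {
    target="$1"                      # e.g. MYUSERNAME@m3.iihe.ac.be (a specific machine)
    report="${2:-ssh_debug.txt}"     # hypothetical report file name
    {
        ls -l "$HOME/.ssh"
        date
        # "exit" as remote command is an addition so the call returns after login.
        ssh -vvv "$target" exit
        cat "$HOME"/.ssh/*.pub
    } > "$report" 2>&1
    echo "wrote $report"
}

# Usage:
#   check_key_perms
#   collect_ssh_debug MYUSERNAME@m3.iihe.ac.be
```

The resulting file contains only public information about your keys (never send the private key), and can be attached to your chat message or email together with your public IPv4 address.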

MadGraph taking all the cores of a workernode

By default, MadGraph takes all the available cores. This kills the site.

That is why you need to uncomment and set two variables, run_mode and nb_core, in the mg5_configuration.txt file (not the dat file).
The run mode should be set to 0, single machine, via:

run_mode = 0

If MadGraph uses more than one core, you must request these cores from the job scheduler by adding the following directive to your HTCondor submit file:

 request_cpus = "2" 

To tell MadGraph how many cores it can use per job, use the following recipe:

./bin/mg5_aMC 
set nb_core 1  #or 2 or whatever you want
save options

or in the mg5_configuration.txt:

nb_core = 1

Note that 'nb_core' and 'request_cpus' must always have the same value!
Note also that if you ask for more than one core, your time in the queue will probably be longer, as the scheduler needs to find enough free slots on one single machine.
We advise against setting this number higher than one unless you really need it for parallel jobs.
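
The rule that 'nb_core' and 'request_cpus' must match can be checked before submitting. The following is a minimal sketch under assumptions: the function name cores_match and the submit file name job.sub are hypothetical, and the parsing only handles plain 'key = value' lines that are not commented out.

```shell
# Hypothetical helper: compare nb_core in a MadGraph configuration file
# with request_cpus in an HTCondor submit file.
# Returns 0 if they match, 1 if they differ, 2 if one of them is not set.
cores_match() {
    cfg="$1"    # MadGraph configuration file
    sub="$2"    # HTCondor submit file
    # Take the last uncommented "nb_core = N" line.
    nb_core=$(sed -n 's/^[[:space:]]*nb_core[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$cfg" | tail -n 1)
    # Take the last "request_cpus = N" line; an opening double quote is tolerated.
    req=$(sed -n 's/^[[:space:]]*request_cpus[[:space:]]*=[[:space:]]*"\{0,1\}\([0-9][0-9]*\).*/\1/p' "$sub" | tail -n 1)
    if [ -z "$nb_core" ] || [ -z "$req" ]; then
        echo "nb_core or request_cpus is not set"
        return 2
    fi
    if [ "$nb_core" = "$req" ]; then
        echo "OK: nb_core = request_cpus = $nb_core"
        return 0
    fi
    echo "MISMATCH: nb_core = $nb_core, request_cpus = $req"
    return 1
}

# Usage:
#   cores_match path/to/configuration.txt job.sub
```

Running such a check in the script that submits your jobs catches a mismatch before the job reaches the queue.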