Faq t2b
Revision as of 11:04, 8 December 2021
List of the UIs / mX machines:
- m2, m3 => 20 minutes of CPU time per process
- m6, m7 => 1 hour of CPU time per process
Keep ssh connection to UI open:
Add the option -o ServerAliveInterval=100 to your ssh command.
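As a minimal sketch of how the option can be applied, the wrapper below passes it on every connection. The function name ui_ssh and the variables UI_USER and UI_HOST are hypothetical, and m2 is used only as an example machine; replace MYUSERNAME with your own login.

```shell
# Hypothetical values: replace MYUSERNAME with your own login.
UI_USER="MYUSERNAME"
UI_HOST="m2.iihe.ac.be"

# Wrapper (hypothetical name) that always asks ssh to send a keep-alive
# probe after 100 seconds of silence, so an idle session is not dropped.
ui_ssh() {
    ssh -o ServerAliveInterval=100 "${UI_USER}@${UI_HOST}" "$@"
}
```

Calling ui_ssh with no arguments opens an interactive session; any extra arguments are passed on to ssh unchanged.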
Debugging SSH connection to mX machines:
- 1. Check the permissions on your ssh keys. They should look like this:
    > ll $HOME/.ssh
    -rw------- 1 rougny rougny 411 avr 29 2019 id_ed25519
    -rw-r--r-- 1 rougny rougny 102 avr 29 2019 id_ed25519.pub
  To set the correct permissions:
    chmod 600 $HOME/.ssh/id_ed25519
    chmod 644 $HOME/.ssh/id_ed25519.pub
- 2. If that does not fix it, send us the output of these commands via chat/email:
    > ll $HOME/.ssh
    > date && ssh -vvv MYUSERNAME@m2.iihe.ac.be
  The ssh command must target a specific machine (not mshort/mlong) so that we can read the logs!
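The two debugging steps above could be scripted roughly as follows. This is only a sketch: the function names fix_key_permissions and collect_ssh_debug and the output file ssh_debug.log are hypothetical, ls -l stands in for the ll alias, and the trailing exit command is an assumption added so that a successful login returns immediately instead of opening a shell.

```shell
# Set the permissions from step 1 on a key pair.
# $1 is the path to the private key; the public key is assumed to be "$1.pub".
fix_key_permissions() {
    chmod 600 "$1" && chmod 644 "$1.pub"
}

# Collect the diagnostics requested in step 2 into one file.
# $1 is user@host, e.g. MYUSERNAME@m2.iihe.ac.be (a specific machine,
# not mshort/mlong). ssh -vvv writes its debug trace to stderr, so both
# streams are redirected into ssh_debug.log (hypothetical file name).
collect_ssh_debug() {
    {
        ls -l "$HOME/.ssh"
        date
        ssh -vvv "$1" exit
    } > ssh_debug.log 2>&1
}
```

For example, run fix_key_permissions "$HOME/.ssh/id_ed25519" first; if the connection still fails, run collect_ssh_debug MYUSERNAME@m2.iihe.ac.be and send the resulting file.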
MadGraph taking all the cores of a workernode
By default, MadGraph takes all the available cores. This kills the site. If MadGraph uses more than one core, you must request those cores from the job scheduler by adding the following directive to qsub:
-lnodes=1:ppn=2
Where ppn is the number of cores you request.
To tell MadGraph the number of cores it can take per job, use the following recipe:
    ./bin/mg5_aMC
    set nb_core 1 #or 2 or whatever you want
    save options
Note: 'nb_core' and 'ppn' must always have the same value!
Note also that if you ask for more than one core, your time in the queue will probably be longer, as the scheduler needs to find the required number of free slots on one single machine. We advise against setting this number higher than one unless you really need it for parallel jobs.
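One way to keep 'nb_core' and 'ppn' in step is to derive both from a single variable, as in the sketch below. The names NB_CORE, QSUB_RESOURCES, mg5_set_cores.txt and myjob.sh are hypothetical. Passing a command file to mg5_aMC is an assumption about your MadGraph version; if it does not apply, type the two commands at the MadGraph prompt as in the recipe above.

```shell
# Number of cores for the job; 1 is the recommended value.
NB_CORE=1

# Write the MadGraph commands from the recipe above to a file
# (mg5_set_cores.txt is a hypothetical name).
printf 'set nb_core %s\nsave options\n' "$NB_CORE" > mg5_set_cores.txt

# Build the matching scheduler request so that ppn equals nb_core.
QSUB_RESOURCES="-lnodes=1:ppn=${NB_CORE}"

# Print the submission command; myjob.sh is a placeholder job script.
echo "qsub ${QSUB_RESOURCES} myjob.sh"

# Assumption: mg5_aMC reads commands from a file given as its argument.
# ./bin/mg5_aMC mg5_set_cores.txt
```

Changing NB_CORE in one place then updates both the MadGraph setting and the qsub directive.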