<?xml version="1.0"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="en-GB">
	<id>https://t2bwiki.iihe.ac.be/index.php?action=history&amp;feed=atom&amp;title=ProdAgentFailures</id>
	<title>ProdAgentFailures - Revision history</title>
	<link rel="self" type="application/atom+xml" href="https://t2bwiki.iihe.ac.be/index.php?action=history&amp;feed=atom&amp;title=ProdAgentFailures"/>
	<link rel="alternate" type="text/html" href="https://t2bwiki.iihe.ac.be/index.php?title=ProdAgentFailures&amp;action=history"/>
	<updated>2026-05-16T09:24:28Z</updated>
	<subtitle>Revision history for this page on the wiki</subtitle>
	<generator>MediaWiki 1.45.3</generator>
	<entry>
		<id>https://t2bwiki.iihe.ac.be/index.php?title=ProdAgentFailures&amp;diff=221&amp;oldid=prev</id>
		<title>Maintenance script: Created page with &quot; === List with some details on failures on how to try to solve them === *VO_CMS_SW_DIR is not set: this environment variable is needed for CMS, so the jobs know where to look...&quot;</title>
		<link rel="alternate" type="text/html" href="https://t2bwiki.iihe.ac.be/index.php?title=ProdAgentFailures&amp;diff=221&amp;oldid=prev"/>
		<updated>2015-08-26T12:28:58Z</updated>

		<summary type="html">&lt;p&gt;Created page with &amp;quot; === List with some details on failures on how to try to solve them === *VO_CMS_SW_DIR is not set: this environment variable is needed for CMS, so the jobs know where to look...&amp;quot;&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;&lt;br /&gt;
=== List of failures with some details on how to try to solve them ===&lt;br /&gt;
*VO_CMS_SW_DIR is not set: CMS jobs need this environment variable to know where to look for software. It has to be set on the worker nodes.&lt;br /&gt;
**send an email to the admins and, if possible, mention the node on which this happened.&lt;br /&gt;
**the error appears in stdout as:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  ERROR: VO_CMS_SW_DIR is not set&lt;br /&gt;
  prodAgentFailure Invoked with code 10030 &lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
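A minimal sketch of how the variable could be checked by hand on a worker node, assuming a POSIX shell. The function name check_cms_sw_dir is hypothetical and not part of ProdAgent:&lt;br /&gt;

```shell
# Hypothetical helper (not part of ProdAgent): report whether VO_CMS_SW_DIR
# is set and points to an existing directory on the current node.
check_cms_sw_dir() {
  if [ -z "${VO_CMS_SW_DIR:-}" ]; then
    echo "VO_CMS_SW_DIR is not set on $(uname -n)"
    return 1
  fi
  if [ ! -d "$VO_CMS_SW_DIR" ]; then
    echo "VO_CMS_SW_DIR points to a missing directory: $VO_CMS_SW_DIR"
    return 1
  fi
  echo "VO_CMS_SW_DIR is set to $VO_CMS_SW_DIR"
}
```

The node name printed in the first message is the kind of detail worth including in the email to the admins.&lt;br /&gt;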
*no site match&lt;br /&gt;
**the resource broker could not find a site matching the job requirements&lt;br /&gt;
**the jobs will not produce a stdout file; there is only a file like &amp;lt;tt&amp;gt;JobTracking/Failed/Submission_1/log/edgLoggingInfo.log&amp;lt;/tt&amp;gt; with grid details (similar to the output of &amp;lt;tt&amp;gt;crab -postMortem&amp;lt;/tt&amp;gt;)&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
   Event: Abort&lt;br /&gt;
   - host                    =    laranja.iihe.ac.be&lt;br /&gt;
   - level                   =    SYSTEM&lt;br /&gt;
   - priority                =    asynchronous&lt;br /&gt;
   - reason                  =    Cannot plan: BrokerHelper: no compatible resources&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
  Look for the job requirements and query the grid information system to see what is available, for example:&lt;br /&gt;
**the edgLoggingInfo.log contains a section with the requirements:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  - job            =&lt;br /&gt;
&lt;br /&gt;
       [&lt;br /&gt;
        requirements = ( Member(&amp;quot;VO-cms-CMSSW_1_2_0&amp;quot;,other.GlueHostApplicationSoftwareRunTimeEnvironment) &amp;amp;&amp;amp; anyMatch(other.storage.CloseSEs,( target.GlueSEUniqueID == &amp;quot;polgrid2.in2p3.fr.in2p3.fr&amp;quot; )) ) &amp;amp;&amp;amp; ( other.GlueCEStateStatus == &amp;quot;Production&amp;quot; );&lt;br /&gt;
        RetryCount = 3;&lt;br /&gt;
&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
**the job tries to find a site matching this:&lt;br /&gt;
***is there a site with a closeSE named &amp;#039;&amp;#039;polgrid2.in2p3.fr.in2p3.fr&amp;#039;&amp;#039;?&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  lcg-infosites --vo cms closeSE|grep -C 3 polgrid2.in2p3.fr.in2p3.fr&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
    The result of this query is empty, most probably because of a typo in the SE name (the domain in2p3.fr appears twice).&lt;br /&gt;
  Another example:&lt;br /&gt;
&amp;lt;pre&amp;gt; &lt;br /&gt;
  requirements = ( Member(&amp;quot;VO-cms-CMSSW_1_2_0&amp;quot;,other.GlueHostApplicationSoftwareRunTimeEnvironment) &amp;amp;&amp;amp; anyMatch(other.storage.CloseSEs,( target.GlueSEUniqueID == &amp;quot;grid11.lal.in2p3.fr&amp;quot; )) ) &amp;amp;&amp;amp; ( other.GlueCEStateStatus == &amp;quot;Production&amp;quot; );&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
**checking for the closeSE returns a matching CE:&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  -sh-2.05b$ lcg-infosites --vo cms closeSE|grep -C 3 grid11.lal.in2p3.fr&lt;br /&gt;
        node12.datagrid.cea.fr&lt;br /&gt;
&lt;br /&gt;
  Name of the CE: grid10.lal.in2p3.fr:2119/jobmanager-pbs-cms&lt;br /&gt;
        grid11.lal.in2p3.fr&lt;br /&gt;
        grid05.lal.in2p3.fr&lt;br /&gt;
        grid03.lal.in2p3.fr&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
**is the software available on that CE? The following command shows all tags published by all sites available to CMS. Look for the corresponding CE and check whether the required software tag is listed.&lt;br /&gt;
&amp;lt;pre&amp;gt;&lt;br /&gt;
  lcg-infosites --vo cms tag|less&lt;br /&gt;
  ...&lt;br /&gt;
  Name of the CE: grid10.lal.in2p3.fr&lt;br /&gt;
        VO-cms-CMSSW_0_6_0&lt;br /&gt;
        VO-cms-CMSSW_0_6_1&lt;br /&gt;
        VO-cms-CMSSW_0_7_0&lt;br /&gt;
        VO-cms-CMSSW_0_8_1&lt;br /&gt;
        VO-cms-CMSSW_0_8_3&lt;br /&gt;
        VO-cms-CMSSW_1_0_1&lt;br /&gt;
        VO-cms-CMSSW_1_0_4&lt;br /&gt;
        VO-cms-CMSSW_1_2_0_install-failed-with-25600-on-2007/01/06_15:08:10&lt;br /&gt;
        VO-cms-slc3_ia32_gcc323&lt;br /&gt;
  ...&lt;br /&gt;
&amp;lt;/pre&amp;gt;&lt;br /&gt;
   Apparently the installation of CMSSW_1_2_0 failed, and this is the version the job requires. Contact the admins and ask them to take a look at it and/or fix it.&lt;br /&gt;
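The manual search through the tag listing can be narrowed down with a small filter. This is only a sketch, assuming a POSIX shell with awk and assuming the listing keeps the layout shown in the example (a "Name of the CE:" line followed by one tag per line). The function name tags_for_ce is hypothetical:&lt;br /&gt;

```shell
# Hypothetical helper: read the output of "lcg-infosites --vo cms tag" on
# stdin and print the tags published by one CE that mention a given release.
# Usage: lcg-infosites --vo cms tag | tags_for_ce CE_NAME RELEASE
tags_for_ce() {
  ce="$1"
  release="$2"
  awk -v ce="$ce" -v rel="$release" '
    # A header line starts a new CE block; remember whether it is the one we want.
    /Name of the CE:/ { in_ce = ($NF == ce); next }
    # Inside the wanted block, print every tag that contains the release string.
    in_ce { if (index($1, rel) != 0) print $1 }
  '
}
```

A tag that is exactly VO-cms-CMSSW_1_2_0 would indicate a published installation, while a tag carrying an extra suffix such as install-failed points to the problem described here. With the listing shown in the example, the filter for grid10.lal.in2p3.fr and CMSSW_1_2_0 would print only the install-failed tag.&lt;br /&gt;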
&lt;br /&gt;
&lt;br /&gt;
&lt;br /&gt;
{{TracNotice|{{PAGENAME}}}}&lt;/div&gt;</summary>
		<author><name>Maintenance script</name></author>
	</entry>
</feed>