On the start of this week (2/1/20) our application in production started having troubles. And we didn’t know why but the IIS process from time to time just decided to quit.
In the EventViewer’s Application log we found:
An unhandled exception occurred and the process was terminated.
Application ID: /LM/W3SVC/1/ROOT/AppPool
Process ID: 1340
Exception: System.OutOfMemoryException
Message: Exception of type 'System.OutOfMemoryException' was thrown.
In the EventViewer’s System log we found:
A process serving application pool 'XAppPool' suffered a fatal communication error with the Windows Process Activation Service. The process id was '1976'. The data field contains the error number.
Something about the system message in TechNet.
Googled the system message and found on StackOverFlow, the advice there was to use IIS Debug Diagnostics tools but we have IIS 7 and this link doesn’t apply to it.
I and my team leader started studying how to read the trace and how to debug the failure.
We thought we found the solution because we found that an unknown someone changed a config value (at the start of the week) that is used in our Silverlight feature layer to check the layer for updates ( see post TODO) was far too low meaning that every second the server got 12*number of users requests that queried the layer status. And even though the server uses caching it was too much.
The WCF trace log has shown the Exception: System.ServiceModel.ProtocolException, System.ServiceModel, Version=3.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089 The number of bytes available is inconsistent with the HTTP Content-Length header. There may have been a network error or the client may be sending invalid requests.
Googled and found this post. Hope it is the Anti Virus but we shall see tomorrow!
Well No AntiVirus was installed, bummer!
Went over the ArcGIS Server logs and found these lines (in C:\Program Files (x86)\ArcGIS\server\user\log):
<Msg time='2011-01-07T22:27:00' type='ERROR' code='10837' target='BaseLayers.MapServer' methodName='MapServer.QueryFeatureData2' machine='liveil-gtm' process='4048' thread='3564'>GeoDatabase Error :FDO error: -2147024890 [ScratchRecordSet]. The table was not found. [GDB_ReleaseInfo]. The table was not found. [GDB_Release]. The table was not found. [GDB_ReleaseInfo]. The table was not found. [GDB_Release]. The table was not found. [GDB_DBTune]. The table was not found. [GDB_ReleaseInfo]. The table was not found. [GDB_Release]. The table was not found. [GDB_ReleaseInfo]. The table was not found. [GDB_Release]. The table was not found. [GDB_DBTune].</Msg>
<Msg time='2011-01-07T22:27:00' type='ERROR' code='100005' target='BaseLayers.MapServer' methodName='MapServer.QueryFeatureData2' machine='liveil-gtm' process='4048' thread='3564' elapsed='0.95282'>Method failed.HRESULT = 0x80004005 : Unspecified error .</Msg>
And this Error in the EventLog:
[15:01:38.201 GisModule] System.Runtime.InteropServices.COMException (0x80041538): Underlying DBMS error[Microsoft OLE DB Provider for SQL Server: Multiple-step OLE DB operation generated errors. Check each OLE DB status value, if available. No work was done.][DATABASE.GIS.LAYER]
at ESRI.ArcGIS.Geodatabase.IFeature.Store()
at Namespace.OnPeriodicUpdateRequest(PeriodicUpdateRequest request) in c:\Folder\App Module GIS\Source\Namespace\Module.OnPeriodicUpdateRequest.cs:line 51
Another thing we found was that the AppPool for our ArcGis server thread was using a lot of CPU (by simply looking at the Task Manager). And that the ArcSoc processes were also very high in CPU usage - ~30%.
At the end my team leader found the problem one of the IFeature.Store failed but an empty row was added to the DB (and since there was no primary key set on the table the next time it happened it again added an empty row). That table grew to a very large number of rows but almost all of them were empty so that in every request to the Server for the Layer (from our Silverlight application) the AppPool for the ArcGis Server worked and it made the ArcSoc processes work as well KILLING the server.
//TODO: post about our implementation of ESRI Silverlight API’s FeatureLayer
//TODO: write about enabling WCF trace log
//TODO: study how to troubleshot this kind of errors and post it here
//TODO: Reformat this post – it’s just erratic
Resources:
Silverlight Forums: HttpRequestTimedOutWithoutDetail
Keywords: fatal communication error, Windows Process Activation Service, IIS, Error, Event Id 5011, WAS, ProtocolException, Anti Virus, IFeature, Store, ArcSoc, AppPool, CPU, COMException