Hello All,
I have a Windows NT Terminal Server 4.0 with Citrix Metaframe version 1.80 (build 663) installed on it. I am having a problem where the machine stops responding to network connections and/or crashes every few days.
When the crashes occur, I check the event logs. Sometimes (but not always) a crash is accompanied by a long series of 2000 and 2019 errors in the System log (Source: Srv; 2000: "The server's call to a system service failed unexpectedly."; 2019: "The server was unable to allocate from the system nonpaged pool because the pool was empty."). Occasionally a 2021 Srv error (the server was unable to find a free connection a number of times within the last several seconds) occurs as well.
These events brought me to the Microsoft web site, where I found articles 317249, 177415, and 130926, which describe how to detect a nonpaged pool leak. I monitored the server with Perfmon and there *does* appear to be a pool leak: nonpaged pool memory increases and never decreases. Oddly enough, however, the thread count (Objects->threads) does not appear to rise along with the nonpaged pool.
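For anyone who wants to check the same thing on their own box, here is a rough sketch of the kind of trend check I mean. It is Python, and it assumes you have already pulled (elapsed seconds, nonpaged pool bytes) pairs out of a Perfmon log by whatever means you like; the function name `leak_trend` and the sample numbers are made up for illustration and are not from my server.

```python
def leak_trend(samples):
    """Estimate how fast the nonpaged pool is growing.

    samples: list of (elapsed_seconds, nonpaged_pool_bytes), oldest first.
    Returns (bytes_per_hour, fraction_of_intervals_that_did_not_shrink).
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")

    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_b = sum(b for _, b in samples) / n

    # Ordinary least-squares slope of bytes against time.
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    cov = sum((t - mean_t) * (b - mean_b) for t, b in samples)
    bytes_per_hour = cov / var_t * 3600

    # Share of consecutive intervals in which the pool did not shrink.
    pairs = list(zip(samples, samples[1:]))
    non_shrinking = sum(1 for (_, b0), (_, b1) in pairs if b1 >= b0)

    return bytes_per_hour, non_shrinking / len(pairs)


if __name__ == "__main__":
    # Illustrative numbers only: one sample per hour.
    demo = [(0, 4000000), (3600, 4300000), (7200, 4650000), (10800, 4900000)]
    rate, steady = leak_trend(demo)
    print("growth: %.0f bytes/hour, non-shrinking intervals: %.0f%%"
          % (rate, steady * 100))
```

A positive slope together with a non-shrinking share close to 100% over a day or more is what a leak tends to look like; a pool that grows under load and then falls back would show a much lower share.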
I found Poolmon.exe on the MS web site, and used it to look at the nonpaged pool tags. After being up for a day or so, the "Even" tag takes the lead in nonpaged pool usage by far (about 7 megabytes right now; the next highest one is "Muta", which uses about 1.5 megs). Looking in the pooltag.txt file, the "Even" tag relates to "event objects".
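In case it helps anyone compare readings, this is roughly how two Poolmon readings taken some hours apart could be compared. It is only a sketch: the tag-to-bytes dictionaries are assumed to be typed in by hand or parsed from saved Poolmon output (parsing is not shown), `top_growers` is a hypothetical helper name, and the numbers in the example are illustrative rather than my real figures.

```python
def top_growers(before, after, limit=5):
    """Rank pool tags by how many bytes they gained between two snapshots.

    before, after: dicts mapping a pool tag (e.g. "Even") to nonpaged bytes.
    Returns a list of (tag, bytes_gained), largest gain first.
    """
    growth = {}
    for tag in set(before) | set(after):
        # A tag missing from one snapshot is treated as using 0 bytes there.
        delta = after.get(tag, 0) - before.get(tag, 0)
        if delta > 0:
            growth[tag] = delta

    # Largest gain first; ties broken alphabetically so output is stable.
    ranked = sorted(growth.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]


if __name__ == "__main__":
    # Illustrative numbers only.
    morning = {"Even": 1000000, "Muta": 1400000, "File": 500000}
    evening = {"Even": 7000000, "Muta": 1500000, "File": 400000}
    for tag, gained in top_growers(morning, evening):
        print("%-4s +%d bytes" % (tag, gained))
```

Ranking by growth rather than by current size matters because a tag can be large simply because it is always large; the leaking tag is the one that keeps gaining between snapshots.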
This being the case, I looked at the Perfmon capture of Objects->Events, and the count appears to be very high, far higher than on any of my other servers. I also looked at Objects->Mutexes, which is also very high. The documentation says that these are instantaneous counts (not cumulative), yet right now I have 23,832 mutexes and 111,473 events.
This is not (by any means) a heavily used server-- there are only 3 users logged on right now and total CPU (there are 2 CPUs) is peaking at about 25-30%.
I have scanned the hardware both with TuffTest and with Compaq's diagnostic tools (it is a ProLiant 800) and there appear to be no problems with hard disk or memory.
I have also scanned for viruses both with Norton Antivirus and McAfee's Stinger utility, and there don't appear to be any viruses.
I don't think it is a driver issue, as I haven't changed the hardware/drivers any time recently. If it is, where should I look to determine what hardware is causing this?
Can anyone help with this? I am really stuck at this point, and don't know where to look next!