Weird Siteminder Policy Server Issue!!

There has been an issue with the test Siteminder environment since the day I came back (am I cursed??).  The problem happens intermittently.  The Siteminder Policy Server has recorded the following:

[3804/644][Tue Mar 01 2011 17:45:44][CServer.cpp:1395][ERROR] Bad security handshake attempt. Handshake error: 3152
[3804/644][Tue Mar 01 2011 17:45:44][CServer.cpp:1402][ERROR] Handshake error: Failed to receive client hello. Socket error 0
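
Because the problem is intermittent, it can help to tally these entries by hour and see whether they cluster. Below is a rough Python sketch that assumes every line in the Policy Server log follows the same bracketed layout as the two lines quoted above; the function names are my own and not part of any Siteminder tooling.

```python
import re
from collections import Counter
from datetime import datetime

# Layout taken from the two quoted lines:
# [pid/tid][timestamp][source:line][LEVEL] message
LINE_RE = re.compile(
    r"^\[(?P<pid>\d+)/(?P<tid>\d+)\]"
    r"\[(?P<ts>[^\]]+)\]"
    r"\[(?P<src>[^\]]+)\]"
    r"\[(?P<level>[A-Z]+)\]\s*(?P<msg>.*)$"
)


def parse_line(line):
    """Return (timestamp, level, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%a %b %d %Y %H:%M:%S")
    return ts, m.group("level"), m.group("msg")


def handshake_errors_per_hour(lines):
    """Count ERROR entries that mention a handshake, grouped by hour."""
    counts = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip anything that is not in the expected layout
        ts, level, msg = parsed
        if level == "ERROR" and "handshake" in msg.lower():
            counts[ts.replace(minute=0, second=0)] += 1
    return counts
```

Feeding it the lines of the log file (for example `handshake_errors_per_hour(open(path))`, where `path` is wherever your log lives) gives a counter keyed by hour.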

The CA knowledge base suggests that a shared secret rollover may be out of sync, and that the solution is to re-register the Siteminder IIS Web Agent trusted host object.  Simple enough.  I did that, but it did not solve the issue.  The next troubleshooting step was to find out whether anything had changed at the server level before the Monday, e.g. OS patching, an IIS config change, a performance bottleneck or a disk space issue.  I have checked the obvious but the problem persists.

The test Siteminder environment is used for testing the Identity Manager and a few multi-million project applications.  The project managers are panicking, the solution architects are worried, and I am stressed.  Since then I have rebuilt the IIS server (without the last 6 months of security patches and hotfixes) and relocated only the major applications to the newly rebuilt server, with the SM Policy Server and Web Agent installed locally.  No luck.  I then installed a copy of eDirectory and used it as the Siteminder Policy Store locally.  At this stage, all of the Siteminder components are isolated and reside on the new server.  Still… no luck.  I am running out of ideas.  The application developers then started to hassle me about the obvious things that I had already checked on day one of taking over the problem, which is quite annoying.  The last resort was to enable the Siteminder traces and lodge a support call with CA for assistance.

It took CA 3 days to get back to us with a request for more information.  We updated the trace log format and resubmitted the logs.  They got back to us today.  The trace indicates that the policy server's TCP/IP sockets are being used up by numerous connections to one of the user directories.  The connections eventually timed out, but were being held up for no apparent reason.  CA support suggested that we look at the performance of the user directories.
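
One way to see the sockets piling up is to count connections to the directory by TCP state. Here is a minimal sketch that parses the text output of `netstat -an`, assuming the Windows column layout (protocol, local address, remote address, state) and assuming the user directory listens on the standard LDAP port 389; both are my assumptions, so adjust them to your setup. The addresses in `SAMPLE` are made up for illustration.

```python
from collections import Counter


def count_states(netstat_text, remote_port=389):
    """Count TCP connection states for connections to the given remote port."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expect: proto, local address, remote address, state
        if len(fields) < 4 or not fields[0].upper().startswith("TCP"):
            continue
        remote = fields[2]
        if remote.rsplit(":", 1)[-1] == str(remote_port):
            counts[fields[3]] += 1
    return counts


# Made-up sample in the Windows `netstat -an` layout.
SAMPLE = """
  TCP    10.0.0.5:49152    10.0.0.9:389    ESTABLISHED
  TCP    10.0.0.5:49153    10.0.0.9:389    CLOSE_WAIT
  TCP    10.0.0.5:49154    10.0.0.9:389    CLOSE_WAIT
  TCP    10.0.0.5:49155    10.0.0.7:443    ESTABLISHED
"""

if __name__ == "__main__":
    print(count_states(SAMPLE))
```

A count for one state that keeps growing between runs on the policy server would be consistent with what the trace showed, although it would not by itself say why the connections are being held.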

The user directories are NLB enabled with a Cisco ACE module front end.  We still can’t determine the cause of the socket error.  The performance of the user directories seems OK.  What the heck, it’s a test environment, so I decided to arrange a restart after hours.  It has now been restarted and the above errors have not reoccurred since.  I will see what the developers say tomorrow.
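
For anyone wanting a quick sanity check of the path to the user directories, a plain TCP connect probe is a simple starting point. The sketch below only times how long a connection takes to open; it does not speak LDAP and it will not catch connections that are dropped after sitting idle. The host and port you pass in are whatever your directory's virtual address is; none of the names here come from Siteminder.

```python
import socket
import time


def probe(host, port, timeout=5.0):
    """Return (ok, seconds) for a single TCP connect attempt."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok = True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        ok = False
    return ok, time.monotonic() - start


def probe_many(host, port, attempts=10, pause=1.0, timeout=5.0):
    """Repeat the probe and return one (ok, seconds) tuple per attempt."""
    results = []
    for i in range(attempts):
        results.append(probe(host, port, timeout))
        if i < attempts - 1:
            time.sleep(pause)
    return results
```

Running `probe_many` from the policy server against the load-balanced address and then against each directory node directly would show whether slow or failed connects only happen on one of the paths.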

2 thoughts on “Weird Siteminder Policy Server Issue!!”

  1. Hi Fen

    Finally did you find the root cause for this issue?

    Did you restart the Siteminder server or Directory server?

    We have a similar problem right now.

    Regards

    1. Hey, it has been a while since I encountered this issue, so I am commenting from memory. The issue was related to a network time-out caused by the firewall between the SMPS and the user store. Funnily enough, it couldn’t be replicated in production over the same network infrastructure. The Siteminder environment has since been upgraded to v12 and we haven’t seen the issue since then.

      Can’t be much help unfortunately.
