One thing I came to appreciate from my trip to DC is that the government activities here function more or less like a large data "switch."
Data flows in from all over the world and beyond, and is processed, summarized and interpreted. The processed data (or information) is then disseminated back to the field in the form of laws, regulations, policies or, more importantly, money so that action can be taken.
The government in DC does not take action itself. The government apparatuses in the field do the work, as they should. Yet this machine, which is devoid of human intellect in order to minimize accompanying human weaknesses like partiality or common sense, does break down.
Take for example the Katrina disaster. The House of Representatives report states that the tepid Federal government response was due to the Homeland Security Council’s (HSC) "failure to resolve conflicts in information and the 'fog of war,' not a lack of information." It added that the crisis showed the government remains "woefully incapable" of managing information.
Since this is not a political blog, I would like to turn from the occasional cataclysm to the daily finger-in-the-dyke work facing IT personnel. A Web application might not have collapsed in a total outage, but you certainly know that complaints about performance, broken links, or errors are quite common.
Usually complaints, especially those emanating from the higher reaches of the organizational hierarchy, are decidedly free of information content, like: "How come the application was slow yesterday morning?"
While the politically correct response should be "Sorry, I’ll go look into it," the realistic response should be "I don’t know what you or the thousands of moving parts in the application infrastructure are doing, so realistically, I can’t help beyond giving you lip service."
However, if your best-of-breed Web APM tool has the requisite characteristics noted in my previous post, you don’t have to pay lip service, but can actually revisit the situation from yesterday morning, determine whether it was a real problem, note which of the moving parts went wrong, and dispatch the right expert to fix the problem.
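To make "revisiting yesterday morning" a little more concrete, here is a minimal Python sketch of the idea, not a description of how any particular APM product works. Everything in it is an assumption made up for illustration: the sample records, the component names, the baseline figures, the `find_culprit` function and the "twice the baseline" threshold.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

# Hypothetical stored measurements: (timestamp, component, response time in ms).
SAMPLES = [
    (datetime(2024, 5, 6, 9, 0), "network", 20),
    (datetime(2024, 5, 6, 9, 0), "app_server", 180),
    (datetime(2024, 5, 6, 9, 0), "database", 2400),
    (datetime(2024, 5, 6, 9, 30), "network", 25),
    (datetime(2024, 5, 6, 9, 30), "app_server", 210),
    (datetime(2024, 5, 6, 9, 30), "database", 2600),
    (datetime(2024, 5, 6, 15, 0), "database", 90),
]

# Hypothetical "normal" response time per component, in ms.
BASELINE_MS = {"network": 30, "app_server": 200, "database": 100}


def find_culprit(samples, start, end, baseline_ms, factor=2.0):
    """Return (component, median_ms) for the worst offender in [start, end).

    Returns None when every component's median stayed within
    `factor` times its baseline, i.e. there was no real problem.
    """
    # Group the measurements that fall inside the time window by component.
    by_component = defaultdict(list)
    for timestamp, component, ms in samples:
        if start <= timestamp < end:
            by_component[component].append(ms)

    # Keep the component whose median exceeds its baseline by the largest ratio.
    worst, worst_ratio = None, factor
    for component, values in by_component.items():
        observed = median(values)
        ratio = observed / baseline_ms[component]
        if ratio > worst_ratio:
            worst, worst_ratio = (component, observed), ratio
    return worst


morning = find_culprit(
    SAMPLES, datetime(2024, 5, 6, 8, 0), datetime(2024, 5, 6, 12, 0), BASELINE_MS
)
print(morning)
```

With these made-up numbers, the morning window points at the database, so the DBA is the expert to dispatch, while the same call over the afternoon window returns `None`, meaning there was nothing to fix.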
You certainly don’t have to convene your version of the HSC, consisting of your network person, server admin, developer, DBA, application architect, et al., to debate and perhaps try in vain to recreate the problem. The HSC approach to dealing with problems didn’t work in the case of Katrina, and will not work for application availability and performance problems.
Besides avoiding the HSC approach, I would also suggest that you avoid the part-of-beast approach (see my earlier post). Since the part-of-beast approach involves complex data flow and a management console designed to be all things to all people, it will only exacerbate confusion and amplify the "fog of war."
Your tool has to provide relevant real-time or historical information, and be designed especially for the workflow relating to managing the availability and performance of Web applications. You certainly don’t want to be trapped like the gentlemen in the picture and get crowded out by useless data delivered at the wrong time for the wrong purpose.