Wednesday, July 15, 2009

IT Reliability through Business Transaction Management

...Continued from the last post
Enabling Reliable IT – Managing Performance
How do you know when an end user is experiencing bad response times?
  • They call the help desk to complain – usually only after several earlier incidents in which they were unhappy with the application’s reliability.
  • An end user monitoring tool detects the bad response times.
End User Measurements
Many tools on the market today perform this task, in a variety of ways. Below is a summary of the different approaches.
Software Based Real User Measurements
Desktop Agent
  • Strength – monitors the end user’s desktop and can measure response times for fat-client applications
  • Weakness – must be installed at each desktop
JavaScript Injection
  • Strength – no need for end user installation
  • Weakness – JavaScript needs to be added to the web application code
Browser Plug-In
  • Strength – easy installation without code modification
  • Weakness – still requires end user installation
Network Appliance Based Real User Measurements
All of these solutions use a network sniffer attached to a mirrored port to estimate end user response times. The advantage of this approach is ease of installation; the disadvantages are the cost of placing probes at every point where the network is accessed and the limited accuracy of the data.
Synthetic End User Measurements Performed by “Robots”
The classic availability monitors use this approach: scripts “ping” the system and check its availability. The advantage is that availability can be monitored overnight and before the morning workload begins. The disadvantages are that real user response times are not measured and that the scripts have to be modified every time the application changes.
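As a rough illustration of what such a “robot” does, here is a minimal Python sketch of a single synthetic check. It is not taken from any particular monitoring product. The names `probe`, `http_fetch`, and `ProbeResult` are made up for this example, and treating any status from 200 to 399 as “available” is an assumption.

```python
import time
import urllib.request
from typing import Callable, NamedTuple


class ProbeResult(NamedTuple):
    available: bool
    response_seconds: float


def http_fetch(url: str, timeout: float = 5.0) -> int:
    """Fetch the URL and return the HTTP status code."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.status


def probe(url: str, fetch: Callable[[str], int] = http_fetch) -> ProbeResult:
    """Run one synthetic check: is the page up, and how long did it take?"""
    start = time.perf_counter()
    try:
        status = fetch(url)
        # Assumption for this sketch: 2xx and 3xx count as "available".
        available = 200 <= status < 400
    except OSError:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses.
        available = False
    elapsed = time.perf_counter() - start
    return ProbeResult(available, elapsed)
```

A scheduler would call `probe` on a fixed interval, overnight included, and raise an alert when `available` is false. Note that the elapsed time is measured from wherever the robot runs, so it is still not a real user’s response time.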
Finding the Location of the Problem
Now that you know there is a problem, because the end user is experiencing reliability issues with an application, the next step is to narrow down where the problem is located. Research has shown that 80% of the time spent troubleshooting performance problems goes to finding the actual location of the problem.

In the picture below “John the User” is experiencing poor response times – the IT department is tasked with resolving this issue.
To be continued next week...

Sunday, July 5, 2009

IT Reliability through Business Transaction Management

I just got back from a regional CMG conference where the majority of attendees identify themselves as IT performance and capacity professionals. Ultimately the objective of their work is to make the organization’s IT systems more reliable.
  • By planning for tomorrow’s capacity they ensure that the applications our co-workers and customers rely on will remain reliable as usage increases and changes are introduced.
  • By managing IT performance they enable the day to day reliability of these applications so that revenue can be generated and growth achieved.
By focusing our efforts on the greater goal of IT reliability, we are able to heal the disconnect that occurs far too often between the business and IT. Showing the business graphs of increased CPU consumption or bandwidth utilization does not help them relate to the performance and capacity issues that they are concerned with. IT resource consumption metrics may be the “vital signs” of IT systems, but relying on them alone is like checking a patient’s pulse and respiratory rate and declaring the patient healthy even though they may be suffering from a terminal disease. The lights may be on, but are the IT systems you manage perceived as reliable by the end users and the business?

Put yourself in the shoes of the end user: you want any action that you perform within an application to get a quick and valid response. That is reliability in the eyes of the business and the end user, and that is exactly what your work as a performance or capacity professional seeks to enable.

What connects all of the various parts of the puzzle? What links the business, the end users, the network, firewalls, proxy servers, web servers, application servers, load balancers, message brokers, databases, and mainframes? Transactions.
Pose the question "how do YOU define a transaction?" to a room full of IT professionals and you are likely to receive a handful of different answers.
Typically one would define an HTTP request, database call, CICS transaction or SOAP request as a transaction. In this paper these are defined as transaction segments that are part of a greater transaction.
A transaction is the most elementary unit of work that can be performed by a user of an application. Whenever a user clicks a button within an application, they have performed a “transaction activation”. This “transaction activation” can trigger any number of IT processes within the datacenter that are used to get the work done. A transaction could be a transfer of funds, a purchase, an update of information, or the opening of a new account – any user interaction with the application – also known as a business service or unit of work.
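
To make the distinction between a transaction and its segments concrete, here is a small Python sketch of one way to model it. This is an illustration only: the class and field names are invented for this example, and it assumes the segments run one after another without overlapping, which real systems often do not.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """One piece of the work, e.g. an HTTP request or a database call."""
    tier: str          # where it ran, e.g. "web server"
    kind: str          # what it was, e.g. "HTTP request"
    elapsed_ms: float  # time spent in this segment


@dataclass
class Transaction:
    """A unit of work as the user sees it, e.g. "transfer funds"."""
    name: str
    segments: List[Segment] = field(default_factory=list)

    def total_ms(self) -> float:
        # Assumes sequential, non-overlapping segments (a simplification).
        return sum(s.elapsed_ms for s in self.segments)

    def slowest_segment(self) -> Segment:
        return max(self.segments, key=lambda s: s.elapsed_ms)


# Illustrative values only, not measurements.
transfer = Transaction("transfer funds", [
    Segment("web server", "HTTP request", 40.0),
    Segment("application server", "SOAP request", 25.0),
    Segment("database", "database call", 310.0),
])
```

Here `transfer.total_ms()` is what the user experiences as a single click, while `transfer.slowest_segment()` points at the tier that contributed most to it.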

To be continued next week...