Monday, February 8, 2010

Why IT Operations is like an Action TV Series

I like watching the series "24", though I can’t really explain why. Every time they nearly get the bad guys, something goes wrong: there’s some sort of twist in the plot and they need to start all over again. For example, I’m sure that you are familiar with the following classic scene: the CTU (Counter Terrorism Unit) chopper is following a suspect who is driving a black van. The suspect’s van enters a tunnel but never leaves it. Instead, a number of different vehicles leave the tunnel at the same time, and the suspect is probably in one of them. By the time they figure out that the black van has been left empty in the tunnel, they have already lost the suspect. They shout “We have lost visual!!!” and are back to looking for the bad guys all over again. Then they call Jack Bauer…

IT Operations is just like the CTU; the CTU is responsible for making sure that life goes on without any unpleasant surprises. Similarly, IT Operations needs to do the same in its own space and make sure that the business keeps on running and that business transactions are being executed properly and on time.

When something is about to go wrong, the CTU and IT Operations are expected to prevent it before it affects anyone. So they set up the war room, call everyone in, and start doing their detective work to find the needle in the haystack. If they don’t find it and something goes wrong then the results are significant; either people get hurt (in the CTU's case), or business is impacted.

IT Operations' War Chest

So which tools could IT Operations use to find out that there is a problem, identify its root cause, and resolve the issue?

For example, IT Operations could use HTTP network appliances that see every HTTP transaction and measure its response time. These network appliances are just like the CTU's choppers: they do not have adequate visibility into the datacenter. They can indicate that something is wrong with the response time of a transaction, but they cannot show why the response time is high and cannot provide the visibility needed for resolution.

IT Operations also uses Event Correlation and Analysis (ECA) tools. ECA tools are like CSI detectives (yes… that’s another one I watch…): they rely on other tools to collect information for them, just like a CSI detective who collects evidence from a crime scene. ECA tools are only as effective as the products that they rely on to provide them with data. The issue with ECA tools is that, just like at a crime scene, the thief does not usually leave his ID behind, so all you are left with is clues, and no accurate data to work with.

Additional tools that IT Ops relies on include dashboards that monitor server resource consumption, J2EE/.NET tools that perform drill-down diagnostics in the application and database layers, synthetic transaction tools, and Real User Measurement (RUM) tools. Even with all of these monitoring tools, IT Operations still finds itself in a situation where all lights are green while users are complaining about bad response times. In spite of all of the investment in monitoring tools, the infrastructure that IT Operations is accountable for is still unpredictable. Why?

A Simple Example

Perhaps it’s best to take a look at this classic example: One of our customers had a problem with a wire-transfer transaction. The liability for the problem kept on going back and forth between Operations and Applications, who were pointing fingers at each other as to who was responsible for the issue. “All lights are green” said Operations, “We tested the application and it works just fine” said Applications. Simply put, no existing monitoring tool could point out the problem.

So what was the problem? The answer is simple: it turned out that, by design, wire transfers of over $100K were querying the mainframe nearly 100 times, while other transfers would query it only a couple of times. Same end-user, same application, same transaction, but a single parameter made the transaction take a whole different path, and made the difference between a 3-second and a 2-minute response time.

What Exactly Are You Monitoring?

Now the question is: why can't existing monitoring tools identify the problem? The reason is simple. Traditional monitoring tools monitor the infrastructure and not the transactions. In a complex heterogeneous infrastructure, there are many tools for monitoring each and every component, but no single spinal cord that is able to show how transactions behave across components. None of the tools is able to deterministically correlate a single request coming into a server with all of the associated requests going out of that server, and to keep on doing so throughout the Transaction Path. They are just like the chopper, which could not figure out which of the vehicles coming out of the tunnel contained the suspect who went into the tunnel in the first place.
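To make the idea of deterministic correlation concrete, here is a minimal sketch in Python. It is not how any particular product works; the component names, the `TOPOLOGY` map, and the functions `trace_request` and `path_of` are all hypothetical. The only point it illustrates is that if the ID assigned to the incoming request travels with every outgoing request, the full path can be rebuilt afterwards instead of guessed at.

```python
import itertools

# Hypothetical call graph: which components each component calls.
TOPOLOGY = {
    "web": ["app"],
    "app": ["database", "mainframe"],
    "database": [],
    "mainframe": [],
}

_next_id = itertools.count(1)


def _handle(component, txn_id, log):
    """Record every outgoing call, tagged with the incoming transaction ID."""
    for callee in TOPOLOGY[component]:
        log.append((txn_id, component, callee))
        _handle(callee, txn_id, log)  # the same ID travels with the request


def trace_request(log):
    """Assign an ID at the entry point and follow the request through the tiers."""
    txn_id = next(_next_id)
    _handle("web", txn_id, log)
    return txn_id


def path_of(txn_id, log):
    """Rebuild the hops of one transaction from the shared log."""
    return [(caller, callee) for t, caller, callee in log if t == txn_id]


if __name__ == "__main__":
    log = []
    first = trace_request(log)
    second = trace_request(log)
    print(first, path_of(first, log))
    print(second, path_of(second, log))
```

In CTU terms, the transaction ID is a tracker attached to the suspect rather than to the van: it does not matter which vehicle comes out of the tunnel.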

This situation raises some strategic questions regarding your monitoring approach. How effective is a monitoring framework without business context? Are you supposed to just make sure the servers are up and running and applications are responding, or is your real goal to make sure that the business transactions are being executed as intended and on time?

“In God we trust; all others must bring data”

Applications are tricky, transactions are tricky, and they become even trickier in a complex heterogeneous infrastructure that is composed of multiple platforms, operating systems, application nodes, tiers and databases, and where components talk back and forth in different protocols for every single click of a button by an end-user.

Only by being able to trace each and every single transaction activation throughout its entire path - 100% of the time, for all transactions, across all components - will you be able to systematically collect the granular information needed to get business-contextualized visibility into your datacenter. This kind of visibility is a key factor in being able to identify problems effectively when, or even before, they arise.
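As a rough illustration of what business-contextualized visibility could look like, the Python sketch below groups per-transaction trace records by a business attribute (the transfer amount) rather than by server. The records in `TRACES` are made-up numbers chosen to echo the wire-transfer example above, and the field names and the `bucket` and `summarize` helpers are assumptions for illustration only.

```python
from collections import defaultdict
from statistics import mean

# Made-up trace records; in practice these would come from tracing every transaction.
TRACES = [
    {"txn_id": 1, "type": "wire_transfer", "amount": 5_000, "mainframe_calls": 2, "seconds": 3.0},
    {"txn_id": 2, "type": "wire_transfer", "amount": 40_000, "mainframe_calls": 2, "seconds": 3.2},
    {"txn_id": 3, "type": "wire_transfer", "amount": 250_000, "mainframe_calls": 98, "seconds": 118.0},
    {"txn_id": 4, "type": "wire_transfer", "amount": 1_200_000, "mainframe_calls": 101, "seconds": 122.0},
]


def bucket(amount, threshold=100_000):
    """Split transactions by a business parameter instead of by server."""
    return "over_100k" if amount > threshold else "up_to_100k"


def summarize(traces):
    """Average mainframe calls and response time per (type, amount bucket)."""
    groups = defaultdict(list)
    for record in traces:
        groups[(record["type"], bucket(record["amount"]))].append(record)
    return {
        key: {
            "count": len(rows),
            "avg_mainframe_calls": mean(r["mainframe_calls"] for r in rows),
            "avg_seconds": mean(r["seconds"] for r in rows),
        }
        for key, rows in groups.items()
    }


if __name__ == "__main__":
    for key, stats in summarize(TRACES).items():
        print(key, stats)
```

A per-server dashboard would average these four transactions together; splitting them by amount is what makes the slow path stand out.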

W. Edwards Deming said: “In God we trust; all others must bring data”. I think he was absolutely right. IT Operations can use choppers, or CSI crime-lab detectives, or Jack Bauers. They all have their roles, but when it comes to fast and effective problem identification, as well as many other IT-related decision-making processes (that’s a whole different article…), real, accurate data is required – no partial data, no assumptions.

Business Transaction Management provides you with that data, and by doing so, it provides your IT Organization with visibility and predictability. Wouldn’t it be great if you could go to sleep at night knowing that your infrastructure is reliable? That is, unless you want to play the role of the CTU Director…

The 8th season of 24 will premiere on January 17th, 2010. Until then, why don’t you get yourself a Business Transaction Management solution…

13 comments:

Dave Wallace said...

Well told.

Judy Schramm said...

Who is writing this blog? You need to say who it is. You're doing a great job with content but you lose credibility because there's no info about who you are.

Judy Schramm said...

Actually, I'm guessing it's OpTier. Am I right? (If I am... Hi, Ronit!)

