Saturday, November 1, 2008

Transaction Management and “Deep Dive” Java/.NET Profilers

When an enterprise finally gets the wake-up call that its applications are performing below par, it starts looking into Transaction Monitoring (or Transaction Tracking/Tracing) solutions. One type is the "Deep Dive" Java/.NET solution, defined as a solution that uses Bytecode Instrumentation (or Java/.NET hooks) to collect thorough code-level metrics for J2EE/.NET experts. These Application Performance Management solutions are used throughout the entire lifecycle of the product. They are a strong tool for the developer but a very weak tool for the production environment, since very high overhead prevents them from monitoring all of the transactions on all tiers all the time.

Who Offers These Solutions?

These solutions tend to be offered by the larger corporations:

  • CA Wily – Introscope
  • HP – TransactionVision
  • BMC – Application Problem Resolution (Identify)
  • Dynatrace – PurePath
  • Precise – APM

Overview

These tools provide deep diagnostics into Java/.NET applications, down to the code level. They are used by J2EE/.NET experts to locate problems before deployment. These solutions are too low level for most operations teams and system administrators, as they extract a glut of data and do not provide a high-level view of the system. Application teams, on the other hand, rely on them for development, and they can be used in production to a certain extent to monitor synthetic transactions or a small percentage of the real transactions flowing through the system.

How They Work

Bytecode Instrumentation (or Java hooks) retrieves data from the nodes that are running J2EE/.NET applications. This is done by using the class loading mechanism of the runtime (the JVM for J2EE or the CLR for .NET) to intercept specific classes or method calls within the application.
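
As a rough illustration of the idea, here is a minimal sketch in Python rather than Java or .NET. A decorator stands in for the hook that an agent would inject at class load time; real products rewrite bytecode instead, and the names `instrument`, `timings`, and `place_order` are invented for this example.

```python
import functools
import time
from collections import defaultdict

# Hypothetical in-memory store: function name -> list of elapsed seconds.
timings = defaultdict(list)

def instrument(func):
    """Wrap func so that every call records its wall-clock duration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Recorded even if the call raises, as a tracing hook would do.
            timings[func.__qualname__].append(time.perf_counter() - start)
    return wrapper

@instrument
def place_order(order_id):
    # Stand-in for application code that the hook intercepts.
    return f"order {order_id} placed"

print(place_order(42))
print({name: len(samples) for name, samples in timings.items()})
```

Note that the wrapper itself costs time on every call, which is where the overhead discussed in this article comes from.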

Applications

  • Gives J2EE/.NET experts insight into where the problems are
  • Used mainly in the development phase and pre-deployment
  • Can be used in production for a few percent of the transactions
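
The last point, tracing only a few percent of transactions, is often done by sampling. The sketch below is one possible way to do it, not how any product listed above works: it hashes a transaction ID so that the same transaction is consistently traced or skipped wherever the check runs. The function name, the ID format, and the 10% rate are assumptions for illustration.

```python
import hashlib

def should_trace(transaction_id: str, percent: int) -> bool:
    """Return True for roughly `percent` out of every 100 transaction IDs."""
    digest = hashlib.sha256(transaction_id.encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a bucket in the range 0..99.
    bucket = int.from_bytes(digest[:8], "big") % 100
    return bucket < percent

ids = [f"txn-{n}" for n in range(10_000)]  # hypothetical transaction IDs
traced = [t for t in ids if should_trace(t, 10)]
print(f"traced {len(traced)} of {len(ids)} transactions")
```

Because the decision depends only on the ID, every tier that sees the same ID makes the same choice, so a sampled transaction is traced on all of them.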

Advantages

  • Gives developers deep insight into problems at the source code transaction data level
  • With the help of synthetic transactions, deep diagnostics can be performed during production
  • You can get a full method call, similar to a debugger

Drawbacks

  • Lengthy Implementation
  • Only works with certain environments
  • Cannot trace all transactions in real time
  • Not recommended for the production environment
  • Difficult for IT support staff to use
  • It only helps with Java or .NET
  • The solutions are not designed for a high level production view, they do not provide an extensive topology of the system
  • Lots of detailed data is collected. Application owners and system administrators do not always know what to do with all of the information

Business Transaction Management

Although the list of drawbacks is long, the objective of this article is not to bash this kind of solution (really, they will do wonders for your application development team). It is simply to help you understand that if you are looking for a solution that will cover all of your bases during production, these kinds of solutions won't cut it (they provide up to 10% sampling for limited periods of time). These solutions cannot monitor the entire topology of each transaction, even though they claim to be end to end. These traits are by design, and no amount of marketing hype will enable these products to solve all of your problems as they claim to do.

11 comments:

William Louth said...

JXInsight offers true resource (database) transaction analysis for Java applications via its JDBInsight technology. Most other products incorrectly label a request processing (entry + exit) pattern as a transaction. Some might argue these are business transactions, but most actual business transactions involve multiple client-to-server-to-backend requests, so I am not sure whether this holds up in the real world.

With regard to production overhead, we have a Probes technology which outperforms the competition by 20x-100x. So I am not sure how they derive their near-zero overhead marketing data.

http://blog.jinspired.com/?p=272

http://www.jinspired.com/products/jxinsight/new-in-5.6.html

Kind regards,

William

Alon said...

William, although JXInsight, from what I could tell, is more of a deep dive development tool with low overhead (I am curious as to how they achieve this - very impressive), as opposed to a tool that helps link business to IT (like BTM), the website blew me away with its elegant simplicity. All the rest blow their sites up with useless text; Jinspired really does a nice job of paring that down, although it does leave deciphering what exactly the product does to the expert (maybe that's the point).

William Louth said...

Hi,

One of my favorite books was "The Laws of Simplicity" by John Maeda which probably inspired the design.

I do accept that this striving for simplicity (and minimalism) in design can sometimes lose the reader, especially as we have hidden all the details of the various technologies within the product in links from the main pages of the site.

We thought it would be best to provide an overview of the technologies in relation to the process and productions (and people) before overloading them with the details and the possibilities.

Actually, our product can work at any level of granularity. The design of the probes (our production solution) is completely independent of the underlying execution technology. That said, by default it maps Java packages, classes and methods to probe cost centers straight out of the box. But cost centers (named groups) could represent any cost object, including a business transaction, an IT service, or a business event. This is why it is ideal for cloud computing, where the probes can represent whatever unit of execution represents a chargeable operation.

I am very proud that our product can truly be used across the complete application life cycle as I designed it to reflect the roles I have taken over the course of my career (development, performance testing, and operations). You just need to understand which data collection technology to select depending on the environment and problem context.

Kind regards,

William

bernd said...

First, I'd like to correct that products like dynaTrace and CA Wily are actually running 24x7 in production at low overheads in the range of not measurable to 3%. And it's right, there is no free lunch: overhead increases with the granularity of information that is being collected. However, since there is so much value in opening up the black box, operations is accepting it. While it's also true that the biggest benefit of network sniffer based monitoring solutions is that they add zero overhead, they suffer at the same time from the limitation of seeing just the input/output of applications and having to treat them as black boxes. Both approaches - network sniffing and bytecode instrumentation - have their strengths, and sometimes it even makes sense to complement them and use both.

dynaTrace performs Business Transaction Management with its unique PurePath technology. Automated business transaction mapping elevates the PurePaths, which are cross-tier transaction traces, to the business layer, all in realtime. This provides business service metrics for operations and business employees in a holistic and simplified fashion.

Business Transaction Management with dynaTrace means briefly:
* Always-on tracing of each individual transaction 24x7 at acceptably low overhead
* Accurate tracing across multiple physical tiers
* Realtime transaction flow and performance analysis
* Realtime mapping of analysed transactions to the business context, which includes business metrics as well as performance information (e.g. URL based mapping, user mapping based on servlet or ASP session context, transaction type mapping based on method arguments, ...)
* Visualization of business transactions and alerting to violated technical and business constraints
* Automated storing of violating business transactions for automated problem documentation
* Lifecycle enablement, to capture transaction traces in production and enable hand-over to technical personnel to perform root-cause analysis; also to proactively manage performance of transactions in development and testing.
* Integration of additional transactional information from external sources

Check out the two-minute explainer for a quick overview on PurePath technology and Business Transaction Mapping: http://www.dynatrace.com/twominutedemo/

Some further comments:
* “Lengthy Implementation”: I assume your idea is to compare to tapping network cables or plugging into replication ports, which is indeed fairly easy, granted you have physical access to the environment and that network traffic is not encrypted and uses standardized protocols. In contrast, I would not consider dynaTrace's implementation of deploying a single agent library (even remotely), adding one JVM argument, and using the auto discovery mechanism for sensor placement a lengthy implementation. Business transaction mapping may need some hours of configuration, but that's true for network sniffer based approaches too, which have to define rules for picking up session information from the protocol layer, whereas dynaTrace picks it up simply from the application's internal session information.

* “Cannot trace in real time”: dynaTrace's PurePath technology is different and does not suffer from that limitation, because its PurePath traces are always on, 24x7, which is also a key enabler for BTM.

* “Difficult for IT Staff to use”: dynaTrace takes the different users into account, and provides system and business level metrics for triage and operations, and deep dive information to architects. However, the trend we are seeing is that performance architects and system architects are taking an ever stronger stake on the operations side, pushing for richer information to solve problems, not just watch them. We also see that CTOs and software architects receive tremendous value from gaining visibility into production transactions by analysing PurePaths that have been captured in production, eliminating the need to try to reproduce problems, which is often impossible or impractical.

* “Lots of data is collected”: again, the key is to provide the right data to the right audience. Yes, code level information is not something operators are interested in, but developers save a lot of time getting that information from PurePaths that have been captured in production. And keep in mind that dynaTrace's approach of capturing each individual transaction provides the core foundation for accurate non-aggregated analysis in multiple dimensions, e.g. precise tier-by-tier performance breakdowns for selected business transactions (vertical slicing).


Regards,
Bernd

mike malzacher said...

Applications
Gives J2EE/.NET experts insight into where the problems are - True
Used mainly in the development phase and pre-deployment – not true. Introscope is used by our customers mainly in production. Many of our customers also use Introscope in Q/A and Test environments.
Can be used in production for a few percent of the transactions – not true. Introscope can see all transactions in real-time in production with minimal overhead.
Advantages
Gives developers deep insight into problems at the source code transaction data level – not true. Introscope is not a “profiler”. It is a production solution and is used by application managers, application server administrators, support personnel, etc. for effective proactive monitoring.
With the help of synthetic transactions, deep diagnostics can be performed during production – true, but if you want to see real-time live transactions, synthetics is not the answer. You need a production-ready solution like CA Wily Introscope.
You can get a full method call, similar to a debugger – you can see method calls, yes.
Drawbacks
Lengthy Implementation – untrue. Introscope installs quickly and can show a user meaningful data very quickly.
Only works with certain environments – untrue. Introscope can run in any OS environment on any application server environment with the same code base. Many of our enterprise customers are using Introscope in production with extremely mixed IT environments.
Cannot trace all transactions in real time – untrue. I answered this comment above.
Not recommended for the production environment – untrue. Introscope is the market leading production solution for .NET and Java environments. CA has over 1100 enterprise customers using Introscope in production. If you don’t believe me, come to CA World in Las Vegas during the week of November 17th and speak to our satisfied customers. Many customers that are attending actually have sessions where they will outline how they are using Introscope in complex production environments.
Difficult for IT support staff to use – untrue.
It only helps with Java or .NET – true, but CA Wily can also provide metrics from non-Java/.NET components.
The solutions are not designed for a high level production view, they do not provide an extensive topology of the system – depends what you mean by topology. Again, CA Wily Introscope is recommended for production and we have many, many customers that are using the solution to be effective.
Lots of detailed data is collected. Application owners and system administrators do not always know what to do with all of the information – not with Introscope. CA Wily provides meaningful data that is correlated.

Mike Malzacher
Product Market Manager
CA, Inc.

www.ca.com/apm

William Louth said...

"First, i'd like to correct that products like dynaTrace and CA Wily are actually running 24x7 in production at low overheads in the range of not measurable to 3%"

That is strange, because we have recently had a number of your customers (listed on your website) contacting us with regard to production monitoring via our probes technology. I recognized the names and asked why, if they already had a solution.

So I think you might want to qualify what you mean by 3% overhead, as this number is certainly not in the range that was mentioned. But then again, 3% could be possible with an application that was already significantly database (or some external resource) bound.

Before you fire off with an answer, you might want to reconsider the proposal we placed to you at JavaOne recently, which you turned down: that of standardizing on a benchmark such as SPECjvm2008 for comparison purposes.

By the way, can you please stop talking about paths as being unique to your product? JXInsight/JDBInsight had transactional resource access patterns and histories (called paths in the product) years before you even formed your company.

William

Alon said...

Wow! This is some great conversation going on!

First of all, I would like to point out that this article is generalizing, which is always a dangerous thing to do. Nobody said that all of the drawbacks apply to 100% of the products.

I must point out that I too have run into people who use Introscope and claim that they cannot monitor 100% of their transactions in production all the time due to overhead. I also understand that Introscope cannot give you the full topology of every transaction, meaning a view of what the transaction is doing at every tier (requests, CPU time, latencies and so on). Maybe they had an older version or something.

As far as Dynatrace is concerned, what kind of metrics can you collect during production at this low overhead, and what tiers can you collect from (is it just data collected on the app server)?

William Louth said...

"I must point out that I too have run into people who use Introscope and claim that they cannot monitor 100% of their transactions in production all the time due to overhead."

This is also my own experience.

We have a number of customers that still use Wily's solution for high level reporting (green/red light dashboard) for less experienced operations staff dealing with incidents, but use JXInsight for problem management and capacity planning (and hopefully cloud computing metering).

If a vendor keeps stating single digit runtime overheads without context (environment, application architecture, software execution behavior) then you can be sure they are lying. 3% is completely meaningless and unsubstantiated without context (benchmark?) and shows a lack of respect for the reader, as it is misleading.

A product could have an enormous overhead and yet still not impact an application if the instrumentation is confined to entry points and the cost of certain operations is horrendous, like a poorly performing database, message queue, SOAP/WS-* stack, ...
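
A back-of-the-envelope sketch of that point, with made-up numbers: the same probe cost looks negligible against a slow database call and enormous against a cheap method that is called constantly. The function and all figures below are hypothetical and do not describe any particular product.

```python
def relative_overhead(probe_cost_us: float, probes_per_request: int,
                      baseline_us: float) -> float:
    """Instrumentation time as a fraction of the uninstrumented request time."""
    return probe_cost_us * probes_per_request / baseline_us

# One 50 microsecond probe at the entry point of a 200 ms database-bound request.
db_bound = relative_overhead(50, 1, 200_000)
# The same 50 microsecond probe fired on each of 1,000 calls to a 1 microsecond method.
cpu_bound = relative_overhead(50, 1_000, 1_000)

print(f"database-bound request: {db_bound:.3%}")
print(f"cheap, hot method:      {cpu_bound:.0%}")
```

With these assumed figures the first case comes out at 0.025% and the second at 5000%, which is why a single overhead percentage says little without knowing the workload it was measured on.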

For this reason we compare products with benchmarks that have extremely high volume operations (+1 billion) at extremely low cost (SPECjvm2008) with instrumentation covering all the code base. We also make sure to max out the processing capability, because a number of products attempt to disguise their overhead by offloading onto other threads and processing units (central management server), which in the end creates additional work in capacity planning both for the application and the management server.

William

Chung Wu said...
This comment has been removed by the author.
Chung Wu said...

There are really two parts to the performance overhead question.

1. CPU/memory/disk overhead - i.e. extra capacity utilization from collecting and analyzing transactions
2. Increases in transaction time due to the use of the tool

Both of these need to be low, with #2 being a bigger issue than #1. The last thing you want is to have a performance management tool introduce additional performance problems to the application being monitored.

The performance overhead problem has a psychological component to it also. Traditionally, performance profiling tools introduce a fair amount of overhead, so anytime you talk to people about a tool for analyzing transactions, they automatically think of high overhead. Vendors' claims of "3% overhead" are always met with skepticism and even cynicism.

In the Java space, one of the main challenges is that byte code instrumentation can introduce a lot of overhead (both #1 and #2) to the application if it is not used carefully. So the "3% overhead" claim can be misleading unless it is qualified with additional information about the data that is gathered and the amount of tweaks that have to be done.

Business Transaction Management: thank you for the kind comment that you left on my blog. Could you put a bit more info on your profile (such as your real name) so that we know who you are?

Chung Wu

William Louth said...

"In the Java space, one of the main challenges is that byte code instrumentation can introduce a lot of overhead (both #1 and #2) to the application if it is not used carefully...."

Just to clarify things, it is not necessarily the inserting of bytecode instrumentation but what happens when the instrumentation code is executed. In JXInsight we differentiate the firing or executing of the code from the actual measurement (metering of resources) via patented multi-chained metering strategies, which allows us to claim single-digit if not zero overhead, but then you might not have any data. This allows us to report accurately and with lower overhead than sampling, which is an option touted by some vendors but which fails miserably in practice when one considers the number of threads, the average call depth, and the data required to diagnose problems in Java enterprise applications.

I would also like to point out that in some cases the introduction of an APM solution (with overhead) can in fact appear to reduce the transaction times, especially when the JVM is suffering from large amounts of object allocation or the database is saturated. The overhead reduces the concurrent contention (or workload) on the saturated resource. This is why we measure and compare the instrumentation and measurement overhead for each execution point independently.