The present technology may determine an anomaly in a portion of a distributed business application. Data can automatically be captured and analyzed for the portion of the application associated with the anomaly. By automatically capturing data for just the portion associated with the anomaly, the present technology reduces the resource and time requirements associated with other code-based solutions for monitoring transactions. A method for performing a diagnostic session for a request may begin with initiating collection of diagnostic data associated with a request. An application thread on each of two or more servers may be sampled. The application threads may be associated with the same business transaction and the business transaction may be associated with the request. The diagnostic data may be stored.
|
1. A method for monitoring an application, the method comprising:
monitoring, by an agent on a machine, a distributed business transaction associated with a plurality of distributed applications executing across a computer network, wherein monitoring includes collecting runtime data associated with the distributed business transaction;
based on the runtime data, determining, by the agent, a performance baseline for handling an application request associated with the distributed business transaction;
comparing, by the agent, the runtime data outliers to the performance baseline to identify an anomaly in the distributed business transaction, wherein a number of outliers occurring for a business transaction within a particular time window is compared to a baseline of outlier occurrence for the distributed business transaction, wherein a behavior of the distributed business transaction is learned based on the comparison over time;
upon identifying the anomaly, automatically triggering a diagnostic session at the machine to collect one or more diagnostic parameters about the distributed business transaction based on the learned behavior of the distributed business transaction;
continually updating the performance baseline based on subsequent monitoring of requests associated with the application; and
based on the collected diagnostic parameters, causing, by the agent, a model of a distributed business transaction flow to be generated, wherein a flow map of the one or more distributed business transactions is generated that includes a map of applications or virtual machines that make up the one or more distributed business transactions associated with the diagnostic session triggered by the anomaly.
17. A system for monitoring a distributed application, comprising:
a processor configured to execute a process; and
a memory configured to store program instructions which contain the process executable by the processor, the process configured to:
monitor, by an agent executing on the apparatus, a distributed business transaction associated with a plurality of distributed applications executing across a computer network, wherein monitoring includes collecting runtime data associated with the distributed business transaction;
based on the runtime data, determine a performance baseline for handling an application request associated with the distributed business transaction;
compare the runtime data outliers to the performance baseline to identify an anomaly in the distributed business transaction, wherein a number of outliers occurring for a business transaction within a particular time window is compared to a baseline of outlier occurrence for the distributed business transaction, wherein a behavior of the distributed business transaction is learned based on the comparison over time;
upon identifying the anomaly, automatically trigger a diagnostic session at the machine to collect one or more diagnostic parameters about the distributed business transaction based on the learned behavior of the distributed business transaction;
continually update the performance baseline based on subsequent monitoring of requests associated with the application; and
based on the collected diagnostic parameters, cause a model of the distributed business transaction flow to be generated, wherein a flow map of the one or more distributed business transactions is generated that includes a map of applications or virtual machines that make up the one or more distributed business transactions associated with the diagnostic session triggered by the anomaly.
9. A non-transitory computer readable storage medium having embodied thereon a program, the program being executable by a processor to perform a method for monitoring a garbage collection process, the method comprising:
monitoring, by an agent on a machine, a distributed business transaction associated with a plurality of distributed applications executing across a computer network, wherein monitoring includes collecting runtime data associated with the distributed business transaction;
based on the runtime data, determining, by the agent, a performance baseline for handling an application request associated with the distributed business transaction;
comparing, by the agent, the runtime data outliers to the performance baseline to identify an anomaly in the distributed business transaction, wherein a number of outliers occurring for a business transaction within a particular time window is compared to a baseline of outlier occurrence for the distributed business transaction, wherein a behavior of the distributed business transaction is learned based on the comparison over time;
upon identifying the anomaly, automatically triggering a diagnostic session at the machine to collect one or more diagnostic parameters about the distributed business transaction based on the learned behavior of the distributed business transaction;
continually updating the performance baseline based on subsequent monitoring of requests associated with the application; and
based on the collected diagnostic parameters, causing, by the agent, a model of the distributed business transaction flow to be generated, wherein a flow map of the one or more distributed business transactions is generated that includes a map of applications or virtual machines that make up the one or more distributed business transactions associated with the diagnostic session triggered by the anomaly.
2. The method of
3. The method of
receiving business transaction name information; and
receiving call chain information.
4. The method of
5. The method of
6. The method of
7. The method of
8. The method of
10. The non-transitory computer readable storage medium of
11. The non-transitory computer readable storage medium of
receiving business transaction name information; and
receiving call chain information.
12. The non-transitory computer readable storage medium of
13. The non-transitory computer readable storage medium of
14. The non-transitory computer readable storage medium of
15. The non-transitory computer readable storage medium of
16. The non-transitory computer readable storage medium of
18. The system of
19. The system of
20. The system of
|
This application is a continuation and claims the priority benefit of U.S. patent application Ser. No. 14/690,254, titled “Conducting a Diagnostic Session for Monitored Business Transactions,” filed Apr. 17, 2015, which is a continuation and claims the priority benefit of U.S. patent application Ser. No. 14/071,525, now U.S. Pat. No. 9,015,317, titled “Conducting a Diagnostic Session for Monitored Business Transactions,” filed Nov. 4, 2013, which is a continuation and claims the priority benefit of U.S. patent application Ser. No. 13/189,360, now U.S. Pat. No. 8,938,533, titled “Automatic Capture of Diagnostic Data Based on Transaction Behavior Learning,” filed Jul. 22, 2011, which is a continuation-in-part and claims the priority benefit of U.S. patent application Ser. No. 12/878,919, now U.S. Pat. No. 9,167,028, titled “Monitoring Distributed Web Application Transactions,” filed Sep. 9, 2010, which claims the priority benefit of U.S. provisional patent application 61/241,256, titled “Automated Monitoring of Business Transactions,” filed Sep. 10, 2009, the disclosures of which are incorporated herein by reference.
The World Wide Web has expanded to provide web services faster to consumers. Web services may be provided by a web application which uses one or more services to handle a transaction. The applications may be distributed over several machines, making the topology of the machines that provides the service more difficult to track and monitor.
Monitoring a web application helps to provide insight regarding bottlenecks in communication, communication failures and other information regarding performance of the services that provide the web application. When a web application is distributed over several machines, tracking the performance of the web service can become impractical with large amounts of data collected from each machine.
When a distributed web application is not operating as expected, additional information regarding application performance can be used to evaluate the health of the application. Collecting the additional information can consume large amounts of resources and often requires significant time to determine how to collect the information.
There is a need in the art for web service monitoring which may accurately and efficiently monitor the performance of distributed applications which provide a web service.
The present technology monitors a distributed network application system and may detect an anomaly based on the learned behavior of the system. The behavior may be learned for each of one or more machines which implement a distributed business transaction. The present system may automatically collect diagnostic data for one or more business transactions and/or requests based on learned behavior for the business transaction or request. The diagnostic data may include detailed data for the operation of the distributed web application and be processed to identify performance issues for a transaction. Detailed data for a distributed web application transaction may be collected by sampling one or more threads assigned to handle portions of the distributed business transaction. Data regarding the distributed transaction may then be reported from agents monitoring portions of the distributed transaction to one or more central controllers and assembled by one or more controllers into business transactions. Data associated with one or more anomalies may be reported via one or more user interfaces.
Collection of diagnostic data at a server may be initiated locally by an agent or remotely from a controller. An agent may initiate collection of diagnostic data based on a monitored individual request or a history of monitored requests associated with a business transaction. For example, an agent at an application or Java Virtual Machine (JVM) may trigger the collection of diagnostic runtime data for a particular request if the request is characterized as an outlier. The agent may also trigger a diagnostic session for a business transaction or other category of request if the performance of requests associated with the business transaction varies from a learned baseline performance for the business transaction. The agent may determine baselines for request performance and compare the runtime data to the baselines to identify the anomaly. A controller may receive aggregated runtime data reported by the agents, process the runtime data, and determine an anomaly based on the processed runtime data that doesn't satisfy one or more parameters, thresholds or baselines.
In an embodiment, a method for performing a diagnostic session for a request may begin with initiating collection of diagnostic data associated with a request. An application thread on each of two or more servers may be sampled. The application threads may be associated with the same business transaction and the business transaction may be associated with the request. The diagnostic data may be stored.
The present technology monitors a network or web application provided by one or more distributed applications. The web application may be provided by one or more web services each implemented as a virtual machine or one or more applications implemented on a virtual machine. Agents may be installed on one or more servers at an application level, virtual machine level, or other level. An agent may monitor a corresponding application (or virtual machine) and application communications. Each agent may communicate with a controller and provide monitoring data to the controller. The controller may process the data to learn and evaluate the performance of the application or virtual machine, model the flow of the application, and determine information regarding the distributed web application performance. The monitoring technology determines how each distributed web application portion is operating, establishes a baseline for operation, and determines the architecture of the distributed system.
The present technology may monitor a distributed web application that performs one or more business transactions. A business transaction may be a set of tasks performed by one or more distributed web applications in the course of a service provided over a network. In an e-commerce service, a business transaction may be “add to cart” or “check-out” transactions performed by the distributed application.
The behavior of a system which implements a distributed web transaction may be learned for each of one or more machines which implement the distributed transaction. The behavior may be learned both for a business transaction which includes multiple requests and for a particular request. The present system may automatically collect diagnostic data for one or more business transactions and/or requests based on learned behavior of the business transaction or request. The diagnostic data may include detailed data for the operation of the distributed web application and be processed to identify performance issues for a transaction. Detailed data for a distributed web application transaction may be collected by sampling one or more threads assigned to handle portions of the distributed business transaction. Data regarding the distributed transaction may then be reported from agents monitoring portions of the distributed transaction to one or more central controllers and assembled by one or more controllers into business transactions. Data associated with one or more anomalies may be reported via one or more user interfaces.
The present technology may perform a diagnostic session for an anomaly detected in the performance of a portion of a distributed web application, such as a business transaction or category of request. During the diagnostic session, detailed data may be collected for the operation of the distributed web application. The data may be processed to identify performance issues for a transaction. Detailed data for a distributed web application transaction may be collected by sampling one or more threads assigned to handle portions of the distributed business transaction. Data regarding the distributed transaction may be reported from one or more agents at an application or Java Virtual Machine (JVM) to one or more controllers. The data may be received and assembled by the one or more controllers into business transactions.
The monitoring system may monitor distributed web applications across a variety of infrastructures. The system is easy to deploy and provides end-to-end business transaction visibility. The monitoring system may identify performance issues quickly and has a dynamic scaling capability across a monitored system. The present monitoring technology has a low footprint and may be used with cloud systems, virtual systems and physical infrastructures.
Agents may communicate with code within a virtual machine or an application. The code may detect when an application entry point is called and when an application exit point is called. An application entry point may include a call received by the application. An application exit point may include a call made by the application to another application, virtual machine, server, or some other entity. The code within the application may insert information into an outgoing call or request (exit point) and detect information contained in a received call or request (entry point). By monitoring incoming and outgoing calls and requests, and by monitoring the performance of a local application that processes the incoming and outgoing requests, the present technology may determine the performance and structure of complicated and distributed business transactions.
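The detection of entry and exit points described above can be illustrated with a minimal sketch. It uses Python decorators in place of byte code instrumentation, which is a substitution made only for illustration. The `Agent` class, the `instrument` helper, and the two example functions are hypothetical names that do not appear in the present description.

```python
import functools
import time


class Agent:
    """Hypothetical in-process agent that collects events from instrumented code."""

    def __init__(self):
        self.events = []

    def report(self, kind, name, elapsed_seconds):
        # Store the point type ("entry" or "exit"), the function name, and its duration.
        self.events.append((kind, name, elapsed_seconds))


def instrument(agent, kind):
    """Wrap a function so each call is reported to the agent as an entry or exit point."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Report even if the wrapped call raises.
                agent.report(kind, func.__name__, time.perf_counter() - start)
        return wrapper
    return decorator


agent = Agent()


@instrument(agent, "exit")
def call_inventory(item):
    # Stands in for a call made to another application (an exit point).
    return {"item": item, "in_stock": True}


@instrument(agent, "entry")
def add_to_cart(item):
    # Stands in for a call received by this application (an entry point).
    return call_inventory(item)["in_stock"]
```

In this sketch the exit point is reported before the entry point that contains it, because the inner call completes first.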
Client device 105 may include network browser 110 and be implemented as a computing device, such as for example a laptop, desktop, workstation, or some other computing device. Network browser 110 may be a client application for viewing content provided by an application server, such as application server 130 via network server 125 over network 120. Mobile device 115 is connected to network 120 and may be implemented as a portable device suitable for receiving content over a network, such as for example a mobile phone, smart phone, or other portable device. Both client device 105 and mobile device 115 may include hardware and/or software configured to access a web service provided by network server 125.
Network 120 may facilitate communication of data between different servers, devices and machines. The network may be implemented as a private network, public network, intranet, the Internet, or a combination of these networks.
Network server 125 is connected to network 120 and may receive and process requests received over network 120. Network server 125 may be implemented as one or more servers implementing a network service. When network 120 is the Internet, network server 125 may be implemented as a web server.
Application server 130 communicates with network server 125, application servers 140 and 150, and controller 190. Application server 130 may also communicate with other machines and devices (not illustrated in
Virtual machine 132 may be implemented by code running on one or more application servers. The code may implement computer programs, modules and data structures to implement a virtual machine mode for executing programs and applications. In some embodiments, more than one virtual machine 132 may execute on an application server 130. A virtual machine may be implemented as a Java Virtual Machine (JVM). Virtual machine 132 may perform all or a portion of a business transaction performed by application servers comprising system 100. A virtual machine may be considered one of several services that implement a web service.
Virtual machine 132 may be instrumented using byte code insertion, or byte code instrumentation, to modify the object code of the virtual machine. The instrumented object code may include code used to detect calls received by virtual machine 132, calls sent by virtual machine 132, and communicate with agent 134 during execution of an application on virtual machine 132. Alternatively, other code may be byte code instrumented, such as code comprising an application which executes within virtual machine 132 or an application which may be executed on application server 130 and outside virtual machine 132.
Agent 134 on application server 130 may be installed on application server 130 by instrumentation of object code, downloading the application to the server, or in some other manner. Agent 134 may be executed to monitor application server 130, monitor virtual machine 132, and communicate with byte instrumented code on application server 130, virtual machine 132 or another application on application server 130. Agent 134 may detect operations such as receiving calls and sending requests by application server 130 and virtual machine 132. Agent 134 may receive data from instrumented code of the virtual machine 132, process the data and transmit the data to controller 190. Agent 134 may perform other operations related to monitoring virtual machine 132 and application server 130 as discussed herein. For example, agent 134 may identify other applications, share business transaction data, aggregate detected runtime data, and other operations.
Each of application servers 140, 150 and 160 may include an application and an agent. Each application may run on the corresponding application server or a virtual machine. Each of virtual machines 142, 152 and 162 on application servers 140-160 may operate similarly to virtual machine 132 and host one or more applications which perform at least a portion of a distributed business transaction. Agents 144, 154 and 164 may monitor the virtual machines 142-162, collect and process data at runtime of the virtual machines, and communicate with controller 190. The virtual machines 132, 142, 152 and 162 may communicate with each other as part of performing a distributed transaction. In particular, each virtual machine may call any application or method of another virtual machine.
Controller 190 may control and manage monitoring of business transactions distributed over application servers 130-160. Controller 190 may receive runtime data from each of agents 134-164, associate portions of business transaction data, communicate with agents to configure collection of runtime data, and provide performance data and reporting through an interface. The interface may be viewed as a web-based interface viewable by mobile device 115, client device 105, or some other device. In some embodiments, a client device 192 may directly communicate with controller 190 to view an interface for monitoring data.
Asynchronous network machine 170 may engage in asynchronous communications with one or more application servers, such as application server 150 and 160. For example, application server 150 may transmit several calls or messages to an asynchronous network machine. Rather than communicate back to application server 150, the asynchronous network machine may process the messages and eventually provide a response, such as a processed message, to application server 160. Because there is no return message from the asynchronous network machine to application server 150, the communications between them are asynchronous.
Data stores 180 and 185 may each be accessed by application servers such as application server 150. Data store 185 may also be accessed by application server 150. Each of data stores 180 and 185 may store data, process data, and return queries received from an application server. Each of data stores 180 and 185 may or may not include an agent.
Application server 200 and application 220 can be instrumented via byte code instrumentation at exit and entry points. An entry point may be a method or module that accepts a call to application 220, virtual machine 210, or application server 200. An exit point is a module or program that makes a call to another application or application server. As illustrated in
Agent 230 may be one or more programs that receive information from an entry point or exit point. Agent 230 may process the received information, may retrieve, modify and remove information associated with a thread, may access, retrieve and modify information for a sent or received call, and may communicate with a controller 190. Agent 230 may be implemented outside virtual machine 210, within virtual machine 210, within application 220, or a combination of these.
Diagnostic parameters may be configured for one or more agents at step 310. The diagnostic parameters may be used to implement a diagnostic session conducted for a distributed web application business transaction. The parameters may be set by a user, an administrator, may be pre-set, or may be permanently configured.
Examples of diagnostic parameters that may be configured include the number of transactions to simultaneously track using diagnostic sessions, the number of transactions tracked per time period (e.g., transactions tracked per minute), the time of a diagnostic session, a sampling rate for a thread, a threshold percent of requests detected to run slow before triggering an anomaly, outlier information, and other data. The number of transactions to simultaneously track using diagnostic sessions may indicate the number of diagnostic sessions that may be ongoing at any one time. For example, a parameter may indicate that only 10 different diagnostic sessions can be performed at any one time. The time of a diagnostic session may indicate the time for which a diagnostic session will collect detailed data for operation of a transaction, such as for example, five minutes. The sampling rate of a thread may be automatically set to a sampling rate to collect data from a thread call stack based on a detected change in value of the thread, may be manually configured, or otherwise set. The threshold percent of requests detected to run slow before triggering an anomaly may indicate a number of requests to be detected that run at less than a baseline threshold before triggering a diagnostic session. Diagnostic parameters may be set at either a controller level or an individual agent level, and may affect diagnostic tracking operation at a controller, an agent, or both.
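The diagnostic parameters listed above might be grouped as in the following minimal sketch. The field names, the `can_start_session` helper, and every default value not stated in the text are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticParameters:
    """Hypothetical container for the diagnostic parameters of one agent."""
    max_concurrent_sessions: int = 10              # example given in the text
    max_sessions_per_minute: int = 5               # assumed value
    session_duration_seconds: int = 300            # "five minutes" example in the text
    thread_sampling_interval_ms: int = 10          # assumed value
    slow_request_threshold_percent: float = 15.0   # assumed value


def can_start_session(params, active_sessions, started_this_minute):
    """Return True if another diagnostic session may start under the configured limits."""
    return (active_sessions < params.max_concurrent_sessions
            and started_this_minute < params.max_sessions_per_minute)
```

An agent could consult a check of this kind before triggering a session, so that the number of ongoing sessions stays within the configured limit.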
Requests may be monitored and runtime data may be collected at step 320. As requests are received by an application and/or JVM, the requests are associated with a business transaction by an agent residing on the application or JVM, and may be assigned a thread within a thread pool by the application or JVM itself. The business transaction is associated with the thread by adding business transaction information, such as a business transaction identifier, to the thread by an agent associated with the application or JVM that receives the request. The thread may be configured with additional monitoring parameter information associated with a business transaction. Monitoring information may be passed on to subsequent called applications and JVMs that perform portions of the distributed transaction as the request is monitored by the present technology.
Diagnostic data is collected by an agent at step 330. Diagnostic data may be collected for one or more transactions or requests. Diagnostic data may be collected based on the occurrence of an outlier or an anomaly. Collecting diagnostic data is discussed in more detail below with respect to
A determination is made as to whether instructions have been received from a controller to collect diagnostic data at step 340. A diagnostic session may be triggered “centrally” by a controller based on runtime data received by the controller from one or more agents located throughout a distributed system being monitored. If a controller determines that an anomaly is associated with a business transaction, or portion of a business transaction for which data has been reported to the controller, the controller may trigger a diagnostic session and instruct one or more agents residing on applications or JVMs that handle the business transaction to conduct a diagnostic session for the distributed business transaction. Operation of a controller is discussed in more detail below with respect to the method of
If no instructions are received from a controller to collect diagnostic data, the method of
If the request is locally identified as an outlier at step 370, diagnostic data (i.e., detailed data regarding the request) for the particular request associated with the outlier is collected at step 375. Diagnostic data may be collected by sampling a thread call stack for the thread that is locally handling the request associated with the outlier. The agent may collect data for the remainder of the request duration. After collecting diagnostic data, the method of
A determination is made as to whether a business transaction is locally identified as an anomaly at step 380. A business transaction may be locally identified as an anomaly by an agent that resides on an application or JVM and processes runtime data associated with the business transaction. The agent may identify the anomaly based on aggregated abnormal behavior for the business transaction, such as an increase in the rate of outliers for the business transaction. For example, if the business transaction has a higher rate of outliers in the last ten minutes than a learned baseline of outliers for the previous hour for the business transaction, the agent may identify the corresponding business transaction performance as an anomaly and trigger a diagnostic session to monitor the business transaction. Identifying a business transaction as an anomaly is discussed in more detail below with respect to the method of
If the business transaction is identified locally as an anomaly at step 380, a diagnostic session is triggered and diagnostic data associated with the anomalous business transaction is collected at step 385. Diagnostic data may be collected by sampling a thread call stack for the thread that is locally handling one or more requests that form the business transaction that triggered the diagnostic session. The agent may collect data for future occurrences of the business transaction. Outgoing calls associated with the monitored transaction may be monitored to cause called applications to collect diagnostic data as part of the diagnostic session for the transaction. Collecting diagnostic data associated with an anomaly is discussed in more detail below with respect to
A performance baseline may be determined automatically and continuously by an agent. The baseline may be, for example, a moving average of the time to perform a request, the number of outliers occurring, or other data collected during a particular baseline window. The baseline may be associated with a particular window, such as one minute, ten minutes, or an hour, and with the time of day, day of the week, or other information to provide a context which more accurately describes the typical performance of the system being monitored. For example, baselines may be determined and updated for transactions occurring within a specific time range within a day, such as 11:00 AM to 2:00 PM. For purposes of discussion, a baseline is discussed with respect to a rate of outliers occurring for a business transaction within a time window at a particular machine.
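A continuously updated baseline of outlier occurrence might be kept as a moving average, as in the following minimal sketch. The class name and the choice of six windows are assumptions; six ten-minute windows would approximate a one-hour baseline.

```python
from collections import deque


class OutlierBaseline:
    """Moving average of outlier counts over the most recent completed time windows."""

    def __init__(self, max_windows=6):
        # A bounded deque discards the oldest window automatically.
        self._counts = deque(maxlen=max_windows)

    def record_window(self, outlier_count):
        """Add the outlier count for a completed window."""
        self._counts.append(outlier_count)

    def value(self):
        """Return the current baseline, or None before any window has completed."""
        if not self._counts:
            return None
        return sum(self._counts) / len(self._counts)
```

Separate instances could be kept per business transaction and per context, such as a time-of-day range, although that bookkeeping is omitted here.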
In some embodiments, a standard deviation may be automatically determined by the agent, controller, or other source and used to identify an anomaly. For example, a baseline may be determined from an average response time of one second for a particular transaction. The standard deviation may be 0.3 seconds. As such, a response time of 1.0-1.3 seconds may be an acceptable time for the business transaction to occur. A response time of 1.3-1.6 seconds may be categorized as “slow” for the particular request, and a response time of 1.6-1.9 seconds may be categorized as “very slow” and may be identified as an anomaly for the request. An anomaly may also be based on a number of requests having a response time within a particular deviation range. For example, an anomaly may be triggered if 15% or more of requests have performed “slow”, or if three or more instances of a request have performed “very slow.”
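The worked example above, with a one-second mean and a 0.3-second standard deviation, can be expressed as the following minimal sketch. The function names are hypothetical. The treatment of boundary values, of response times below the mean, and of times beyond three standard deviations is an assumption, since the text does not address those cases.

```python
def classify(response_time, mean, std_dev):
    """Label a response time by how many standard deviations it lies above the mean."""
    if response_time <= mean + std_dev:
        return "normal"        # includes times faster than the mean (assumed)
    if response_time <= mean + 2 * std_dev:
        return "slow"
    return "very slow"         # anything beyond two deviations (assumed)


def is_anomaly(response_times, mean, std_dev, slow_fraction=0.15, very_slow_count=3):
    """Apply the two example triggers: 15% or more slow, or three or more very slow."""
    labels = [classify(t, mean, std_dev) for t in response_times]
    if not labels:
        return False
    slow = labels.count("slow")
    very_slow = labels.count("very slow")
    return slow / len(labels) >= slow_fraction or very_slow >= very_slow_count
```

In this sketch only requests labeled "slow" count toward the 15% trigger; counting "very slow" requests toward it as well would be an equally plausible reading.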
The runtime data collected for current outliers is compared to the business transaction performance baseline at step 420 by the particular agent. For example, the number of outliers occurring for a business transaction in the time window is compared to the baseline of outlier occurrence for the business transaction.
An anomaly may be identified by the agent based on the comparison at step 430. For example, if an agent detects that the number of outliers that occurred for a business transaction within the past ten minutes is greater than the baseline outlier rate for the business transaction, the agent may identify an anomaly.
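The comparison of steps 420 and 430 might look like the following minimal sketch, which counts outliers in a sliding ten-minute window and compares the count to a baseline value. The class and function names are hypothetical, and timestamps are passed in explicitly as seconds so the example stays deterministic.

```python
from collections import deque


class OutlierWindow:
    """Counts outliers for one business transaction within a sliding time window."""

    def __init__(self, window_seconds=600):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record_outlier(self, now):
        # Timestamps are assumed to arrive in non-decreasing order.
        self._timestamps.append(now)

    def count(self, now):
        # Drop outliers that have aged out of the window before counting.
        cutoff = now - self.window_seconds
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


def detect_anomaly(window, baseline_count, now):
    """Identify an anomaly when the windowed outlier count exceeds the baseline."""
    return window.count(now) > baseline_count
```

A strict "greater than" comparison follows the example in the text; a deployed agent might add a tolerance margin, which is not shown here.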
A thread call stack may be sampled, stored and processed at step 520. The thread assigned to handle a request may be sampled to determine what the thread is presently handling for the request. The thread call stack data received from the sampling may be stored for later processing for the particular distributed web transaction. Sampling and storing a thread call stack is discussed in more detail below with respect to the method at
An outgoing application call may be modified with diagnostic tracking information at step 530. When a call to an outside application is detected, the call may be modified with diagnostic information for the receiving application. The diagnostic information may include the diagnostic session GUID and other data. Modifying an outgoing application call with diagnostic tracking information is discussed in more detail with respect to the method at
A completed request is detected at step 540. At the completion of the request, data for the request associated with the anomaly may be stored by the agent and eventually sent to a controller. The diagnostic session may be continued for a period of time specified in a corresponding diagnostic parameter for the agent.
An initial sampling rate for the thread may be set at step 610. The initial sampling rate may be set to a default rate, for example a rate of every 10 milliseconds.
The current thread call stack is accessed at the set thread sampling rate at step 615. Sampling the thread call stack may detect what the thread is currently doing. For example, sampling the thread call stack may reveal that the thread is currently processing a request, processing a call to another application, executing an EJB, or performing some other process. The thread call stack may be sampled and the sampled data may be stored locally by the agent sampling the stack.
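Periodic sampling of a thread call stack may be illustrated with the sketch below. The description concerns agents monitoring applications and JVMs; the example uses Python's own thread introspection purely as an analogy. The function name `sample_thread` and the in-memory `samples` list are assumptions, and `sys._current_frames` is a CPython facility rather than anything named in the description.

```python
import sys
import threading
import time
import traceback


def sample_thread(thread, interval_s=0.010):
    """Sample `thread`'s call stack every `interval_s` seconds until the thread exits."""
    samples = []                       # stored locally, as (elapsed seconds, stack)
    started = time.monotonic()
    while thread.is_alive():
        frame = sys._current_frames().get(thread.ident)
        if frame is not None:
            # Function names from outermost to innermost frame.
            stack = tuple(f.name for f in traceback.extract_stack(frame))
            samples.append((time.monotonic() - started, stack))
        time.sleep(interval_s)         # default rate: one sample every 10 milliseconds
    return samples
```

Each stored sample shows what the thread was doing at that instant, so a later processing step can compare consecutive samples to see when the stack changed.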
After sampling of the thread call stack, the agent may determine whether the thread call stack data retrieved as a result of the sampling has changed at step 620. The change is determined by the agent by comparing the most recent call stack data to the previous call stack data. A thread snapshot is updated at step 640 based on the most recent sampling. The snapshot indicates what the thread call stack has performed. An example of a call stack is discussed below with respect to the interface of
A thread snapshot is updated at step 625. The thread snapshot is updated to indicate changes to the thread call stack. A determination is made at step 630 as to whether an event has been detected. The event may be the expiration of a period of time (for example, based on thread sampling rate), the detection of a new request made by a thread, or some other event. If an event is detected, the thread call stack is sampled at step 635 and the method of
A determination is made at step 640 as to whether the thread has completed. If the thread is complete, the method of
At a time of 34 ms, a call to D may be detected. As a result, the thread call stack may be sampled at a time of 34 ms. Hence, a thread call stack may be sampled in response to detecting a call in addition to being sampled periodically.
At a time of 40 ms in
At a time of 60 ms, application D has completed and application B has again called application C. An agent processing the thread call stack data may determine that application D executed for 20 ms, and application B called C a second time. The second call to application C may represent a sequence of calls to application C (one at 20 ms sampling, and one at 60 ms sampling). The present technology may differentiate between each call to application C as part of the request. At 70 ms in time, application C has completed, corresponding to an execution of 10 milliseconds for the second call to application C. At a time of 80 ms, B has completed, corresponding to an execution time of 70 milliseconds for application B.
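The way execution times and repeated calls may be derived from successive samples can be sketched as follows. The function `frame_durations` and the sample values in the usage note are hypothetical; they loosely follow the timeline above but are not taken from the figure. A frame is assumed to start at the first sample in which it appears and to end at the first sample in which it is gone.

```python
def frame_durations(samples):
    """Derive (frame, call_number, start_ms, duration_ms) from call stack samples.

    `samples` is a list of (time_ms, stack) pairs, where `stack` is a tuple of
    frame names ordered from outermost to innermost.
    """
    open_frames = []    # (name, call_number, start_ms), outermost first
    call_counts = {}    # how many times each frame has been seen starting
    finished = []
    for time_ms, stack in samples:
        # Length of the prefix shared by the previous stack and this one.
        shared = 0
        while (shared < len(open_frames) and shared < len(stack)
               and open_frames[shared][0] == stack[shared]):
            shared += 1
        # Frames beyond the shared prefix have returned; close innermost first.
        while len(open_frames) > shared:
            name, number, start = open_frames.pop()
            finished.append((name, number, start, time_ms - start))
        # Frames that are new in this sample are treated as starting now.
        for name in stack[shared:]:
            call_counts[name] = call_counts.get(name, 0) + 1
            open_frames.append((name, call_counts[name], time_ms))
    return finished
```

For the illustrative samples `[(10, ("B",)), (20, ("B", "C")), (40, ("B", "D")), (60, ("B", "C")), (70, ("B",)), (80, ())]`, the sketch reports a 20 ms execution for D, a second call to C lasting 10 ms, and 70 ms for B. The call number kept for each frame is what distinguishes the first call to C from the second.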
First, an application call is detected at step 710. The application call may be detected by sampling a thread call stack associated with the thread handling a request being monitored.
The application call recipient may be added to a call chain at step 720. Once the call is detected at step 710, information regarding the call can be accessed from the thread call stack, including the recipient of the detected call. The call recipient may be added to a call chain maintained in the thread being monitored. The call chain may include call sequence information if more than one call is made to a particular application as part of processing a request locally.
The call chain attribute and call sequence attribute may be added to the call header at step 730. A diagnostic session GUID may be added to the call header at step 740. An application receives the call with a diagnostic session GUID, and an agent at the receiving application detects the diagnostic session GUID. The agent on the receiving application may then monitor the thread processing the received call, associate collected data with the particular diagnostic session GUID, and report the data to a controller. The application call may then be sent with the modified call header to an application at step 750.
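Steps 720 through 740 may be sketched as shown below. The header names `X-Call-Chain`, `X-Call-Sequence`, and `X-Diagnostic-Session-GUID`, the comma-separated chain encoding, and the function name `tag_outgoing_call` are all assumptions chosen for illustration; the description does not specify a header format.

```python
def tag_outgoing_call(headers, call_chain, recipient, session_guid):
    """Return a copy of `headers` carrying diagnostic tracking information.

    `call_chain` lists the recipients already called while processing the
    request locally; `recipient` is the application about to be called.
    """
    chain = list(call_chain) + [recipient]       # step 720: add recipient to the chain
    sequence = chain.count(recipient)            # 2 means second call to this recipient
    tagged = dict(headers)                       # leave the caller's headers untouched
    tagged["X-Call-Chain"] = ",".join(chain)     # step 730: call chain attribute
    tagged["X-Call-Sequence"] = str(sequence)    # step 730: call sequence attribute
    tagged["X-Diagnostic-Session-GUID"] = session_guid  # step 740
    return tagged, chain
```

The returned chain would be kept with the monitored thread so that the next outgoing call made for the same request extends it.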
A request is received by the application at step 810. An agent may detect a request GUID in the request header at step 820. The request GUID may indicate an identifier for a diagnostic session currently underway for a distributed transaction that includes the particular request. The received request may be performed and monitored at step 830. Runtime data, including diagnostic data, may be collected throughout processing of the request at step 840. The request's completion is detected at step 850, and a response to the received request is generated and transmitted to the requesting application at step 860. Eventually, collected runtime data including diagnostic data and other data associated with the request may be reported to a controller at step 870.
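The receiving side of steps 810 through 870 may be sketched in the same spirit. The header name `X-Diagnostic-Session-GUID`, the function name `handle_request`, and the shape of the collected record are hypothetical. Reporting to a controller is represented only by appending to a list that an agent could send later.

```python
import time

GUID_HEADER = "X-Diagnostic-Session-GUID"   # hypothetical header name


def handle_request(headers, handler, pending_reports):
    """Process a request and collect runtime data tagged with any session GUID."""
    session_guid = headers.get(GUID_HEADER)          # step 820: detect the GUID
    started = time.monotonic()
    response = handler()                             # step 830: perform the request
    record = {                                       # step 840: collected runtime data
        "session_guid": session_guid,
        "diagnostic": session_guid is not None,
        "elapsed_s": time.monotonic() - started,
    }
    pending_reports.append(record)                   # step 870: reported later
    return response                                  # step 860: respond to the caller
```

Keeping the GUID on every record is what would let a controller associate data from several applications with the same diagnostic session.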
A call chain may be constructed for each business transaction at step 920. The call chain is constructed from the aggregated runtime data. For example, transactions may be pieced together based on request GUIDs and other data to build a call chain for each business transaction. Received diagnostic data for locally identified anomalies may be processed by the controller at step 930. Processing the diagnostic data may include determining the response times for portions of a distributed business transaction as well as the transaction as a whole, identifying locally detected anomalies, and other processing. Baseline performance for a business transaction call chain is determined at step 940. The baseline performance may be determined based on past performance for each business transaction and portions thereof, including for example each request that is made as part of a business transaction.
Selected agents associated with the applications and JVMs that perform the transaction associated with the anomaly are instructed to collect diagnostic data based on diagnostic parameters at step 950. The diagnostic data may be collected as part of a diagnostic session already triggered by an agent (locally determined anomaly) or triggered by the controller. In some embodiments, the controller may determine whether the maximum number of diagnostic sessions is already reached, and if so may place the presently detected diagnostic session in a queue for execution as soon as a diagnostic session is available.
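The queueing behavior described for the controller may be sketched as follows. The class name `DiagnosticSessionPool` and its methods are hypothetical, and first-in, first-out ordering of waiting sessions is an assumption, since the description states only that a session is queued until one becomes available.

```python
from collections import deque


class DiagnosticSessionPool:
    """Limit concurrent diagnostic sessions and queue any that exceed the limit."""

    def __init__(self, max_sessions):
        self.max_sessions = max_sessions
        self.active = set()
        self.waiting = deque()

    def request(self, session_id):
        """Start the session if capacity allows; otherwise queue it. Returns True if started."""
        if len(self.active) < self.max_sessions:
            self.active.add(session_id)
            return True
        self.waiting.append(session_id)
        return False

    def finish(self, session_id):
        """End a session and start the oldest waiting one, returning its id (or None)."""
        self.active.discard(session_id)
        if self.waiting and len(self.active) < self.max_sessions:
            next_id = self.waiting.popleft()
            self.active.add(next_id)
            return next_id
        return None
```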
Diagnostic data is received from selected agents collecting data as part of the diagnostic session at step 960. Performance data is generated from the collected diagnostic data received from one or more agents, and the performance data may be reported by the controller at step 970. The performance data may be reported via one or more interfaces, for example through an interface discussed in more detail with respect to
A determination is made as to whether selected agents are identified to perform a diagnostic session per performance sampling at step 985. If no agents are identified, the method ends. If one or more agents are selected, the selected agents are instructed to collect diagnostic data based on the diagnostic parameters.
During a diagnostic session, deep diagnostic data may be retrieved for one or more distributed business transactions associated with a diagnostic session which are performed by one or more applications or JVMs.
The transaction flow map 1010 includes an e-commerce service application, an inventory service application, an inbound inventory database, another inventory database, an order processing service application, and an orders database. The time spent at each application or database by the request is indicated in the flow map, as well as a percentage of the overall time the request spent at that application. Other information such as the type of request received between two applications is also shown to illustrate the relationships between the applications which perform the distributed application.
Load information frame 1020 indicates the load result for the particular request in a format of calls received per minute. The average response time frame indicates the average response time for the request over time. The incident description frame 1020 indicates a description of the incident associated with the anomaly. The request summary indicates the number of requests which fall into different categories, such as normal, slow, very slow, errors, and stalls. Other information, including recent request snapshots with call graphs and recent errors, may also be illustrated within a transaction flow map interface 1000.
The components shown in
Mass storage device 1330, which may be implemented with a magnetic disk drive or an optical disk drive, is a non-volatile storage device for storing data and instructions for use by processor unit 1310. Mass storage device 1330 can store the system software for implementing embodiments of the present invention for purposes of loading that software into main memory 1310.
Portable storage device 1340 operates in conjunction with a portable non-volatile storage medium, such as a floppy disk, compact disk or digital video disc, to input and output data and code to and from the computer system 1300 of
Input devices 1360 provide a portion of a user interface. Input devices 1360 may include an alpha-numeric keypad, such as a keyboard, for inputting alpha-numeric and other information, or a pointing device, such as a mouse, a trackball, stylus, or cursor direction keys. Additionally, the system 1300 as shown in
Display system 1370 may include a liquid crystal display (LCD) or other suitable display device. Display system 1370 receives textual and graphical information, and processes the information for output to the display device.
Peripherals 1380 may include any type of computer support device to add additional functionality to the computer system. For example, peripheral device(s) 1380 may include a modem or a router.
The components contained in the computer system 1300 of
The foregoing detailed description of the technology herein has been presented for purposes of illustration and description. It is not intended to be exhaustive or to limit the technology to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. The described embodiments were chosen in order to best explain the principles of the technology and its practical application to thereby enable others skilled in the art to best utilize the technology in various embodiments and with various modifications as are suited to the particular use contemplated. It is intended that the scope of the technology be defined by the claims appended hereto.
Sunkara, Bhaskar, Bansal, Jyoti
Executed on | Assignor | Assignee | Conveyance | Frame | Reel | Doc |
Oct 24 2015 | BANSAL, JYOTI | APPDYNAMICS, INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 041685 | /0422 | |
Oct 24 2015 | SUNKARA, BHASKAR | APPDYNAMICS, INC | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 041685 | /0422 | |
Oct 31 2015 | Cisco Technology, Inc. | (assignment on the face of the patent) | / | |||
Jun 16 2017 | APPDYNAMICS, INC | AppDynamics LLC | CHANGE OF NAME SEE DOCUMENT FOR DETAILS | 042964 | /0229 | |
Oct 05 2017 | AppDynamics LLC | Cisco Technology, Inc | ASSIGNMENT OF ASSIGNORS INTEREST SEE DOCUMENT FOR DETAILS | 044173 | /0050 |
Date | Maintenance Fee Events |
Jun 07 2018 | PTGR: Petition Related to Maintenance Fees Granted. |
Sep 11 2022 | M1551: Payment of Maintenance Fee, 4th Year, Large Entity. |
Date | Maintenance Schedule |
Mar 12 2022 | 4 years fee payment window open |
Sep 12 2022 | 6 months grace period start (w surcharge) |
Mar 12 2023 | patent expiry (for year 4) |
Mar 12 2025 | 2 years to revive unintentionally abandoned end. (for year 4) |
Mar 12 2026 | 8 years fee payment window open |
Sep 12 2026 | 6 months grace period start (w surcharge) |
Mar 12 2027 | patent expiry (for year 8) |
Mar 12 2029 | 2 years to revive unintentionally abandoned end. (for year 8) |
Mar 12 2030 | 12 years fee payment window open |
Sep 12 2030 | 6 months grace period start (w surcharge) |
Mar 12 2031 | patent expiry (for year 12) |
Mar 12 2033 | 2 years to revive unintentionally abandoned end. (for year 12) |