dubek edited this page Apr 23, 2013 · 3 revisions

Monitoring

Dempsy monitoring is very flexible, detailed, powerful, and designed to be used in a widely distributed system. Out of the box it supports integration with several different "reporting back ends" including:

  1. Full integration with Graphite
  2. Full integration with Ganglia
  3. It can write all metrics to a set of CSV files
  4. It can write all metrics to the console

... or any combination of the above. With very little effort you can merge the Dempsy framework metrics with application metrics and piggyback your own application monitoring points onto the same system.

Or, like many of the other core abstractions that Dempsy is built on, you can completely replace the implementation and develop your own monitoring extensions. For more information on this see the section on Framework Development and specifically the section on the Monitoring Api.

Dempsy's default monitoring is built on the Yammer Metrics library by Coda Hale, an incredibly simple, flexible and versatile monitoring abstraction.

Framework Monitoring Metrics

The following lists all of the framework metrics that are gathered. Depending on the back-end selected, aggregations of individual metrics can also be supplied. For example, when using Graphite you can see not only the raw message count but also 1-minute, 5-minute, and 10-minute average message rates, among others. Graphite will also provide very useful duration/timing aggregates including medians, standard deviations, and various percentile times (99th, 99.9th, etc.), which can be really useful in tracking down application problems. These aggregations are not listed in the following table but can be inferred from the aggregation capabilities of the reporting back end.

Keep in mind, metrics are specific to a particular cluster within a particular node. Aggregation is provided by the "reporting back end" functionality.

The metrics are broken into three sets. It's useful to keep this distinction in mind:

  1. Receive side metrics are those that track information about message receiving. They deal with monitoring what happens from when a message comes into a node to when it's routed to a message processor.
  2. Processing metrics are those that track information about the processing of messages, including timing, counts and failures.
  3. Send side metrics are those that track information about message sending. They deal with monitoring what happens from when a message is dispatched from an Adaptor or returned from a message processor to when it's sent on to the next step in the processing.

Receive side metrics

Metric Description
messages-received How many messages were successfully received by this node. Note that in Adaptors this will always be zero.
bytes-received How many bytes were successfully received by this node. Note that in Adaptors this will always be zero.
messages-discarded This metric indicates how many messages were discarded. It does not incorporate 'failed' messages (see 'messages-mp-failed' under Processing Metrics) but does include 'collisions' (see 'messages-collisions' below). Dempsy usually discards messages due to load when queues fill up, and it will always opt to discard the oldest messages first. This metric will always be zero in an Adaptor.
messages-collisions Depending on the transport and how it's configured, Dempsy may discard a message if one shows up for a message processor while it's busy working on something else. By default, transports queue messages when collisions happen rather than discarding them, so this metric will always be zero unless the transport configuration is told to 'fail fast'. See the details on the various Transports for more information. This metric is also incorporated into 'messages-discarded', described above. This metric will always be zero in an Adaptor.
messages-pending The total number of queued messages waiting to be processed at any given point in time. This metric is dependent on the underlying transport and isn't supported in all cases. If the transport queues incoming messages, then it should be supplied. This metric will always be zero in an Adaptor.

Processing Metrics

Metric Description
messages-dispatched This is the total number of messages that have been dispatched to a message processor no matter what the eventual disposition of the processing is. This metric will always be zero in an Adaptor.
messages-processed This metric indicates how many messages were successfully processed by a message processor. These include all messages provided to a message processor where the message processor's @MessageHandler returned without throwing an exception, whether or not it returned a message to be forwarded on. This metric will always be zero in an Adaptor.
messages-mp-failed This metric includes all messages provided to a message processor where the message processor's @MessageHandler threw an exception. This metric will always be zero in an Adaptor.
messages-dempsy-failed This metric includes all messages that failed due to a framework/container error when a message was presented to a message processor. These are extremely rare and will be accompanied by an error in the log detailing the problem. This metric will always be zero in an Adaptor.
message-processors-created The total number of message processors that have been created. This is not the total number of message processors currently in the container but the number that have been created since the container was started. The number is therefore always increasing, even when eviction is enabled or elasticity moves instances. The current number of live message processors is the 'message-processors-created' metric minus the 'message-processors-deleted' metric (see below). This metric will always be zero in an Adaptor.
message-processors-deleted The total number of message processors that have been @Passivated. There are several reasons that a message processor might be passivated. Please review the section on Passivation for more information. This metric will always be zero in an Adaptor.
message-processors A snapshot of the total number of message processors currently active in the container. This metric will always be zero in an Adaptor.
mp-handle-message-duration The duration of calls on the message processor's @MessageHandler method. Aggregates of this value can be useful in determining application performance bottlenecks. This metric will always be zero in an Adaptor.
outputInvoke-duration The duration of @Output passes. See the section on Non-stream Driven Message Processor Output. Aggregates of this value can be useful in determining application performance bottlenecks in the output processing of the application. This metric will always be zero when the message processor has no @Output method or in an Adaptor.
evictionInvoke-duration The duration of Eviction check passes. See the section on Eviction. This metric will always be zero when the message processor has no @Evictable method or in an Adaptor.
pre-instantiation-duration The duration of the pre-instantiation pass. This metric will always be zero in an Adaptor.

Send side metrics

Metric Description
messages-sent How many messages were successfully sent out of this node.
bytes-sent How many bytes were successfully sent out of this node.
messages-unsent This metric counts messages for which a send was attempted but which were discarded before being successfully transmitted. This count is not incorporated in the 'messages-sent' metric above since that only covers successfully transmitted messages. There are several reasons this metric can be triggered, including: 1) There is no current destination available for the message. 2) The message has no @MessageKey or reflection failed to retrieve it. 3) A shutdown happens while the message is being queued. 4) An exception (likely an I/O exception) happens when attempting to transmit the message. 5) There are too many messages queued up to be sent out to a particular destination; in this case Dempsy will always opt to discard the oldest messages.
messages-out-pending The total number of queued messages waiting to be transmitted at any given point in time. This metric is dependent on the underlying transport and isn't supported in all cases. If the transport queues outgoing messages, then it should be supplied.

Monitoring Configuration

The monitoring can be tuned in several different ways, but each of these requires setting the statsCollectorFactory property on the Application Definition and setting parameters on the implementation provided. The property is set to an implementation of the interface com.nokia.dempsy.monitoring.StatsCollectorFactory (again, see the Monitoring Api section for more details on this interface and how to provide your own implementation). The implementation provided with Dempsy is called com.nokia.dempsy.monitoring.coda.StatsCollectorFactoryCoda (named after Coda Hale).

As an example, and as the basis of the following sections, you would add the following to the Application Definition for your application:

<beans>
  <bean class="com.nokia.dempsy.config.ApplicationDefinition">
    <constructor-arg value="myApplication" />
    ...        
    <property name="statsCollectorFactory">
        <bean class="com.nokia.dempsy.monitoring.coda.StatsCollectorFactoryCoda" />
    </property>
    ...

Of course, without any properties set on the StatsCollectorFactoryCoda, you get nothing more than the default already gives you. The properties that can be set are described in the following sections:

Enabling various monitoring reporting back-ends

By default, Dempsy's monitoring implementation isn't configured with any "reporting back-ends" (see the enumerated list above). However, all metrics are automatically exposed via JMX even if no (other) reporting back ends are configured. If you want to use one of the reporting back-ends you need to supply a list of reporting specs (com.nokia.dempsy.monitoring.coda.MetricsReporterSpec instances) to the configuration of the StatsCollectorFactoryCoda implementation.

To select different reporting back ends you use the type property on the MetricsReporterSpec. The following shows how to select Graphite:

<beans>
  <bean class="com.nokia.dempsy.config.ApplicationDefinition">
    <constructor-arg value="myApplication" />
    ...        
    <property name="statsCollectorFactory">
      <bean class="com.nokia.dempsy.monitoring.coda.StatsCollectorFactoryCoda" >
        <property name="reporters">
          <list>
            <bean class="com.nokia.dempsy.monitoring.coda.MetricsReporterSpec">
              <property name="type">
                <value type="com.nokia.dempsy.monitoring.coda.MetricsReporterType">GRAPHITE</value>
              </property>
    ...

MetricsReporterType is an enum with the values: GRAPHITE, GANGLIA, CONSOLE, CSV.

Note: You can enable multiple reporting back ends simultaneously by providing multiple MetricsReporterSpecs in the list for the "reporters" property.
Note: Keep in mind, all metrics are automatically reported through JMX no matter which, or how many, "reporting back-ends" are enabled.

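For instance, a hedged sketch of a "reporters" list carrying two specs at once (the GRAPHITE spec's remaining properties are elided here, exactly as in the full examples in the sections that follow):

```xml
<property name="reporters">
  <list>
    <!-- First spec: report to Graphite (period/host/port properties omitted;
         see the Configuring Graphite section for the complete spec) -->
    <bean class="com.nokia.dempsy.monitoring.coda.MetricsReporterSpec">
      <property name="type">
        <value type="com.nokia.dempsy.monitoring.coda.MetricsReporterType">GRAPHITE</value>
      </property>
      ...
    </bean>
    <!-- Second spec: also dump every metric to stdout once a minute -->
    <bean class="com.nokia.dempsy.monitoring.coda.MetricsReporterSpec">
      <property name="type">
        <value type="com.nokia.dempsy.monitoring.coda.MetricsReporterType">CONSOLE</value>
      </property>
      <property name="period" value="1"/>
      <property name="unit">
        <value type="java.util.concurrent.TimeUnit">MINUTES</value>
      </property>
    </bean>
  </list>
</property>
```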
Configuring Graphite

Here is an example of how you configure Graphite as the reporting back end for the monitoring:

<beans>
  <bean class="com.nokia.dempsy.config.ApplicationDefinition">
    <constructor-arg value="myApplication" />
    ...        
    <property name="statsCollectorFactory">
      <bean class="com.nokia.dempsy.monitoring.coda.StatsCollectorFactoryCoda" >
        <property name="reporters">
          <list>
            <bean class="com.nokia.dempsy.monitoring.coda.MetricsReporterSpec">
              <property name="type">
                <value type="com.nokia.dempsy.monitoring.coda.MetricsReporterType">GRAPHITE</value>
              </property>
              <property name="period" value="1"/>
              <property name="unit">
                <value type="java.util.concurrent.TimeUnit">MINUTES</value>
               </property>
               <property name="hostName" value="${graphite.host}"/>
               <property name="portNumber" value="${graphite.port}"/>
            </bean>
          </list>
        </property>
      </bean>
    </property>
    ...

When configuring the StatsCollectorFactoryCoda to report to Graphite you MUST supply:

  1. the period and the unit; together these provide the time period between successive outputs to Graphite.
  2. the hostName and portNumber for where the Graphite server (the carbon daemon) is listening.

Notice that in the above example, the Spring configuration assumes the PropertyPlaceholderConfigurer or -D command line parameters are being used to supply the graphite host name and port.

Configuring Ganglia

Ganglia configuration is identical to Graphite configuration except the type is GANGLIA. For example:

<beans>
  <bean class="com.nokia.dempsy.config.ApplicationDefinition">
    <constructor-arg value="myApplication" />
    ...        
    <property name="statsCollectorFactory">
      <bean class="com.nokia.dempsy.monitoring.coda.StatsCollectorFactoryCoda" >
        <property name="reporters">
          <list>
            <bean class="com.nokia.dempsy.monitoring.coda.MetricsReporterSpec">
              <property name="type">
                <value type="com.nokia.dempsy.monitoring.coda.MetricsReporterType">GANGLIA</value>
              </property>
              <property name="period" value="1"/>
              <property name="unit">
                <value type="java.util.concurrent.TimeUnit">MINUTES</value>
               </property>
               <property name="hostName" value="${ganglia.host}"/>
               <property name="portNumber" value="${ganglia.port}"/>
            </bean>
          </list>
        </property>
      </bean>
    </property>
    ...

Again, like for Graphite, you MUST supply:

  1. the period and the unit; together these provide the time period between successive outputs to Ganglia.
  2. the hostName and portNumber for where the Ganglia monitoring daemon (gmond) is listening.

Configuring CSV output

For CSV output you just need to supply the output directory. The CSV reporter will periodically append data to metric-specific files in that output directory. An example of how to configure it follows:

<beans>
  <bean class="com.nokia.dempsy.config.ApplicationDefinition">
    <constructor-arg value="myApplication" />
    ...        
    <property name="statsCollectorFactory">
      <bean class="com.nokia.dempsy.monitoring.coda.StatsCollectorFactoryCoda" >
        <property name="reporters">
          <list>
            <bean class="com.nokia.dempsy.monitoring.coda.MetricsReporterSpec">
              <property name="type">
                <value type="com.nokia.dempsy.monitoring.coda.MetricsReporterType">CSV</value>
              </property>
              <property name="period" value="1"/>
              <property name="unit">
                <value type="java.util.concurrent.TimeUnit">MINUTES</value>
               </property>
               <property name="outputDir" >
                 <bean class="java.io.File">
                    <constructor-arg value="${csv.outdir}" />
                 </bean>
               </property>
            </bean>
          </list>
        </property>
      </bean>
    </property>
    ...

You MUST supply:

  1. the period and the unit; together these provide the time period between successive outputs to the individual CSV files.
  2. the outputDir where the files should be written.

Configuring Console output

You can configure Dempsy to periodically report the metrics to the console (stdout):

<beans>
  <bean class="com.nokia.dempsy.config.ApplicationDefinition">
    <constructor-arg value="myApplication" />
    ...        
    <property name="statsCollectorFactory">
      <bean class="com.nokia.dempsy.monitoring.coda.StatsCollectorFactoryCoda" >
        <property name="reporters">
          <list>
            <bean class="com.nokia.dempsy.monitoring.coda.MetricsReporterSpec">
              <property name="type">
                <value type="com.nokia.dempsy.monitoring.coda.MetricsReporterType">CONSOLE</value>
              </property>
              <property name="period" value="1"/>
              <property name="unit">
                <value type="java.util.concurrent.TimeUnit">MINUTES</value>
               </property>
            </bean>
          </list>
        </property>
      </bean>
    </property>
    ...

You MUST supply the period and the unit; together these provide the time period between successive outputs to the console.

Metric Naming

Metrics are named hierarchically and various reporting back-ends (and JMX) handle these names differently. Metric naming is important for management and aggregation purposes. For example, it helps to be able to separate the 'messages-received' coming from different applications, hosts, or different deployment environments (development, testing, etc.). This is especially important where the metrics are centrally gathered, tracked and managed.

By default, Dempsy names its metrics using the following scheme:

[environment prefix][node name][application name]-[cluster name].Dempsy.[metric name]
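For example, with a (hypothetical) environment prefix of test., a node running at 10.0.0.1, the application myApplication, and a cluster named wordCounter, the received-message count would show up in Graphite as:

```
test.10-0-0-1.myApplication-wordCounter.Dempsy.messages-received
```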

Note that the environment prefix and the node name only apply to the Graphite and Ganglia reporting back-ends and do not affect the names of the monitoring points either in the other back-ends or in JMX.

Keep in mind that some back-ends, like Graphite, give special meaning to the '.' character.

The environment prefix provides a means to prepend an indication of which environment this node is running in. This allows a single Graphite or Ganglia server to monitor multiple environments and yet, at the top of the hierarchy, separate all of the metrics by the environment they're from.

You supply the environment prefix in one of two ways. First, you can set the environmentPrefix property directly on the StatsCollectorFactoryCoda instance when configuring it as described above. Alternatively you can provide a system property using -Denvironment.prefix=.... The system property will take precedence.
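A minimal sketch of the first approach, using test. as a sample prefix value (the environmentPrefix property name comes from the text above; everything else about the bean is as configured earlier):

```xml
<bean class="com.nokia.dempsy.monitoring.coda.StatsCollectorFactoryCoda">
  <!-- Prepended to Graphite/Ganglia metric names; -Denvironment.prefix, if set, wins -->
  <property name="environmentPrefix" value="test." />
  ...
</bean>
```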

The environment prefix doesn't assume a trailing '.' by default, so if you want the back-end to treat the environment as a level in the name hierarchy, you should include the '.' when you specify the prefix. For example, in Graphite there's a difference between -Denvironment.prefix=test- and -Denvironment.prefix=test. (note the trailing '.'). The latter allows the hierarchy to collapse under the top-level environment name; the former does not. This lets whoever supplies the environment name decide whether they want it flat or hierarchical.
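To illustrate with hypothetical names (node 10.0.0.1, application myApplication, cluster wordCounter), the two prefix styles produce:

```
-Denvironment.prefix=test.  ->  test.10-0-0-1.myApplication-wordCounter.Dempsy.messages-received
-Denvironment.prefix=test-  ->  test-10-0-0-1.myApplication-wordCounter.Dempsy.messages-received
```

In Graphite, the first groups everything under a top-level test branch, while the second yields flat top-level names beginning with test-.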

The environment prefix has no effect on the name within JMX or within reporting back-ends other than Graphite and Ganglia.

The node name is derived from the transport. In most cases it will be the IP address of the host that the node in question is running on. However, since the IP address has '.' characters in it, Graphite would create a hierarchy out of the octets of the address, so the '.' characters are replaced with '-' characters.

The node name has no effect on the name within JMX or within reporting back-ends other than Graphite and Ganglia.

Next comes the [application name]-[cluster name]. This comes directly from the Application Definition and Cluster Definition that's driving the node in question. The preference here is to flatten out the metrics (not use the '.'). This is a preference based on experience. If you really don't like it, you can override the naming scheme following the directions in the Monitoring Api.

In order to separate framework metrics from application metrics, a point in the hierarchy is inserted using .Dempsy., which causes all of the framework metrics to be grouped under a Dempsy subtree.

Each metric from the tables above will then be the last part of the name. Below this point in the name hierarchy will be the specific aggregates provided by Yammer Metrics and will depend on the type of metric.

Providing Application Specific Metrics

Since Dempsy is built on Yammer Metrics, simply using Yammer Metrics directly in your application will expose those metrics through whatever back-ends are configured, along with the framework metrics.

Of course, it helps to use a naming scheme within Yammer Metrics that cooperates with the existing Dempsy framework usage so that all of your metrics are organized together with the framework metrics. There is no need to handle the environment name or node name portions of the metric names, as those are specific to the reporting back-end and configured through the techniques above. An example follows:

import com.yammer.metrics.Metrics;
import com.yammer.metrics.core.Counter;
import com.yammer.metrics.core.MetricName;

  ...
     // Group application metrics under the same [application name]-[cluster name]
     // prefix that Dempsy uses for its framework metrics.
     MetricName appMetricName =
          new MetricName(applicationName + "-" + clusterName, "application", "my-custom-application-metric");
     Counter myMetric = Metrics.newCounter(appMetricName);
  ...

If you want access to the ClusterId that your particular message processor is running in, you can use the @Start method on the message processor prototype.

Next section: Executor Configuration
