
Operational Management


Operational Management covers the non-functional production capabilities (monitoring, provisioning, etc.) found in an SOA environment. In a service-oriented environment, it is primarily concerned with the following challenges:

  • Deployment, which focuses on the ability to manage a multitude of services, from a centralized console, in a consistent manner throughout the enterprise. Managing deployment includes the tasks of configuring the services, deploying the service to servers, and displaying the status of all the services on all the servers.
  • Versioning, which focuses on the ability to maintain backward compatibility by ensuring that requests from older consumer versions are served by older versions of the service instances. It also allows a newer version to be rolled out to a limited user group before a full release, thereby reducing the overall risk of exposure to the new version.
  • A Service Level Agreement (SLA) is a collection of service-level objectives (SLOs) agreed upon by a service provider and a consumer. An SLO is a proposed acceptable range for a single verifiable measurement, such as request processing time, that is important to the consumer. For example, an SLO might state that request processing time must not exceed 30 milliseconds for requests with fewer than 100 data elements.
  • Root Cause Analysis, which is the ability to diagnose and correct problems, is one of the primary goals of an SOA management system. Monitoring determines that there is, or soon will be, a problem. Beyond that, the management system should offer tools to narrow down the cause of the problem.
  • Virtualization is an umbrella category for a set of capabilities that are primarily concerned with insulating service consumers from change, and with providing service providers with implementation and deployment flexibility. Chief among these capabilities are transformation and routing.
  • Logging and Auditing focus on the ability to trace the life cycle of the service call. Logging and auditing typically require disk I/O (unless guaranteed retention of certain entries is not required) and are therefore expensive tasks that should be held to the minimum necessary to implement the non-functional requirements. Services need to be able to perform role-based logging "on-demand" or "on-error". "On-demand" logging is the ability to turn logging on or off from a management console without restarting the service. "On-error" logging is a feature by which the application logs only errors, but in a very descriptive mode.
  • Availability Monitoring determines if a service is up and running. It can be implemented by a "ping" mechanism that periodically executes a dummy request, or by a "push" mechanism built into the service that periodically generates "heartbeat" event messages that can be monitored. Asynchronous push mechanisms work better in practice, as they minimize polling, and the system can be designed to perform a "health check" before publishing the heartbeat.
  • Accessibility Monitoring determines if a service can be used. Just because a service is "available" does not mean it is "accessible." The lack of accessibility may be due to reasons such as an insufficient number of worker threads to handle the request under high load conditions, unavailable resources like a database, or inability to gain the cooperation of other requisite services.
  • Performance Monitoring profiles the execution of a service call and provides operational statistics. Its numbers measure both throughput and latency. Throughput measures the extent of usage of the service and determines scalability requirements. Latency is a measure of the round-trip time and can help identify bottleneck subcomponents or resources.
  • Resource Monitoring is the ability to monitor and record the usage of various consumable system resources under load, such as memory use or concurrent request counts.
  • Fault Monitoring is the ability to recognize and notify an operator when an application component has failed during request processing.
  • Notification is the ability to alert an operator to a problem that was discovered as a result of monitoring. Notification can be as simple as e-mail, or as complex as custom integration with a third-party network management system (NMS).
  • Probing is an active management component that initiates synthetic requests to trigger performance and availability monitoring. This often lets the system manager discover problems before users encounter or report them.
  • Analytics and Reporting is responsible for the collection of metrics from the individual managed resources, the computation of trends and other analytics, and the presentation of the resulting analysis and raw data to interested parties.
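
The SLO example above (request processing time not exceeding 30 milliseconds for requests with fewer than 100 data elements) can be sketched as a simple check over recorded measurements. This is a minimal illustration, not a prescribed implementation: the `Slo` class, its field names, and the sample data are hypothetical, and it assumes that requests outside the SLO's scope cannot violate it.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Slo:
    """Hypothetical SLO: a latency ceiling for requests below a size limit."""
    max_latency_ms: float
    max_elements: int  # the SLO covers requests with fewer elements than this

    def applies_to(self, element_count: int) -> bool:
        return element_count < self.max_elements

    def is_met(self, element_count: int, latency_ms: float) -> bool:
        # Assumption: a request outside the SLO's scope cannot violate it.
        if not self.applies_to(element_count):
            return True
        return latency_ms <= self.max_latency_ms


def violations(slo: Slo, samples: List[Tuple[int, float]]) -> List[Tuple[int, float]]:
    """Return the (element_count, latency_ms) samples that violate the SLO."""
    return [sample for sample in samples if not slo.is_met(*sample)]


# The objective from the text: at most 30 ms for fewer than 100 data elements.
slo = Slo(max_latency_ms=30.0, max_elements=100)

# Illustrative measurements only: (element_count, latency_ms).
samples = [(10, 12.5), (99, 30.0), (50, 41.0), (250, 80.0)]
print(violations(slo, samples))  # prints [(50, 41.0)]
```

The 250-element request is slow but falls outside the objective's scope, so only the 50-element request counts as a violation. An SLA would then be a collection of such objectives evaluated together.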

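The "push" style of availability monitoring described in the list above can be sketched as follows. This is only an illustration under stated assumptions: `HeartbeatMonitor`, `publish_heartbeat`, the service names, and the 10-second timeout are all hypothetical, and timestamps are passed in explicitly so the example is deterministic. The service runs a health check before publishing, and the monitor reports any known service whose last heartbeat is older than the timeout.

```python
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Hypothetical monitor that tracks the last heartbeat seen per service."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._last_seen: Dict[str, float] = {}

    def record(self, service: str, timestamp: float) -> None:
        self._last_seen[service] = timestamp

    def unavailable(self, now: float) -> List[str]:
        # A service is reported once its last heartbeat is older than the timeout.
        return sorted(
            name
            for name, seen in self._last_seen.items()
            if now - seen > self.timeout_s
        )


def publish_heartbeat(
    monitor: HeartbeatMonitor,
    service: str,
    health_check: Callable[[], bool],
    now: float,
) -> bool:
    """Publish a heartbeat only if the service's own health check passes."""
    if not health_check():
        return False
    monitor.record(service, now)
    return True


monitor = HeartbeatMonitor(timeout_s=10.0)
publish_heartbeat(monitor, "orders", lambda: True, now=0.0)
publish_heartbeat(monitor, "billing", lambda: True, now=0.0)
publish_heartbeat(monitor, "orders", lambda: True, now=8.0)
publish_heartbeat(monitor, "billing", lambda: False, now=8.0)  # health check fails
print(monitor.unavailable(now=15.0))  # prints ['billing']
```

In this sketch a service that has never published a heartbeat is not tracked at all; a real monitor would likely start from a registry of expected services.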
 
