Recovery Action Design Pattern
Intent
Decouple the management of individual failure recovery actions from their implementation by allowing recovery actions to be manipulated as abstract entities independent of the specific recovery actions they implement.
Based On
This pattern is derived from the failure recovery design pattern proposed by the AOCS Framework (see also: A. Pasetti, Embedded Control Systems and Software Frameworks, Springer-Verlag, 2002).
Motivation
Many OBS are capable of performing a certain number of failure detection checks. The detection of a failure or suspected failure may lead to the execution of a recovery action. Thus, on-board systems often specify a number of failure detection checks and associate with each check one or more recovery actions to be executed when the check fails.
The type of actions to be executed in response to a given failure obviously varies across applications but the way these actions are managed presents some similarities. Thus, in most cases, there is a requirement that it be possible to disable and enable individual recovery actions; that it be possible to react only to consecutive occurrences of the same action; that execution of the recovery action be recorded as an event; etc. This design pattern allows these commonalities to be factored out by encapsulating recovery actions in objects that are indirectly instantiated from a base class that implements the invariant recovery action operations.
Dictionary Entries
The following abstractions or domain-wide concepts are defined to support the implementation of this design pattern:
Structure
The recovery action design pattern represents the recovery action abstraction as an abstract interface RecoveryAction that defines the generic operations that can be performed on a generic recovery action. Concrete recovery actions are implemented as instances of classes that implement RecoveryAction.
Recovery actions are therefore plug-in components, and components that must execute them only see them as instances of the abstract type RecoveryAction.
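The structure described above can be sketched in C++ roughly as follows. This is a minimal illustration, not the framework's actual declaration: it assumes that the generic operations are doRecovery, enable and disable (the names used elsewhere in this pattern description), and the isEnabled query and the SwitchToRedundantBus class are hypothetical names introduced only for the example.

```cpp
// Abstract interface: the only type that client components see.
class RecoveryAction {
public:
    virtual ~RecoveryAction() = default;
    virtual void doRecovery() = 0;        // execute the recovery action
    virtual void enable() = 0;            // housekeeping operation
    virtual void disable() = 0;           // housekeeping operation
    virtual bool isEnabled() const = 0;   // hypothetical query, for illustration
};

// Hypothetical concrete recovery action (plug-in component).
class SwitchToRedundantBus : public RecoveryAction {
public:
    void doRecovery() override {
        if (enabled_) {
            ++switchOvers_;   // placeholder for the real switch-over logic
        }
    }
    void enable() override { enabled_ = true; }
    void disable() override { enabled_ = false; }
    bool isEnabled() const override { return enabled_; }

    // Number of times the (placeholder) switch-over was performed.
    int switchOvers() const { return switchOvers_; }

private:
    bool enabled_ = true;
    int switchOvers_ = 0;
};
```

A client holding a RecoveryAction reference or pointer can call doRecovery, enable and disable without knowing that the object is a SwitchToRedundantBus.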
Participants
Client: The component that executes the recovery action or performs housekeeping operations (e.g. disabling and enabling) on it.
RecoveryAction: The abstract interface or base abstract class that defines the basic operations that can be performed on generic recovery actions.
ConcreteRecoveryAction: Component implementing (or derived from) RecoveryAction that represents a specific and concrete recovery action. At a minimum, it must provide an implementation for the doRecovery operation. Other base operations could in principle be inherited from an abstract RecoveryAction base class.
Collaborations
Typical operational scenarios for this design pattern are:
- A component that may need to execute a recovery action is loaded with the recovery action component that implements it (as an instance of type RecoveryAction) and, when the conditions for the execution occur, calls its doRecovery method.
- A component that executes a telecommand to disable or enable a recovery action holds a reference to the recovery action (which it sees as an instance of type RecoveryAction) and, when the telecommand must be executed, calls the disable or enable method on it.
Consequences
- Clients are decoupled from the implementation of recovery actions: they only see abstract recovery actions and only interact with them through the same interface. Changing the concrete recovery action that is associated with a certain component has no impact on that component.
- Functionalities that are common to all recovery actions (e.g. the management of the enable/disable status) can be placed in the base RecoveryAction class and can be coded only once.
- Linked lists of recovery actions can be treated as if they were one single recovery action: the client is not - and need not be - aware of whether it is executing one single or several recovery actions.
- It is possible to build a library of commonly recurring recovery actions and to use them within an application as ready-made components.
- It is necessary to have a dedicated class for each concrete recovery action required by an application. This may lead to a proliferation of small classes.
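The point about linked lists of recovery actions can be illustrated with a small sketch. The pattern description does not say how the actions are linked, so the setNext method, the protected doAction hook and the LogAction class below are all assumptions made for the example: each action holds an optional pointer to the next one and forwards the call after doing its own work.

```cpp
#include <string>
#include <utility>
#include <vector>

class RecoveryAction {
public:
    virtual ~RecoveryAction() = default;

    // Executes this action, then the next one in the list (if any).
    void doRecovery() {
        doAction();
        if (next_ != nullptr) {
            next_->doRecovery();
        }
    }

    // Hypothetical linking operation: appends a follow-on action.
    void setNext(RecoveryAction* next) { next_ = next; }

protected:
    virtual void doAction() = 0;   // the action-specific part

private:
    RecoveryAction* next_ = nullptr;
};

// Hypothetical concrete action that just records its name when executed.
class LogAction : public RecoveryAction {
public:
    LogAction(std::string name, std::vector<std::string>& log)
        : name_(std::move(name)), log_(log) {}

protected:
    void doAction() override { log_.push_back(name_); }

private:
    std::string name_;
    std::vector<std::string>& log_;
};
```

A client that calls doRecovery on the head of the list executes every linked action in order, exactly as it would execute a single one.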
Applicability
This design pattern is useful when:
- components in an application need to execute and handle recovery actions
- it is necessary to be able to vary the implementation of the recovery actions without affecting the components that execute or handle them
Implementation Issues
Conceptually, RecoveryAction is an abstract interface but instantiations of the pattern will often implement it as a base abstract class that provides concrete implementations for its housekeeping operations and leaves only doRecovery as an abstract operation to be defined in concrete subclasses.
Which operations should be defined at the level of RecoveryAction? The class diagram of the pattern considers only three types of operation but one might conceivably want to implement more (or fewer). For instance, in some on-board applications, failures that are detected only once (or only a small number of times) are treated differently from failures that recur in several consecutive operating cycles. The corresponding logic could be placed in a RecoveryAction base class. Similarly, execution of a recovery action should sometimes be recorded as an Event. The logic to create the event report could again be placed in a RecoveryAction base class.
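One possible arrangement of such a base class is sketched below. It is only an illustration under stated assumptions: the EventLog type stands in for whatever event reporting facility the application provides, and the doAction and name hooks are hypothetical. Note also that, unlike the sample code in this pattern, the sketch keeps doRecovery non-virtual and delegates the action-specific part to a protected hook (a template-method arrangement), so that the enable check and the event report cannot be bypassed by subclasses.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the application's event reporting facility.
struct EventLog {
    std::vector<std::string> entries;
    void record(const std::string& what) { entries.push_back(what); }
};

class RecoveryAction {
public:
    explicit RecoveryAction(EventLog& log) : log_(log) {}
    virtual ~RecoveryAction() = default;

    // Housekeeping operations, coded once for all recovery actions.
    void enable() { enabled_ = true; }
    void disable() { enabled_ = false; }
    bool isEnabled() const { return enabled_; }

    // Invariant part: check the enable status, run the action, report it.
    void doRecovery() {
        if (!enabled_) {
            return;
        }
        doAction();
        log_.record(name());
    }

protected:
    virtual void doAction() = 0;            // action-specific part
    virtual std::string name() const = 0;   // text used in the event report

private:
    EventLog& log_;
    bool enabled_ = true;
};

// Hypothetical concrete action: only the variable part is written here.
class FallBackToStandby : public RecoveryAction {
public:
    using RecoveryAction::RecoveryAction;
    int executions() const { return executions_; }

protected:
    void doAction() override { ++executions_; }   // placeholder for real logic
    std::string name() const override { return "FallBackToStandby"; }

private:
    int executions_ = 0;
};
```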
Since they are encapsulated in objects, recovery actions can have memory. Thus, it is possible to make the execution of a recovery action conditional upon past executions. Consider for instance the case where recovery should only be performed if a certain failure condition persists for two consecutive cycles (this is often done to avoid triggering recovery actions in response to the detection of spurious failures). In such a case, the simplest mechanism is to have a recovery action that returns without performing any action the first time it is called and that only executes some concrete action after it is called twice in a row (indicating that the failure is persistent).
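The two-consecutive-cycles mechanism might be sketched as follows. The text does not say how the recovery action learns that a cycle passed without a failure, so the sketch assumes a hypothetical reset method that the client calls whenever the check succeeds; the class name and the recoveries counter are likewise illustrative only.

```cpp
// Minimal interface, repeated here so that the sketch is self-contained.
class RecoveryAction {
public:
    virtual ~RecoveryAction() = default;
    virtual void doRecovery() = 0;
};

// Hypothetical action that reacts only to two consecutive failures.
class PersistentFaultRecoveryAction : public RecoveryAction {
public:
    void doRecovery() override {
        ++consecutiveCalls_;
        if (consecutiveCalls_ < 2) {
            return;               // first occurrence: possibly spurious
        }
        ++recoveries_;            // placeholder for the concrete recovery action
        consecutiveCalls_ = 0;    // start counting again after recovering
    }

    // Assumption: called by the client in a cycle where the check passes.
    void reset() { consecutiveCalls_ = 0; }

    // Number of times the concrete action was actually executed.
    int recoveries() const { return recoveries_; }

private:
    int consecutiveCalls_ = 0;
    int recoveries_ = 0;
};
```

With this arrangement, a failure that is detected once and then disappears never triggers the concrete action, while a failure seen in two successive cycles does.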
The sample definition of interface RecoveryAction given in the class diagram of the design pattern foresees methods to enable and disable individual recovery actions. There is sometimes a need to disable or enable all recovery actions. This type of requirement can be implemented by having static enable/disable methods.
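A sketch of the static variant is given below. The method names enableAll and disableAll, the doAction hook and the CountingAction class are assumptions made for the example; the global flag is combined with the per-instance flag so that an action runs only when both allow it.

```cpp
class RecoveryAction {
public:
    virtual ~RecoveryAction() = default;

    // Per-instance enable status.
    void enable() { enabled_ = true; }
    void disable() { enabled_ = false; }

    // Hypothetical static operations acting on all recovery actions at once.
    static void enableAll() { globalFlag() = true; }
    static void disableAll() { globalFlag() = false; }

    // An action is effectively enabled only if both flags allow it.
    bool isEnabled() const { return enabled_ && globalFlag(); }

    void doRecovery() {
        if (isEnabled()) {
            doAction();
        }
    }

protected:
    virtual void doAction() = 0;

private:
    // Function-local static: a single flag shared by every recovery action.
    static bool& globalFlag() {
        static bool flag = true;
        return flag;
    }

    bool enabled_ = true;
};

// Hypothetical concrete action that counts its executions.
class CountingAction : public RecoveryAction {
public:
    int count() const { return count_; }

protected:
    void doAction() override { ++count_; }

private:
    int count_ = 0;
};
```

Keeping the two flags separate means that re-enabling all recovery actions does not override an individual disable command issued earlier.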
In the concept proposed here, a recovery action is a one-shot action that is executed immediately after the fault has been detected. In some cases, however, the response to a fault must consist of a sequence of actions that may extend over several activation cycles. In such a case, the sequence of actions should be encapsulated in a manoeuvre and the recovery action will consist of loading the manoeuvre into the manoeuvre manager.
OBS Framework Mapping
The implementation of this design pattern in the OBS Framework is supported by the following classes:
- RecoveryAction abstract interface --> RecoveryAction
Sample Code
Consider a recovery action associated with the detection of a transmission bus fault that specifies that, when the fault is detected, there should be a switchover to the redundant bus if the fault is sporadic or a fall-back to SBY mode if the fault is permanent. The fault is defined to be permanent if it has occurred more than once.
Use of the recovery action design pattern implies that a dedicated class be defined to encapsulate this recovery action. A tentative implementation of the doRecovery method for this class could be as follows:

    class BusFaultRecoveryAction : public RecoveryAction {
        bool alreadyTried = false;
        . . .
    public:
        void doRecovery() {
            if (!alreadyTried) {    // sporadic fault
                . . .               // do switch over to redundant bus
                alreadyTried = true;
            } else {
                . . .               // command fall-back to SBY
            }
        }
    };

The component that implements the bus fault check would then look like this:

    class BusManager {
        RecoveryAction* busFaultRecoveryAction;
        . . .
    public:
        // Method to load recovery action as plug-in component
        void loadBusFaultRecoveryAction(RecoveryAction* ra) {
            busFaultRecoveryAction = ra;
        }

        // Method to perform the bus fault check
        void doBusFaultCheck() {
            if (bus fault detected)    // pseudocode condition
                busFaultRecoveryAction->doRecovery();
        }
        . . .
    };

This component sees the recovery action as a plug-in component that is loaded when the component is configured during the initialization phase. Consequently, its code is independent of which specific recovery action is executed in response to the bus fault. It is also independent of whether the fault is sporadic or permanent. The management of the sporadic/permanent status is done internally to the recovery action.
The configuration code for such a component could be as follows:

    BusFaultRecoveryAction* busFaultRecoveryAction;
    . . .
    BusManager* busManager = new BusManager();
    busManager->loadBusFaultRecoveryAction(busFaultRecoveryAction);

Note that the recovery action is created as an instance of a specific recovery action class but is loaded into the client component as an instance of the generic abstract class RecoveryAction.
As mentioned earlier, the management of the sporadic/permanent status could be done at the level of an abstract base class RecoveryAction of the following kind:

    class RecoveryAction {
        int limit;
        int limitCounter = 0;
        bool sporadic = true;
        . . .
    public:
        void setLimit(int l) { limit = l; }
        bool isSporadic() { return sporadic; }
        virtual void doRecovery() {
            limitCounter++;
            if (limitCounter > limit) {
                sporadic = false;
                limitCounter = 0;
            }
        }
        . . .
    };

The implementation of the doRecovery method in a derived class would then be as follows:

    class ConcreteRecoveryAction : public RecoveryAction {
        . . .
    public:
        void doRecovery() {
            RecoveryAction::doRecovery();
            if (isSporadic())
                . . .    // execute sporadic part of recovery action
            else
                . . .    // execute permanent part of recovery action
        }
    };

As a final example, consider a recovery action that requires execution of a complex manoeuvre extending over a prolonged period of time. This recovery action cannot be executed directly by a RecoveryAction object that, by definition, is activated only once and can only perform one-shot actions within the same operational cycle. This type of situation can be handled by defining a Manoeuvre that is responsible for performing the recovery procedure and by having the recovery action load the manoeuvre into the manoeuvre manager. The corresponding RecoveryAction class can then be defined as follows:

    class ComplexRecoveryAction : public RecoveryAction {
        Manoeuvre* recoveryProcedure;
        ManoeuvreManager* manoeuvreManager;
        . . .
    public:
        void doRecovery() {
            . . .
            manoeuvreManager->add(recoveryProcedure);
        }
    };

The loading of the manoeuvre in a sense allows the recovery action to extend the range of its action beyond the cycle where the fault was detected.
Note that the recoveryProcedure manoeuvre can also be used in contexts other than the execution of the recovery action.
Remarks
None
Author
A. Pasetti (P&P Software)
Last Modified
2003-05-19