Kant-MILCOM 2003

Fault Localization and Self-Healing with Dynamic Domain Configuration

Kant, McAuley, Morera, Sethi, Steinder

fault detection, network management, self-healing, MANET

@inproceedings{kant:milcom-2003,
  title={Fault Localization and Self-Healing with Dynamic Domain Configuration},
  author={Kant, L. and McAuley, A. and Morera, R. and
          Sethi, A. S. and Steinder, M.},
  booktitle={Military Communications Conference ({MILCOM})},
  year={2003}
}

Survivability and automation are key targets for Future Combat Systems (FCS)

Fault management goals

  • Rapid localization (better than exponential complexity)
  • Accurate (high detection, low false positive rates)
  • Automated recovery of (mission-critical) applications
    • Notably, survivability requirements may vary; non-mission-critical apps may be allowed to drop or be fixed more slowly
      • Must get input about these requirements
  • Low cost, low complexity for battlefield use
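
The accuracy goal can be made concrete with two ratios. This is a minimal sketch, assuming detection rate means the fraction of actual faults that the diagnosis reports and false positive rate means the fraction of reported faults that are not actual faults. The notes do not give these definitions, and the fault names are hypothetical.

```python
def detection_rate(actual, reported):
    """Fraction of actual faults that the diagnosis reported."""
    actual, reported = set(actual), set(reported)
    if not actual:
        return 1.0  # nothing to detect (assumed convention)
    return len(actual & reported) / len(actual)


def false_positive_rate(actual, reported):
    """Fraction of reported faults that are not actual faults."""
    actual, reported = set(actual), set(reported)
    if not reported:
        return 0.0  # nothing reported, so nothing reported wrongly
    return len(reported - actual) / len(reported)


# Hypothetical fault names, for illustration only.
actual = {"link-3", "router-7"}
reported = {"link-3", "router-2"}
print(detection_rate(actual, reported))       # 0.5
print(false_positive_rate(actual, reported))  # 0.5
```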

Dynamic domain configuration is also important

  • Automatically reconfiguring the network to suit dynamics, policies

Partition the network to better localize errors or isolate malfunctioning elements

Must work with multiple simultaneous faults and with limited, noisy symptom observations

  • Must manage hard (e.g., hardware fault) and soft (e.g., QoS problems) failures

One approach: Capture dependencies between systems in a multi-layer Bayesian model

  • Iterative belief updating on singly connected belief networks
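
To illustrate what a Bayesian fault model computes, here is a small sketch. It is not the iterative belief updating named above: it assumes a two-layer noisy-OR model (faults cause symptoms) and gets exact posteriors by enumerating every fault combination, which is exponential and only workable for toy cases. All names, priors, and strengths are hypothetical.

```python
from itertools import product


def fault_posteriors(priors, causes, observed):
    """Posterior P(fault | observations) under an assumed noisy-OR model.

    priors:   {fault: prior probability that the fault is active}
    causes:   {symptom: {fault: P(symptom appears | that fault alone)}}
    observed: {symptom: True if seen, False if confirmed absent}
    """
    faults = sorted(priors)
    total = 0.0
    marginal = {f: 0.0 for f in faults}
    # Brute force over all 2^n fault states (illustration only).
    for states in product([False, True], repeat=len(faults)):
        active = dict(zip(faults, states))
        weight = 1.0
        for f in faults:
            weight *= priors[f] if active[f] else 1.0 - priors[f]
        for symptom, seen in observed.items():
            # Noisy-OR: the symptom is absent only if every active
            # cause independently fails to produce it.
            p_absent = 1.0
            for f, strength in causes[symptom].items():
                if active[f]:
                    p_absent *= 1.0 - strength
            weight *= (1.0 - p_absent) if seen else p_absent
        total += weight
        for f in faults:
            if active[f]:
                marginal[f] += weight
    if total == 0.0:
        raise ValueError("observations are impossible under this model")
    return {f: marginal[f] / total for f in faults}


# Hypothetical numbers, for illustration only.
priors = {"link": 0.01, "router": 0.02}
causes = {
    "ping_timeout": {"link": 0.9, "router": 0.8},
    "route_flap": {"router": 0.7},
}
observed = {"ping_timeout": True, "route_flap": False}
print(fault_posteriors(priors, causes, observed))
```

In this toy case the absent route flap counts against the router, so the link fault ends up more probable even though its prior is lower.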

Another approach: Incremental hypothesis updating; each hypothesis is a conjunction of faults that explains the observed symptoms
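
The idea can be sketched as follows. This is a simplified illustration rather than the paper's algorithm: it assumes a known map from each symptom to the faults that can cause it, processes symptoms one at a time, and keeps only minimal fault sets. It does not rank hypotheses by probability or handle lost or spurious symptoms. All names are hypothetical.

```python
def update_hypotheses(hypotheses, symptom, causes):
    """Extend each hypothesis so that it also explains `symptom`.

    hypotheses: set of frozensets of faults
    causes:     {symptom: set of faults that can produce it}
    """
    candidates = causes[symptom]
    updated = set()
    for h in hypotheses:
        if h & candidates:
            updated.add(h)  # already explains the new symptom
        else:
            for fault in candidates:
                updated.add(h | {fault})  # add one explaining fault
    # Drop any hypothesis that has a proper subset in the set.
    return {h for h in updated if not any(o < h for o in updated)}


def localize(symptoms, causes):
    """Process symptoms in arrival order, starting from 'no faults'."""
    hypotheses = {frozenset()}
    for symptom in symptoms:
        hypotheses = update_hypotheses(hypotheses, symptom, causes)
    return hypotheses


# Hypothetical symptom-to-fault map, for illustration only.
causes = {"s1": {"A", "B"}, "s2": {"B", "C"}, "s3": {"C"}}
for h in sorted(localize(["s1", "s2", "s3"], causes), key=sorted):
    print(sorted(h))
```

Each surviving hypothesis here is a pair of faults that together account for all three symptoms; a real system would then rank the hypotheses, for example by prior fault probability.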

Layer 1 automatic protection switching

  • Deals with hard failures
  • Restores quickly, but is resource expensive

Layer 2

  • Power control

Layer 3

  • On-demand survivability-based re-routing

Use dynamic domain configuration to:

  • Fully or partially isolate misbehaving nodes depending on level of malfunction, capabilities provided/needed
  • Address service problems, such as crashed, partitioned, or overloaded services
  • Isolate poor-quality regions and apply different protocols, policies, etc.
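
One way to picture full versus partial isolation is the sketch below. The notes do not say how the level of malfunction is measured, so the per-node health score, both thresholds, and the role names are assumptions made for illustration. Fully isolated nodes are cut out of the topology, restricted nodes are only labeled, and the remaining connected components are treated as domains.

```python
from collections import deque


def reconfigure(links, health, isolate_below=0.3, restrict_below=0.7):
    """Assign a role to each node and recompute domains.

    links:  iterable of (node, node) pairs
    health: {node: score in [0, 1]} (hypothetical metric)
    """
    roles = {}
    for node, score in health.items():
        if score < isolate_below:
            roles[node] = "isolated"    # fully cut off
        elif score < restrict_below:
            roles[node] = "restricted"  # kept, flagged for stricter policy
        else:
            roles[node] = "normal"

    # Adjacency over the nodes that remain in the network.
    adj = {n: set() for n, r in roles.items() if r != "isolated"}
    for a, b in links:
        if a in adj and b in adj:
            adj[a].add(b)
            adj[b].add(a)

    # Domains are the connected components of what is left.
    domains, seen = [], set()
    for start in sorted(adj):
        if start in seen:
            continue
        component, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        domains.append(component)
    return roles, domains


# Hypothetical four-node chain a-b-c-d, for illustration only.
links = [("a", "b"), ("b", "c"), ("c", "d")]
health = {"a": 0.9, "b": 0.2, "c": 0.5, "d": 0.95}
roles, domains = reconfigure(links, health)
print(roles)
print(domains)
```

Isolating node b splits the chain, so a ends up in a domain of its own while c and d form a second domain in which c carries the restricted role.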

Page last modified on October 06, 2010, at 01:31 PM