Solvability-based comparison of failure detectors

New paper at NCA 2014, with Josef Widder.

Abstract: Failure detectors are oracles that have been introduced to provide processes in asynchronous systems with information about faults. This information can then be used to solve problems otherwise unsolvable in asynchronous systems. A natural question concerns the “minimum amount of information” a failure detector has to provide for a given problem. This question is classically addressed using a relation that states that a failure detector D is stronger (that is, provides “more, or better, information”) than a failure detector D’ if D can be used to implement D’. It has recently been shown that this classic implementability relation has some drawbacks. To overcome this, different relations have been defined, one of which states that a failure detector D is stronger than D’ if D can solve all the time-free problems solvable by D’. In this paper we compare the implementability-based hierarchy of failure detectors to the hierarchy based on solvability. This is done by introducing a new proof technique for establishing the solvability relation. We apply this technique to known failure detectors from the literature and demonstrate significant differences between the hierarchies.

A preprint is available on arXiv.

Wait-Free Stabilizing Dining Using Regular Registers

This paper was presented at OPODIS 2012.

Abstract: Dining philosophers is a scheduling paradigm that determines when processes in a distributed system should execute certain sections of their code so that processes do not execute ‘conflicting’ code sections concurrently, for some application-dependent notion of a ‘conflict’. Designing a stabilizing dining algorithm for shared-memory systems subject to process crashes presents an interesting challenge: classic stabilization relies on all processes continuing to execute actions forever, an assumption that is violated when crash failures are considered. We present a dining algorithm that is both wait-free (it tolerates any number of crashes) and pseudo-stabilizing. Our algorithm works in an asynchronous system in which processes communicate via shared regular registers and have access to the eventually perfect failure detector $\Diamond P$. Furthermore, with a stronger failure detector, the solution becomes wait-free and self-stabilizing. To our knowledge, this is the first such algorithm. Prior results show that $\Diamond P$ is necessary for wait-freedom.

Leader Election Using Loneliness Detection

My recent work titled “Leader Election Using Loneliness Detection” has been accepted for presentation at the 25th International Symposium on DIStributed Computing (DISC) in Rome, and for publication in the journal Distributed Computing. Briefly, the work examines the gap in computational power in single-hop wireless systems as the information available about message loss is varied.

Asynchronous Failure Detectors

My latest work, on a modeling framework for asynchronous failure detectors (a special variant of failure detectors), has been accepted at the 31st Annual ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing (PODC). A preprint of the submission is available.

Update: The full version is available as a technical report.