Solvability-based comparison of failure detectors

01 Oct 2014

New paper in NCA 2014, with Josef Widder.

Abstract: Failure detectors are oracles that have been introduced to provide processes in asynchronous systems with information about faults. This information can then be used to solve problems otherwise unsolvable in asynchronous systems. A natural question is on the "minimum amount of information" a failure detector has to provide for a given problem. This question is classically addressed using a relation that states that a failure detector D is stronger (that is, provides "more, or better, information") than a failure detector D' if D can be used to implement D'. It has recently been shown that this classic implementability relation has some drawbacks. To overcome this, different relations have been defined, one of which states that a failure detector D is stronger than D' if D can solve all the time-free problems solvable by D'. In this paper we compare the implementability-based hierarchy of failure detectors to the hierarchy based on solvability. This is done by introducing a new proof technique for establishing the solvability relation. We apply this technique to known failure detectors from the literature and demonstrate significant differences between the hierarchies.

A preprint is available on arXiv; click here to access it.

Stabilizing Dining with Failure Locality 1

04 Feb 2014

This paper was presented in ICDCN 2014.

Abstract: We propose two algorithms for solving self-stabilizing dining with Crash Locality 1 in asynchronous shared-memory systems with safe registers. Since this problem cannot be solved in pure asynchrony, we augment the shared-memory system with failure detectors. Specifically, we introduce the anonymous eventually perfect failure detector $?\Diamond P$ (a variant of the anonymous perfect failure detector introduced by Guerraoui et al.), and show that this failure detector is sufficient to solve the problem at hand.

The preprint is available for download here.

Moving to Google

23 Apr 2013

Today, I start my new life as a software engineer with Google. It feels strange moving away from academia after over 9 years. Hope this bodes well for me! :-)

Wait-Free Stabilizing Dining Using Regular Registers

18 Jan 2013

This paper was presented at OPODIS 2012.

Abstract: Dining philosophers is a scheduling paradigm that determines when processes in a distributed system should execute certain sections of their code so that processes do not execute ‘conflicting’ code sections concurrently, for some application-dependent notion of a ‘conflict’. Designing a stabilizing dining algorithm for shared-memory systems subject to process crashes presents an interesting challenge: classic stabilization relies on all processes continuing to execute actions forever, an assumption which is violated when crash failures are considered. We present a dining algorithm that is both wait-free (tolerates any number of crashes) and pseudo-stabilizing. Our algorithm works in an asynchronous system in which processes communicate via shared regular registers and have access to the eventually perfect failure detector $\Diamond P$. Furthermore, with a stronger failure detector, the solution becomes wait-free and self-stabilizing. To our knowledge, this is the first such algorithm. Prior results show that $\Diamond P$ is necessary for wait-freedom.

Leader Election Using Loneliness Detection

11 Oct 2012

My recent work titled “Leader Election Using Loneliness Detection” has been accepted for presentation at the 25th International Symposium on DIStributed Computing (DISC) in Rome, and for publication in the journal Distributed Computing. Briefly, the work focuses on the ‘gap in computational power’ that opens up in single-hop wireless systems when the information available about message loss is varied.

We assume that the cause of message loss in wireless networks is collision, which happens when multiple nodes in the system transmit concurrently. We consider two models, weak collision detection (WCD) and strong collision detection (SCD), which differ only in the information that they provide about message loss. Specifically, when a message collision occurs in a WCD system, all the nodes that are not transmitting (and are therefore listening) receive information about the collision, whereas the transmitting nodes do not receive any such information. In an SCD system, all the nodes (both transmitting and listening) receive information about the collision. Intuitively, SCD systems are more ‘powerful’ than WCD systems because they provide more information about message collisions; however, that does not answer the question: how much more powerful are SCD systems than WCD systems, and how can the ‘gap’ be characterized? Alternatively, we may ask: how do we quantify the amount of information that is provided to transmitting nodes when they are notified of a message collision?
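The difference between the two models can be made concrete with a small simulation of a single time slot. This is only an illustrative sketch: the function name `slot_feedback` and the observation labels (`"collision"`, `"message"`, `"silence"`, `"sent_ok"`, `"no_feedback"`) are assumptions made here, not notation from the paper.

```python
from typing import Dict, Set


def slot_feedback(nodes: Set[int], transmitters: Set[int], model: str) -> Dict[int, str]:
    """Return what each node observes in one slot (hypothetical labels).

    model is "WCD" (only listeners learn about collisions) or
    "SCD" (every node learns about collisions).
    """
    if model not in ("WCD", "SCD"):
        raise ValueError("model must be 'WCD' or 'SCD'")

    collision = len(transmitters) >= 2
    feedback: Dict[int, str] = {}
    for node in nodes:
        if node not in transmitters:
            # Listeners behave identically in both models.
            if collision:
                feedback[node] = "collision"
            elif transmitters:
                feedback[node] = "message"
            else:
                feedback[node] = "silence"
        elif model == "SCD":
            # Strong detection: a transmitter also learns the slot's outcome.
            feedback[node] = "collision" if collision else "sent_ok"
        else:
            # Weak detection: a transmitter learns nothing about the slot.
            feedback[node] = "no_feedback"
    return feedback


if __name__ == "__main__":
    nodes = {1, 2, 3}
    print(slot_feedback(nodes, {1, 2}, "WCD"))
    print(slot_feedback(nodes, {1, 2}, "SCD"))
```

Under this sketch, a transmitting node in a WCD system observes the same thing whether it transmitted alone or collided with others, which is exactly the information that SCD adds.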

We showed that the ‘gap’ between SCD and WCD systems is captured by the answer to the question: Is there exactly one node in the system? In other words, if WCD systems were to be augmented with an oracle that provides the answer to the foregoing question, then such a system would be able to solve all the problems solvable in SCD systems with the same time and message complexity (modulo a constant factor). We call such an oracle a Loneliness Detector (or LD, for short). We showed that LD can be implemented in WCD systems in O(log u - log n) time deterministically (where u is an upper bound on the number of nodes in the system and n is the actual number of nodes in the system) and in O(1) time with high probability. We then used LD to compare the time complexity of solving leader election in SCD and WCD systems. We showed that in both SCD and WCD systems, leader election may be solved in O(log u) time deterministically and in O(log log n + log(1/epsilon)) time probabilistically, where epsilon is the error probability. We also provided matching lower bounds for each of the upper bounds presented, thus demonstrating the efficiency of our algorithms.
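To give a feel for where an O(log u) deterministic bound can come from, here is a minimal sketch of a textbook-style binary search over the ID space under strong collision detection. It is not claimed to be the algorithm from the paper. The function name `elect_leader_scd` is hypothetical, and the sketch assumes distinct integer IDs in the range [0, u), at least one node, and a central simulation that stands in for the nodes' local decisions.

```python
from typing import Set, Tuple


def elect_leader_scd(ids: Set[int], u: int) -> Tuple[int, int]:
    """Simulate binary-search leader election with strong collision detection.

    Returns (leader_id, slots_used). Assumes ids is non-empty and every
    ID lies in [0, u).
    """
    if not ids or any(i < 0 or i >= u for i in ids):
        raise ValueError("need at least one ID, all in [0, u)")

    lo, hi = 0, u  # invariant: at least one ID lies in [lo, hi)
    slots = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        # Nodes whose ID falls in the lower half transmit in this slot.
        transmitters = [i for i in ids if lo <= i < mid]
        slots += 1
        if len(transmitters) == 1:
            # Everyone hears one clean message; the sender knows it succeeded.
            return transmitters[0], slots
        if not transmitters:
            lo = mid  # silence: every remaining ID is in the upper half
        else:
            hi = mid  # collision: at least two IDs are in the lower half
    # The interval has shrunk to a single ID, which must belong to a node.
    return lo, slots


if __name__ == "__main__":
    print(elect_leader_scd({12, 13}, 16))
```

Each slot halves the candidate interval, so the loop runs at most ceil(log2 u) times. The sketch leans on every node, including the transmitters, observing the same outcome in each slot, which is what SCD provides and WCD does not.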

Abstract:

We consider the problem of leader election (LE) in single-hop radio networks with synchronized time slots for transmitting and receiving messages. We assume that the actual number n of processes is unknown, while the size u of the ID space is known, but is possibly much larger. We consider two types of collision detection: strong (SCD), whereby all processes detect collisions, and weak (WCD), whereby only non-transmitting processes detect collisions. We introduce loneliness detection (LD) as a key subproblem for solving LE in WCD systems. LD informs all processes whether the system contains exactly one process or more than one. We show that LD captures the difference in power between SCD and WCD, by providing an implementation of SCD over WCD and LD. We present two algorithms that solve deterministic and probabilistic LD in WCD systems with time costs of O(log(u/n)) and O(min(log(u/n), log(1/epsilon)/n)), respectively, where epsilon is the error probability. We also provide matching lower bounds. We present two algorithms that solve deterministic and probabilistic LE in SCD systems with time costs of O(log u) and O(min(log u, log log n + log(1/epsilon))), respectively, where epsilon is the error probability. We provide matching lower bounds.
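A short note on notation: the deterministic LD bound is written as a difference of logarithms in the summary above and as the logarithm of a ratio in the abstract. The two forms are the same quantity:

$$
\log u - \log n = \log\frac{u}{n},
\qquad\text{so}\qquad
O(\log u - \log n) = O\!\left(\log\frac{u}{n}\right).
$$

As a purely illustrative calculation (the numbers are assumptions, and the hidden constants are ignored), take $u = 2^{32}$ and $n = 2^{20}$ with base-2 logarithms:

$$
\log_2\frac{u}{n} = 32 - 20 = 12,
\qquad\text{compared with}\qquad
\log_2 u = 32 .
$$

The bound therefore shrinks as the actual number of nodes approaches the size of the ID space.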

The full version of the paper may be found here [link].


Srikanth Sastry A Techie in Boston
