Srikanth Sasty's home page

I presented my recent work titled "Leader Election Using Loneliness Detection" at the 25th International Symposium on DIStributed Computing (DISC) in Rome last month. Briefly, the work focuses on the 'gap in the computational power' in single-hop wireless systems when the information about message loss in the system is varied.

We assume that the cause of message loss in wireless networks is collision, which happens when multiple nodes in the system transmit concurrently. We consider two models --- weak collision detection (WCD) and strong collision detection (SCD) --- which differ only with respect to the information that they provide about message loss. Specifically, when message collision occurs, in WCD systems, all the nodes that are not transmitting (and therefore listening) receive information about the collision whereas the transmitting processes do not receive any such information, and in SCD systems, all the nodes (both transmitting and receiving nodes) receive information about the collision. Intuitively, it makes sense to argue that SCD systems are more `powerful' than WCD systems because SCD systems provide more information about message collision than WCD systems; however, that does not answer the question: How much more powerful are SCD systems than WCD systems; how can the 'gap' be characterized? Alternatively, we may ask: how do we quantify the amount of information that is provided to transmitting processes when they are notified of a message collision?

We showed that the 'gap' between SCD and WCD systems is captured by the answer to the question: Is there exactly one node in the system? In other words, if WCD systems were to be augmented with an oracle that provides the answer to the foregoing question, then such a system would be able to solve all the problems solvable in SCD systems with the same time and message complexity (modulo a constant factor). We call such an oracle a Loneliness Detector (or LD, for short). We showed that LD can be implemented in WCD systems in O(log u - log n) time deterministically (where u is an upper bound on the number of nodes in the system and n is the actual number of nodes in the system) and in O(1) time with high probability. We then used LD to compare the time complexity of solving leader election in SCD and WCD systems. We showed that in both SCD and WCD systems, leader election may be solved in O(log u) time deterministically and in O(loglog n + log(1/epsilon)) time probabilistically, where epsilon is the error probability. We also provided matching lower bounds for each of the upper bounds presented, thus demonstrating the efficiency of our algorithms.

Abstract:

We consider the problem of leader election (LE) in single-hop radio networks with synchronized time slots for transmitting and receiving messages. We assume that the actual number n of processes is unknown, while the size u of the ID space is known, but is possibly much larger. We consider two types of collision detection: strong (SCD), whereby all processes detect collisions, and weak (WCD), whereby only non-transmitting processes detect collisions. We introduce loneliness detection (LD) as a key subproblem for solving LE in WCD systems. LD informs all processes whether the system contains exactly one process or more than one. We show that LD captures the difference in power between SCD and WCD, by providing an implementation of SCD over WCD and LD. We present two algorithms that solve deterministic and probabilistic LD in WCD systems with time costs of O(log u/n ) and O(min(log u/n , log(1/epsilon)/n)), respectively, where epsilon is the error probability. We also provide matching lower bounds. We present two algorithms that solve deterministic and probabilistic LE in SCD systems with time costs of O(log u) and O(min(log u, log log n + log( 1/epsilon ))), respectively, where epsilon is the error probability. We provide matching lower bounds.

The full version of the paper may be found here [link].

 

I successfully defended my dissertation titled "A Prescription For Partial Synchrony" on January 21st, 2011. The dissertation was approved and accepted by the Thesis Office on February 2nd, 2011.

Abstract:

Algorithms in message-passing distributed systems often require partial synchrony to tolerate crash failures. Informally, partial synchrony refers to systems where timing bounds on communication and computation may exist, but the knowledge of such bounds is limited. Traditionally, the foundation for the theory of partial synchrony has been real time: a time base measured by counting events external to the system, like the vibrations of Cesium atoms or piezoelectric crystals.

Unfortunately, algorithms that are correct relative to many real-time based models of partial synchrony may not behave correctly in empirical distributed systems. For example, a set of popular theoretical models, which we call M*, assume (eventual) upper bounds on message delay and relative process speeds, regardless of message size and absolute process speeds. Empirical systems with bounded channel capacity and bandwidth cannot realize such assumptions either natively, or through algorithmic constructions. Consequently, empirical deployment of the many M*-based algorithms risks anomalous behavior.

As a result, we argue that real time is the wrong basis for such a theory. Instead, the appropriate foundation for partial synchrony is {\em fairness}: a time base measured by counting events internal to the system, like the steps executed by the processes. By way of example, we redefine M* models with fairness-based bounds and provide algorithmic techniques to implement fairness-based M* models on a significant subset of the empirical systems. The proposed techniques use failure detectors --- system services that provide hints about process crashes --- as intermediaries that preserve the fairness constraints native to empirical systems. In effect, algorithms that are correct in M* models are now proved correct in such empirical systems as well.

Demonstrating our results requires solving three open problems. (1) We propose the first unified mathematical framework based on Timed I/O Automata to specify empirical systems, partially synchronous systems, and algorithms that execute within the aforementioned systems. (2) We show that crash tolerance capabilities of popular distributed systems can be denominated exclusively through fairness constraints. (3) We specify exemplar system models that identify the set of weakest system models to implement popular failure detectors.

I thank my advisors Scott Pike and Jennifer Welch for all their help, support, and guidance.

A copy of my dissertation (in PDF format) can be accessed here.

 

A new paper titled "Reliable Networks with Unreliable Sensors" has been published (and presented) at the 12th International Conference on Distributed Computing and Networking (ICDCN 2011) at Bangalore, India.

Abstract. Wireless sensor networks (WSNs) deployed in hostile environments suffer from a high rate of node failure. We investigate the effect of such failure rate on network connectivity. We provide a formal analysis that establishes the relationship between node density, network size, failure probability, and network connectivity. We show that as network size and density increase, the probability of network partitioning becomes arbitrarily small. We show that large networks can maintain connectivity despite a significantly high probability of node failure. We derive mathematical functions that provide lower bounds on network connectivity in WSNs. We compute these functions for some realistic values of node reliability, area covered by the network, and node density, to show that, for instance, networks with over a million nodes can maintain connectivity with a probability exceeding 99% despite node failure probability exceeding 57%.

A preprint of the paper can be access here [pdf].

 

The paper titled "Failure Detectors Encapsulate Fairness" was recently published (and presented) at the  14th International Symposium on Principles of Distributed Systems (OPODIS 2010) at Tozeur, Tunisia.

Abstract. Failure detectors are commonly viewed as abstractions for the synchronism present in distributed system models. However, investigations into the exact amount of synchronism encapsulated by a given failure detector have met with limited success. The reason for this is that traditionally, models of partial synchrony are specified with respect to real time, but failure detectors do not encapsulate real time. Instead, we argue that failure detectors encapsulate the fairness in computation and communication. Fairness is a measure of the number of steps executed by one process relative either to the number of steps taken by another process or relative to the duration for which a message is in transit. We argue that oracles are substitutable for the fairness properties (rather than real-time properties) of partially synchronous systems. We propose four fairness-based models of partial synchrony and demonstrate that they are, in fact, the ‘weakest systems models’ to implement the canonical failure detectors from the Chandra-Toueg hierarchy.

A preprint (in pdf) of the paper can be accessed here [link].

 

The paper "Failure Detectors Encapsulate Fairness" has been accepted as a brief announcement at the 24th International Symposium on Distributed Computing (DISC 2010).

The abstract:

We argue that failure detectors encapsulate fairness. Fairness is a measure of the number of steps a process takes relative to another processes and/or messages in transit. As evidence we specify fairness-based message-passing system models that are the weakest to implement the failure detectors from [1].

[1] Chandra, T. D. and Toueg, S. 1996. Unreliable failure detectors for reliable distributed systems. J. ACM 43, 2 (Mar. 1996), 225-267. DOI= http://doi.acm.org/10.1145/226643.226647

 
«1»2345
You are here:   Home