«A Design Strategy for Deadlock-Free Concurrent Systems J.M.R.Martin Oxford University Computing Services, 13 Banbury Road, Oxford OX2 6NN, UK ...»
TRANSPUTER COMMUNICATIONS, VOL. 3(3), 1–18 (AUGUST 1997)
A Design Strategy for Deadlock-Free
Oxford University Computing Services, 13 Banbury Road, Oxford OX2 6NN, UK
Computing Laboratory, The University, Canterbury, Kent, CT2 7NF, UK
When building concurrent systems, it would be useful to have a collection of reusable processes
to perform standard tasks. However, without knowing certain details of the inner workings of these components, one can never be sure that they will not cause deadlock when connected to some particular network.
Here we describe a hierarchical method for designing complex networks of communicating processes which are deadlock-free. We use this to deﬁne a safe and simple method for specifying the communication interface to third party software components. This work is presented using the CSP model of concurrency and the occam2.1 programming language.
KEY WORDS deadlock livelock design concurrency reuse components networks CSP occam
Deadlock is a major problem with parallel programming. Safety critical systems must be guaranteed deadlock-free, but this is not an easy task. In  and  it is shown how to tackle this problem using software design rules. By placing minor restrictions on the manner in which concurrent systems are constructed, it is possible to remove completely the spectre of deadlock. Proofs of deadlock-freedom for networks of arbitrary size are reduced to simple checks for local conditions of processes, and topological properties of the network conﬁguration.
In this paper we shall build on that work to show how deadlock-free networks may be built hierarchically, incorporating reusable components. This provides an efﬁcient method for modular programming in occam2.1 that offers security guarantees that are very difﬁcult to obtain from traditional approaches to concurrent system design.
CCC 1070–454X/97/030001–18 Received 7 March 1996 c 1997 by John Wiley & Sons, Ltd. Revised 15 January 1997 2 J.M.R. MARTIN and P.H.WELCH In section 2 we describe the deadlock analysis terminology of Roscoe and Dathi, which is built upon the ﬁrm mathematical foundations of CSP. In section 3 we present a formal deﬁnition of the client-server protocol which has previously existed only as a collection of informal rules, potentially open to misinterpretation. We then prove a theorem about how this protocol may be used to build deadlock-free networks and illustrate this by two examples: an occam2.1 process farm and a message router. In section 4 we show how to use the client-server protocol to build deadlock-free networks hierarchically from composite sub-components. This is important for the design of large systems.
There are certain other design techniques which are known to yield deadlock-free concurrent systems. In section 5 we describe a technique for integrating subnetworks built by different methods into a client-server system. Section 6 looks at the related issues of divergence-freedom and starvation-freedom and section 7 describes ongoing research.
2 FORMAL ANALYSIS OF DEADLOCK PROPERTIESThe deadlock problem has proved a popular area of research for many years. The early results are largely informal, and suffer from the lack of a suitable underlying mathematical model. This problem was remedied in 1986 by A.W.Roscoe and N.Dathi  who provided a powerful suite of tools for deadlock analysis, based on CSP – the mathematical language for reasoning about concurrent systems . This enabled much of the earlier work to be formalised. We shall now describe their terminology.
With each CSP process P is associated an alphabet, P, the set of actions that P may perform. The observable behaviour of P is described by its failures and divergences. A failure of P consists of a pair (s, X) where s is a possible trace of events that P may perform, and X is a refusal set of P=s – a set of events which if offered to P after it has performed trace s, might be completely refused. A divergence of P is a trace s after which P may indulge in an inﬁnite series of concealed actions. A glossary of CSP operators used in this paper is given in Figure 1.
Of particular importance for deadlock analysis are the maximal refusal sets of a process, as the more events that are refused the more likely is deadlock to occur. We deﬁne the set of maximal failures of P as those failures (s, X) of P where the refusal set X is maximal, i.e.
After it has performed event a, Q can refuse fg, fag, fbg, fcg, fa, bg, or fa, cg. Its maximal refusals at this stage are fa, bg and fa, cg, which correspond to the two states that the process may be in once it has decided which branch of the u construct to take: either it decides to behave like b ! Q or it decides to behave like c ! Q. These are called stable states of Q because once they are entered no internal activity is possible, at least until an external event has been performed. Before making the decision as to which branch of the u construct to take, Q is said to be in an unstable state. For the purpose of deadlock analysis, unstable
states are irrelevant, as a system in such a state is yet to come to rest – when a system is deadlocked there can be no activity at all.
Consider a network V of CSP processes, hP1, : :, PN i, composed in parallel. Events which lie in the alphabet of two processes of V require their simultaneous participation, and this is the means of communication within the network. It is assumed that the network is tripledisjoint – i.e. there is no event which is shared by more than two process alphabets. It is also assumed that the network is busy – i.e. each process Pi is non-terminating and individually free of both deadlock and divergence. (It is reasonably straightforward to extend the model to allow for process termination, if required. See  for details.) The behaviour of a network is described in terms of a set of states that it may be in. A state of V consists of a trace of events s together with a tuple of refusal sets hX1, : :, XN i such that, for all i, (s j` Pi, Xi ) is a maximal failure of Pi. It represents a set of events in which each process may refuse to engage after they have collectively performed the sequence of events in s.
In a deadlock state every event is refused by some process, i.e.
A network which has no deadlock states is said to be deadlock-free.
In any state = (s, hX1, : :, XN i), if there is a process Pi which is ready to perform an event (or events) which lie in the vocabulary of another process Pj (i.e. ( Pi ; Xi ) \ Pj 6= fg), we say that Pi is making a request of Pj, and write
When a process has all its requests ungranted we say that it is blocked. In a deadlock state every process is blocked.
3 CLIENT-SERVER NETWORKS
3.1 Client-Server Protocol A particularly ﬂexible design rule for deadlock freedom is the Client-Server Protocol. This was originally formulated by P. Brinch Hansen  in the context of operating systems. It has since been adapted as a means of designing deadlock-free concurrent systems using occam2 (see ).
The basic idea is that a process communicates on each of its channels either as a client or as a server, according to a strict protocol. A network of client-server processes is deadlock-free if it has no cycle of client-server relationships.
The standard deﬁnition of the client-server protocol is both informal and intuitive. It runs as follows. If a client is waiting to communicate with a server, the server, ignoring possible indeﬁnite delays whilst waiting to communicate as a client with another process, must eventually become ready to participate in that communication.
4/3/1997 16:59 PAGE PROOFS paper
A DESIGN STRATEGY FOR DEADLOCK-FREE CONCURRENT SYSTEMSUnfortunately this is too vague a condition to translate into a local speciﬁcation of CSP processes. For instance consider a process that acts only as a server. In order to establish the above property we would need to show that any particular client request would eventually be granted. To do this we would need to analyse the way in which the process interacts with each of its other clients and possibly also how they interact with each of their clients in turn. Here we introduce a formal deﬁnition which is slightly more restrictive than the informal one, yet with the distinct advantage of being purely a property of an individual process, independent of any network in which it is embedded. This is more suitable for collaborative software projects and makes automatic checking much more feasible.
A basic client-server CSP process P has a ﬁnite set of external channels partitioned into bundles, each of which has a type, in relation to P, which is either client or server. We write the set of client bundles of P as clients(P), and the server bundles as servers(P).
Each channel bundle consists of either a pair of channels, a request and an acknowledgement, hr, ai, or a single channel (which we call a drip) hdi. This allows client-server conversations to be either one way or two way. In principle there is no reason why a bundle should not contain three or more channels. In practice we have found that two channels are generally sufﬁcient, and have restricted the deﬁnition as such.
In the subsequent analysis, a communication event on a channel is represented purely by the channel name, ignoring any data that is passed. The purpose of this is clarity and simplicity. Following this convention, a basic client-server process, P, must obey the
(a) P is divergence-free, deadlock-free and non-terminating – it can never refuse to perform every event in its alphabet, i.e.
(b) When P is in a stable state, either it is ready to communicate on all its request and drip server channels or it is ready to communicate on none of them. In CSP terms, this means that the refusal sets in the maximal failures of P include either none of its request and drip server channels or all of them, i.e.
The formal statement of these rules is rather daunting, but it should be possible to design conformant processes using only the informal descriptions which accompany them. Work is in progress to develop software engineering tools (see Section 7) that will check for adherence to the formal model, for total assurance.
When we construct a client-server network V from a set of client-server processes hP1, : :,PN i, each client bundle of a process must either be a server bundle of exactly one other process, or consist of channels external to the network. Similarly each server bundle of any process must either be a client bundle of exactly one other process or be external to the network. No other communication between processes is permitted.
The client-server digraph of a client-server network consists of a vertex representing every process, and, an arc representing each shared bundle, directed from the process for which it is of type client, towards that for which it is of type server.
Theorem 1 (Client-Server Theorem) A client-server network, composed from basic clientserver processes, which has a circuit-free1 client-server digraph, is deadlock-free.
Proof. First we observe that the matching requirements for client and server bundles within a network enforce triple-disjointedness within a client-server network. This, coupled with rule (a), ensures that client-server networks conform to the model of Roscoe and Dathi.
Then we use induction on the number of processes in the network. Suppose that the theorem holds for any network with N processes. Consider a client-server network V = hP1, : :, PN+1i, which has a circuit-free client-server digraph, yet has a deadlock state.
Without loss of generality we assume that PN+1 is maximal in the client-server ordering of V. Every process is blocked in state, and therefore is making an ungranted request to another process. Because of its maximality PN+1 can only make ungranted requests on server channels. By rules (c) and (d) these cannot be acknowledge channels – they can only be request or drip channels. By rule (b) it follows that P N+1 is making an ungranted request on every one of its request and drip channels. So there is no process in the subnetwork V 0 = hP1, : :, PN i that is ready to communicate with PN+1 in state. Each process in V0 is in fact blocked only by other processes in V 0, which implies that the subnetwork is itself deadlocked. This contradicts our hypothesis that the theorem holds for networks of size N.
It follows that V must be deadlock-free. Clearly the theorem holds in the case when N = 1, so by induction it holds for all N. Q.E.D.
3.2 Examples A Simple Process Farm We consider an application where computing-intensive tasks are performed in parallel using a standard farm network conﬁguration. A farmer employs n foremen each of whom is responsible for m workers. The program is coded in occam2.12 as shown in Figure 2.
When each WORKER (i, j) becomes idle, it reports the result of any work done to its FOREMAN (i) using channel a[i][j]. FOREMAN (i) reports this on channel c[i] to the FARMER who, in turn, replies with a new task using channel d[i]. FOREMAN (i) then assigns
the new task to WORKER (i, j) with channel b[i][j]. Here the relationships between each WORKER and its FOREMAN and between each FOREMAN and the FARMER are all client-toserver (see Figure 3).