# «Abstract. Automated reasoning about systems with inﬁnite domains requires an extension of regular automata to inﬁnite alphabets. Existing ...»

Variable Automata over Inﬁnite Alphabets

Orna Grumberg1, Orna Kupferman2, and Sarai Sheinvald2

Department of Computer Science, The Technion, Haifa 32000, Israel

School of Computer Science and Engineering, Hebrew University, Jerusalem 91904, Israel

Abstract. Automated reasoning about systems with inﬁnite domains requires an

extension of regular automata to inﬁnite alphabets. Existing formalisms of such

automata cope with the inﬁniteness of the alphabet by adding to the automaton a set of registers or pebbles, or by attributing the alphabet by labels from an auxiliary ﬁnite alphabet that is read by an intermediate transducer. These formalisms involve a complicated mechanism on top of the transition function of automata over ﬁnite alphabets and are therefore difﬁcult to understand and to work with.

We introduce and study variable ﬁnite automata over inﬁnite alphabets (VFA).

VFA form a natural and simple extension of regular (and ω-regular) automata, in which the alphabet consists of letters as well as variables that range over the inﬁnite alphabet domain. Thus, VFAs have the same structure as regular automata, only that some of the transitions are labeled by variables. We compare VFA with existing formalisms, and study their closure properties and classical decision problems. We consider the settings of both ﬁnite and inﬁnite words. In addition, we identify and study the deterministic fragment of VFA. We show that while this fragment is sufﬁciently strong to express many interesting properties, it is closed under union, intersection, and complementation, and its nonemptiness and containment problems are decidable. Finally, we describe a determinization process for a determinizable subset of VFA.

1 Introduction Automata-based formal methods are successfully applied in automated reasoning about systems. When the systems are ﬁnite-state, their behaviors and speciﬁcations can be modeled by ﬁnite automata. When the systems are inﬁnite-state, reasoning is undecidable, and research is focused on identifying decidable special cases (e.g., pushdown systems) and on developing heuristics (e.g., abstraction) for coping with the general case.

One type of inﬁnite-state systems, motivating this work, are systems in which the control is ﬁnite and the source of inﬁnity is data. This includes, for example, software with integer parameters [3], datalog systems with inﬁnite data domain [15, 4], and XML documents, whose leaves are typically associated with data values from some inﬁnite domain [7, 5]. Lifting automata-based methods to the setting of such systems requires the introduction of automata with inﬁnite alphabets. 3 The transition function of a nondeterministic automaton over ﬁnite alphabets (NFA) maps a state q and a letter σ to a set of states the automaton may move to when it is in Different approaches for automatically reasoning about such systems are based on extensions of ﬁrst-order logic [2] and linear temporal logics [8].

2 O. Grumberg, O. Kupferman, and S. Sheinvald state q and the letter in the input is σ. When the alphabet of the automaton is inﬁnite, specifying all transitions is impossible, and a new formalism is needed in order to represent them in a ﬁnite manner. Existing formalisms of automata with inﬁnite alphabets fulﬁll this task by augmenting the automaton by registers or pebbles, or by attributing the alphabet by labels from an auxilary ﬁnite alphabet that is read by an intermediate transducer. We elaborate of the existing formalisms below.

A register automaton [13] has a ﬁnite set of registers, each of which may contain a letter from the inﬁnite alphabet. The transitions of a register automaton compare the letter in the input with the content of the registers, and may also store the input letter in a register. Several variants of this model have been studied. For example, [10] forces the content of the registers to be different, [12] adds alternation and two-wayness, and [9] allows the registers to change their content nondeterministically during the run.

A pebble automaton [12] places pebbles on the input word in a stack-like manner.

The transitions of a pebble automaton compare the letter in the input with the letters in positions marked by the pebbles. Several variants of this model have been studied. For example, [12] studies alternating and two-way pebble automata, and [14] introduces top-view weak pebble automata.

The newest formalism is data automata [2, 1]. For an inﬁnite alphabet Σ, a data automaton runs on data words, which are words over the alphabet Σ × F, where F is a ﬁnite auxilary alphabet. Intuitively, the ﬁnite alphabet is accessed directly, while the inﬁnite alphabet can only be tested for equality, and is used for inducing an equivalence relation on the set of positions. Technically, a data automaton consists of two components. The ﬁrst is a letter-to-letter transducer that runs on the projection of the input word on F and generates words over yet another alphabet Γ. The second is a regular automaton that runs on subwords (determined by the equivalence classes) of the word generated by the transducer.

The quality of a formalism is measured by its simplicity, expressive power, compositionality, and computability. In simplicity, we refer to the effort required in order to understand a given automaton, work with it, and implement it. In compositionality, we refer to closure under the basic operations of union, intersection, and complementation.

In computability, we refer to the decidability and complexity of classical problems like nonemptiness, membership, universality, and containment.

The formalisms of register, pebble, and data automata all fail hard the simplicity criterion. Augmenting NFAs with registers or pebbles requires a substantial modiﬁcation of the transition function. The need to maintain the registers and pebbles makes the automata hard to understand and work with. Unfortunately, most researchers in the formal-method community are not familiar with register and pebble automata. Indeed, even the deﬁnition of the basic notion of a run of such automata cannot simply rely on the familiar deﬁnition of a run of an NFA, and involves the notions of conﬁgurations, successive conﬁgurations, and so on, with no possible shortcuts.

Data automata do not come to the rescue. The need to accept several subwords per input word and to go through an intermediate alphabet and transducer makes them very complex. Even trivial languages such as a∗ require extra letters and checks in order to be recognized. Simplicity is less crucial in the process of automatic algorithms, and indeed, data automata have been succesfully used for the decidability of two-variable Variable Automata over Inﬁnite Alphabets 3 ﬁrst order logic on words with data - a formalism that is very useful in XML reasoning [2, 1]. For the purpose of speciﬁcation and design, and for developing new algorithms and applications, simplicity is crucial. A simpler, friendlier formalism is needed.

Data and register automata and most of their variants fail the compositionality and computability criteria too. Data automata and register automata are not closed under complementation, apart from speciﬁc fragments of register automata that limit the number of registers [8]. Their universality and containment problems are undecidable [12].

Pebble automata and most of their variants fail the computability criterion, as apart from weaker models [14], their nonemptiness, universality, and containment problems are undecidable. Nonemptiness of data and register automata is decidable, but is far more complex than the easy reachability-based nonemptiness algorithm for NFAs.

We introduce and study a new formalism for recognizing languages over inﬁnite alphabets. Our formalism, variable ﬁnite automata (VFA), forms a natural and simple extension of NFAs. We also identify and study a fragment of VFA that fulﬁlls the simplicity, compositionality, and computability criteria, and is still sufﬁciently expressive to specify many interesting properties. Intuitively, a VFA is an NFA some of whose letters are variables ranging over the inﬁnite alphabet. The tight connection with NFAs enables us to apply much of the constructions and algorithms known for them.

More formally, a VFA is a pair A = Σ, A, where Σ is an inﬁnite alphabet and A is an NFA, referred to as the pattern automaton of A. The alphabet of A consists of constant letters – a ﬁnite subset of Σ, a set of bounded variables, and a single free variable. The language of A consists of words in Σ ∗ that are formed by assigning letters in Σ to the occurrences of variables in words in the language of A. Each bounded variable is assigned a different letter (also different from the constant letters), thus all occurrences of a particular bounded variable must be assigned the same letter. This allows describing words in Σ ∗ in which some letter is repeated. The free variable may be assigned different letters in every occurrence, different from the constant letters and from letters assigned to the bounded variables. This allows describing words in which every letter may appear. For example, consider a VFA A = N, A, where A has a bounded variable x and its free variable is y. if the language of A is (x + y)∗ · x · (x + y)∗ · x · (x + y)∗, then the language of A consists of all words over N in which at least some letter occurs at least twice.

We prove that VFAs are closed under union and intersection. The constructions we present use the union and product constructions for NFAs in their basis, but some pirouettes are needed in order to solve conﬂicts between different assignments to the variables of the underlying automata. Such pirouettes are helpless for the problem of complementation, and we prove that VFAs are not closed under complementation. We study the classical decision problems for VFAs. We show that a VFA is nonempty iff its pattern automaton is nonempty. Thus, the nonemptiness problem is NL-complete, and is not more complex than the one for NFAs. We also show that the membership problem is NP-complete. Thus, while the problem is more complex than the one for NFAs, it is still decidable. The universality and containment problems, however, are undecidable.

We then deﬁne and study deterministic VFA (DVFA), a fragment of VFA in which there exists exactly one run on every word. Unlike the case of DFAs, determinism is not a syntactic property. Indeed, since the variables are not pre-assigned, there may be 4 O. Grumberg, O. Kupferman, and S. Sheinvald several runs on a word even when the pattern automaton is deterministic. However, a syntactic deﬁnition does exist and deciding whether a given VFA is deterministic is NLcomplete. We introduce an unwinding operator for VFAs. In an unwinded VFA, each state is labeled by the variables that have been read, and therefore assigned, in paths leading to the state. Using the unwinding operator, we can deﬁne DVFAs for the union and intersection of DVFAs. Moreover, the closure under complementation of DVFAs is immediate, and it enables us to solve the universality and containment problems for DVFAs. Thus, DVFAs suggest an expressive formalism that fulﬁlls the three criteria.

We study further properties of DVFA. As bad news, we show that the problem of determinizing a given VFA (or concluding that no equivalent DVFA exists) is undecidable. As good news, we show that all VFAs with no free variable have an equivalent DVFA, and present a determinization process for VFAs of this kind. The advantages of DVFA make us optimistic about the extensions of algorithms that involve DFAs, like symbolic formal veriﬁcation and synthesis, to the setting of inﬁnite alphabets.

We demonstrate the robustness of our formalism by showing that its extension to the setting of ω-regular words is straightforward. In Section 5, we introduce and study variable B¨ chi automata (VBAs), whose pattern automata are nondeterministic B¨ chi u u automata on inﬁnite words [6]. VBAs are useful for specifying languages of inﬁnite words over inﬁnite alphabets, and in particular, speciﬁcations of systems with variables ranging over inﬁnite domains. We show that the known relation between NFAs and nondeterministic B¨ chi automata extends to a relation between VFAs and VBAs. This u enables us to easily lift the properties and decision procedures we presented for VFA to the setting of VBAs.

**2 Variable Automata over Inﬁnite Alphabets**

A nondeterministic ﬁnite automaton (NFA) is a tuple A = Γ, Q, Q0, δ, F, where Γ is a ﬁnite alphabet, Q is a ﬁnite set of states, Q0 ⊆ Q is a set of initial states, δ : Q × Γ → 2Q is a transition function, and F ⊆ Q is a set of accepting states. If there exists q such that q ∈ δ(q, a), we say that a exits q. A run of A on w = σ1 σ2... σn in Γ ∗ is a sequence of states r = r0, r1,..., rn such that r0 ∈ Q0 and for every 1 ≤ i ≤ n it holds that ri ∈ δ(ri−1, σi ). If rn ∈ F then r is accepting. Note that a run may not exist. If a run does exist, we say that w is read along A. The language of A, denoted L(A), is the set of words on which there exists an accepting run of A.

Before deﬁning variable automata with inﬁnite alphabets, let us explain the idea behind them. Consider the NFA A1 over the ﬁnite alphabet {x, y} appearing in Figure 1.

**It is easy to see that L(A1 ) = x · y ∗ · x. Consider the language L = {i1 · i2 · · · ik :**

k ≥ 2, i1 = ik, and ij = i1 for all 1 j k} over the alphabet N; that is, L contains exactly all words in which the ﬁrst letter is equal to the last letter, and is different from all other letters. Since N is inﬁnite, an NFW for it needs inﬁnitely many states and transitions. The idea behind variable automata is to label the transitions of the NFA by both letters from the inﬁnite alphabet and variables that can take values from it. For example, if we refer to x as a bounded variable whose value is ﬁxed once assigned, and refer to y as a free variable, which can take changing values, different from the value assigned to x, then the NFA A1, when viewed as a variable automaton over N, Variable Automata over Inﬁnite Alphabets 5

legal instance of v. Note that v may be the witnessing pattern for inﬁnitely many words in Σ ∗, and that a word in Σ ∗ may have several witnessing patterns (or have none).

Given a word w ∈ Σ ∗, a run of A on w is a run of A on a witnessing pattern for w.