# Discrete and Lexicographic Helly Theorems and their Relations to LP-Type Problems

Thesis submitted for the degree of “Doctor of Philosophy” by ...

The “basis” tree BT consists of the at most p recovery points. For each vertex v we allocate a pointer to its parent and a data structure (e.g., a tree) of pointers to its siblings. This data structure supports the operations “Addsibling(x, y)” (add y as a sibling of x) and “Removesibling(x, y)” (remove y from being a sibling of x). We also allocate two variables, dr and d: dr contains the distance from v to the root, and d contains the distance from its exclusive descendant in H to v (= d_H(ed_H(v), v)). In this way d is the maximal distance that a vertex covered by v should roll back.
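A minimal sketch of this per-vertex record in Python (the class layout and the use of a plain set for the sibling structure are illustrative assumptions; the text allows any structure supporting “Addsibling” and “Removesibling”):

```python
class Node:
    """Per-vertex record for the trees BT and ANT: a parent pointer,
    a structure of sibling pointers, and the distance variables dr and d."""

    def __init__(self, name, dr=0, d=0):
        self.name = name
        self.parent = None
        self.siblings = set()  # stands in for the sibling data structure
        self.dr = dr           # distance from this vertex to the root
        self.d = d             # distance from its exclusive descendant in H

    def add_sibling(self, y):      # "Addsibling(x, y)"
        self.siblings.add(y)

    def remove_sibling(self, y):   # "Removesibling(x, y)"
        self.siblings.discard(y)
```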

The “all nodes” tree ANT consists of all the vertices of V_G ∩ NL. For each vertex v in ANT we keep the same information as described for the tree BT.

A LIFO stack S is a data structure supporting two types of operations: insertion of an element at the top of the stack (“Push”) and retrieval of the topmost element from the stack (“Pop”). The stack S contains the changes made to ANT throughout the execution of the algorithm. It is updated by the following two procedures.

Procedure Delete below deletes a vertex h (which is not the root) from the tree ANT by contracting the edge (h, h.parent) and assigning the siblings of h to be siblings of h.parent. These operations are recorded in the stack S.

    Procedure Delete(ANT, h, S)
    begin
        Push(“*”, S)
        for every sibling i of h do
            Push(“h.sibling[i].parent = ” h, S)
            let h.sibling[i].parent = h.parent
            Push(“Addsibling”(h, h.sibling[i]), S)
            Push(“Removesibling”(h.parent, h.sibling[i]), S)
            Addsibling(h.parent, h.sibling[i])
        end for
        Push(“h.parent.d = ” h.parent.d, S)
        let h.parent.d = max{h.parent.d, h.d + (h.dr − h.parent.dr)}
        Push(“h.parent.#siblings = ” h.parent.#siblings, S)
        Push(“Addsibling”(h.parent, h), S)
        Removesibling(h.parent, h)
    end Delete

We note that Procedure Delete above runs in time linearly proportional to the number of siblings h has in the tree ANT. Since the degree of ANT is not necessarily bounded by a constant throughout the execution of Procedure Main Recovery (given below), we bound the total number of changes made in the dynamic tree ANT during the execution of the LP-type algorithm of [SW92]. This algorithm makes an expected O(n) number of random choices of h. Hence it is sufficient to evaluate the number of changes in the dynamic tree ANT (which initially has |NL| vertices) made by |NL| successive calls to Procedure Delete(ANT, h, S), where the sequence of h’s is generated by a single permutation of NL.
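Procedure Delete can be sketched in Python as follows (the dict-based vertex record and the tuple format of the undo records are assumptions for illustration; S is a Python list used as the LIFO stack, with “*” marking the start of one Delete operation so that a whole operation can later be rolled back by popping up to the marker):

```python
def make_vertex(name, dr=0, d=0):
    # Hypothetical vertex record: parent pointer, list of siblings
    # (vertices whose parent is this vertex), and the distances dr, d.
    return {"name": name, "parent": None, "siblings": [], "dr": dr, "d": d}

def delete(h, S):
    """Contract the edge (h, h.parent) in ANT, reassign the siblings of h
    to h.parent, and record every change on the undo stack S."""
    p = h["parent"]
    S.append("*")                                   # start-of-operation marker
    for s in list(h["siblings"]):
        S.append(("parent was", s["name"], h["name"]))
        s["parent"] = p
        S.append(("Addsibling", h["name"], s["name"]))
        S.append(("Removesibling", p["name"], s["name"]))
        p["siblings"].append(s)
    S.append(("d was", p["name"], p["d"]))
    # a vertex formerly covered by h must now roll back all the way to p
    p["d"] = max(p["d"], h["d"] + (h["dr"] - p["dr"]))
    S.append(("Addsibling", p["name"], h["name"]))
    p["siblings"].remove(h)
```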

The bound on the number of changes is as follows (due to Har-Peled [Har98]). Let π be a random permutation of NL \ {h1}. For any two vertices u, v ∈ NL \ {h1}, such that u is an ancestor of v, we define the random variable 1_{u,v} to be 1 if u and v appear in π after every vertex from the path between u and v, and 0 otherwise. We get that 1_{u,v} = 1 if and only if the edge (u, v) appears in the dynamic tree throughout the choices made by π. Let l(u, v) denote the number of edges in the path between u and v in H. We note that E[1_{u,v}] = 2/(l(u, v)(l(u, v) + 1)).
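The value of E[1_{u,v}] follows from a standard backward-analysis count:

```latex
% The path between u and v in H has l(u,v)+1 vertices.  1_{u,v} = 1 exactly
% when u and v are the last two of these vertices in the permutation \pi,
% in either of the two possible orders, hence
E\bigl[\mathbb{1}_{u,v}\bigr]
  = \Pr\bigl[\text{$u,v$ last among the $l(u,v)+1$ path vertices}\bigr]
  = \frac{2\,(l(u,v)-1)!}{(l(u,v)+1)!}
  = \frac{2}{l(u,v)\,(l(u,v)+1)}.
```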

For all v ∈ NL let anc(v) ⊂ NL be the set of ancestors of v in NL. We note that the number of changes in the dynamic tree is exactly

Σ_{v ∈ NL} Σ_{u ∈ anc(v)} 1_{u,v},

whose expectation is Σ_{v ∈ NL} Σ_{u ∈ anc(v)} 2/(l(u, v)(l(u, v) + 1)) ≤ 2|NL| = O(n), since for each v the ancestors lie at distinct path lengths l = 1, 2, ... and Σ_{l ≥ 1} 2/(l(l + 1)) = 2.

We note that the overall number of operations made by Procedure Add during the execution of Procedure Main Recovery below depends linearly on the number of changes made in the stack S throughout the algorithm. This number depends linearly on the total time spent in Procedure Delete, which is O(n) in expectation by the discussion above.

We implement the algorithm of [SW92] such that Function Main Recovery (Function Recovery) below corresponds to Function Main (Function lptype) of [SW92] (see Section 6.2), respectively. Function Main Recovery initializes the data structures and returns the basis of the solution for the p-Recovery Points problem on the tree H.

    Function Main Recovery(H, p)
    begin
        create an “all nodes” tree, ANT, for H
        preprocess ANT for the nca (nearest common ancestor) queries
        create a “basis” tree, BT, for an arbitrary p-membered set RP containing h1
        return Recovery(NL, RP, ∅, ANT, BT, ∅)
    end Main Recovery

We implement the first line as follows. We first make a copy of H and call it ANT. For each node v we allocate two variables dr and d, a pointer to its parent, and a data structure of pointers to its siblings, as described in the definition of the data structures. We set the values of the variables dr and d for all the nodes by performing a single DFS on H. Last, for every leaf node v in H we perform a call to Delete(ANT, v, S), where S is an empty stack. Clearly all these operations can be done in linear time.
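The single DFS that initializes the distance variables can be sketched as follows (the adjacency-list representation `children` and the edge-length map `w` are assumptions for illustration; d starts at 0 everywhere because in the initial ANT, a copy of H, every vertex covers only itself):

```python
def init_distances(children, w, root):
    """One DFS over H: dr[v] = distance from the root to v;
    d[v] = 0, since initially every vertex covers only itself."""
    dr, d = {root: 0}, {}
    stack = [root]
    while stack:
        v = stack.pop()
        d[v] = 0
        for c in children.get(v, []):
            dr[c] = dr[v] + w[(v, c)]  # accumulate edge lengths from the root
            stack.append(c)
    return dr, d
```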

Regarding the time needed for preprocessing the nca queries (the second line), we note that Harel and Tarjan [HT84] were the first to give an algorithm which preprocesses a tree in linear time and subsequently answers nca queries in constant time. Chazelle [Chaz87] gave another algorithm, while Schieber and Vishkin [SV88] gave a simpler one; both algorithms have the same complexity. Cole and Hariharan [CH99] gave a more complex algorithm which supports more operations and also performs nca queries in constant time.

The third line creates a “basis” tree BT for an arbitrary set RP = {h1, h_{b2}, ..., h_{bp}} of nodes by performing the following steps. We first let BT be a copy of ANT. We then perform a DFS on BT and, for every vertex h ∉ RP encountered during the DFS, we perform a call to Procedure Delete(BT, h, S), where S is an empty stack. At the end of this process, BT is as described in the definition of the data structures. Clearly all these operations can be done in linear time.

Concerning the state variables, we calculate r = rbt_p(BT) by performing a DFS on BT to find the maximal distance between points in RP and their farthest exclusive descendants. We then find the vertices h_{jh}, h_{jl} by performing another DFS on BT. Last, we find the state variables by executing Main Distribution(BT, h_{jh}, h_{jl}, r, p). Since p is a constant independent of n, all these steps are executed in constant time.

The input for Function Recovery below consists of a set of non-leaf nodes NL, a subset C ⊆ NL, and a one-element subset D ⊂ NL (such that basis(C ∪ D) serves as a candidate basis; if D = ∅ then C itself is a candidate basis). The input also includes the current “all nodes” tree ANT, the current “basis” tree BT, and a LIFO stack of operations S. The function returns the basis of the solution for the p-Recovery Points problem on the tree ANT.

Since the algorithm of [SW92] calls the violation test on average O(n) times, to show that Function Recovery runs in average linear time it is sufficient to show how to implement Function Violation and Function Basis in constant time.

The violation test can be done by checking whether the conditions of Lemmas 8.2.4 and 8.2.5 are fulfilled by h. We give below a specific implementation of this function.

    Function Violation(BT, h)
    begin
        find the end points of the big path, h_{jh} and h_{jl}, by performing a BFS on BT
        find the position of h in BT (i.e., find the nodes in BT which are its nearest
            ancestor and nearest descendant) by performing a BFS on BT and querying
            nca(h, v) for every node v in BT
        let na be the nearest ancestor of h in BT
        let #siblings be the number of siblings of na in BT which are also descendants of h
        let s be an array of length #siblings of the siblings of na which are also descendants of h
        if there are free recovery points then
            if h is in the big path (i.e., na = h_{jh} and h_{jl} is one of the members
                    of the array s) then
                return TRUE
            elseif h belongs to an improving zone then
                return TRUE
            else
                return FALSE
        elseif h belongs to an improving zone then
            return TRUE
        else
            return FALSE
        end if
    end Violation

We note that, when given BT, the complexity of a single call to Function Violation is O(p). This is true since there are O(p) vertices in BT, the nca query is called at most O(p) times, and each nca query (on the original tree H) is answered in constant time.

The basis computation can be done by trying all possible ways of choosing p recovery points from the set B ∪ {h} of p + 1 points. We give below a specific implementation of this function. The input for Function Basis is as follows: B is a set of vertices, h ∈ NL \ B, ANT is the “all nodes” tree of B ∪ {h}, and BT is the “basis” tree of B.

Since the cardinality of B is a constant independent of n, there is only a constant number of choices of C, and the for loop is executed that number of times. The complexity of a single iteration of the loop is O(p).

Hence the complexity of a single basis calculation is constant.
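The brute-force basis computation can be sketched as follows (`value` is a hypothetical oracle returning the objective value of a candidate recovery-point set; the function name and signature are illustrative):

```python
from itertools import combinations

def basis(B, h, value):
    """Try every way to choose p recovery points from the p + 1 points of
    B ∪ {h} and keep the candidate with the smallest objective value.
    Since |B| = p is a constant independent of n, only p + 1 candidate
    subsets are examined."""
    points = sorted(B | {h})
    candidates = combinations(points, len(B))   # all p-element subsets
    return min(candidates, key=lambda C: value(frozenset(C)))
```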

The analysis of the various procedures and functions above implies that t_v and t_b are constants and the total effort spent on “housekeeping” is M = O(n). Hence the total running time of the algorithm of [SW92], O(n(t_v + t_b) + M), is linear. We have shown:

**Theorem 8.2.8** The p-Recovery Points problem on trees can be solved in randomized linear time.

## 8.3 Placing recovery points on the real line

We define the p-Recovery Points problem (p ≥ 3) on the real line as follows. Let H = {h1, h2, ..., hn} (n ≥ p) be a set of n points in ℝ. For x, y ∈ H we define d(x, y) = |x − y|. Without loss of generality we assume h1 = min V_H and h2 = max V_H. Let i1 = 1 and let ip = 2. In the p-Recovery Points problem on the real line

we are interested in choosing p − 2 points from H in order to achieve the objective (8.7).
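A brute-force check for small inputs can be sketched as follows. The objective used below is an assumption carried over from the tree interpretation of Section 8.2: each point rolls back to the nearest chosen recovery point not exceeding it (h1 = min V_H playing the role of the root), and h1 and h2 are always among the p chosen points (i1 = 1, ip = 2 in the text's indexing):

```python
from itertools import combinations

def roll_back_cost(points, rp):
    """Max distance from a point to the nearest recovery point <= it."""
    rp = sorted(rp)
    cost = 0
    for x in points:
        best = max(r for r in rp if r <= x)  # nearest recovery point toward h1
        cost = max(cost, x - best)
    return cost

def brute_force(points, p):
    """Try every choice of p - 2 interior recovery points; h1 = min and
    h2 = max are always chosen."""
    pts = sorted(points)
    h1, h2 = pts[0], pts[-1]
    interior = pts[1:-1]
    return min(roll_back_cost(pts, {h1, h2} | set(C))
               for C in combinations(interior, p - 2))
```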

Considering the MMFD problem, Ravi, Rosenkrantz and Tayi [RRT94] cite the O(max(pn, n log n)) algorithm of Wang and Kuo [WK88] as the algorithm with the lowest complexity bound. Tamir [Ta98] proposes an O(n log n) algorithm, using a duality transformation to the (continuous) (p − 1)-center problem and the algorithms of Frederickson and Johnson [FJ83] and Megiddo, Tamir, Zemel and Chandrasekaran [MTZC81].

Assuming p is fixed (and nothing about the order of the points in H), we can get a randomized O(n) time algorithm for MMFD. We first solve the corresponding (p − 1)-center problem by solving its lexicographic version in O(n) randomized time (the solution for the lexicographic center problem is also a solution for the non-lexicographic problem). By Theorem 2.1.2(ii) (Theorem 2.3.4(ii)), the Helly number (lexicographic Helly number) of the Helly theorem (lexicographic Helly theorem) corresponding to the (p − 1)-center problem (lexicographic (p − 1)-center problem) on the line is p (p), respectively. Hence by Theorem 5.3.6 the lexicographic (p − 1)-center problem on the line is a fixed-dimensional LP-type problem with combinatorial dimension at most 2p. Using an LP-type algorithm such as the one of [SW92], we can solve this problem in randomized linear time and get its optimal value. Then, via the duality to MMFD, we compute the optimal objective value of MMFD and, with this value at hand, the optimal solution for MMFD can be found in additional O(n) time.

Looking at the Recovery Points problem, the algorithm of Chung and Krishnamoorthy [CK88], originally designed for trees, can be applied here to give an O(nN) time solution, where N is the size (in bits) of the input. Alternatively, the algorithm runs in O(n log(max H − min H)) time. The main drawback of this algorithm is that it works only when the edge lengths are rational, and its time complexity also depends on the size of the given integers. Thus this algorithm is polynomial but not strongly polynomial. Megiddo [Me83c] observes that after sorting the points (in O(n log n) time) one can derive a solution in additional O(n log n) time by using ideas of Frederickson and Johnson [FJ83].

Assuming the points are sorted a priori, one can get, using Megiddo and Tamir’s [MT81] algorithm for the (p − 1)-center problem, an O(p^2 log^2 n) time algorithm for both the MMFD and the Recovery Points problems.

Olstad and Manne [OM95] give an O(p(n − p)) time algorithm, which runs in linear time for either fixed p or fixed n − p. Chrobak, Eppstein, Italiano and Yung [CEIY91] give an O(nβ(n, p)) time algorithm, where β(n, p) = min{i | log^(i) n ≤ n/p}. In particular, if p is fixed, their algorithm runs in linear time; when p is close to n it runs in O(n log* n) time. Finally, Frederickson [Fr91] gives a linear time algorithm regardless of the value of p. Overall, the Recovery Points problem for a given set of n points which are sorted a priori can be solved in O(min{n, p^2 log^2 n}) time.

Assuming p is fixed (and nothing about the order of the points in H), we give here an optimal O(n) randomized algorithm.

On the other hand, one should observe that if the points are not sorted and p is not fixed, finding the objective value itself requires Ω(n log n) time under the algebraic computation tree model: we reduce the Max-Gap problem to the Recovery Points problem. In the Max-Gap problem we are given a finite set H of real numbers and want to compute the maximum gap between two successive numbers in the sorted sequence.

It is well known ([LW86]) that the Max-Gap problem has an Ω(n log n) lower bound under the algebraic computation tree model. Given an instance H of the Max-Gap problem, we set p = n and place a recovery point at each point in H; the roll-back time of the Recovery Points problem is then the maximum gap between consecutive points.
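A sketch of the reduction (the Recovery Points solver itself is not shown; the sort below only simulates its outcome for checking purposes, since with a recovery point at every element of H each point rolls back exactly to its predecessor):

```python
def max_gap_via_recovery(H):
    """With p = n and a recovery point at every element of H, the optimal
    roll-back time equals the maximum gap between consecutive values.
    (The reduction itself performs no sorting; sorting here only verifies
    the claimed equality.)"""
    xs = sorted(H)
    return max(b - a for a, b in zip(xs, xs[1:]))
```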

### 8.3.1 A formulation as a p-dimensional dual LP-type problem

Let V_H = {h1, h2, ..., hn} denote a set of n points in ℝ (n ≥ p). We are interested in finding the p recovery points realizing (8.7).

For the sake of presentation we consider a slightly different problem: placing p recovery points on path trees. Let H be the directed path tree embedding the set of reals V_H (see the definition at the beginning of Section 8.2), and let E_H be the set of its directed edges (so H = (V_H, E_H)). In this way, h1 is the root and h2 is the only leaf. We decompose V_H = L ∪ NL, where L = {h2} is the set of leaves of the tree and NL is the set of the remaining vertices. We note that any induced subtree G of H contains h1 and h2 in V_G. We note that, unlike in the (general) tree case, the intersection between a big path and a small path in a path tree is either a vertex or the empty set.
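A sketch of the path-tree embedding (the edge-list representation, with each edge carrying its length as the gap between consecutive values, is an assumption for illustration):

```python
def path_tree(VH):
    """Embed a finite set of reals V_H as a directed path tree H:
    h1 = min(V_H) is the root, h2 = max(V_H) is the only leaf, and each
    directed edge joins consecutive values, its length being their gap."""
    xs = sorted(VH)
    edges = [(a, b, b - a) for a, b in zip(xs, xs[1:])]
    return xs[0], xs[-1], edges
```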

Let G = (V_G, E_G) be an induced subtree of H. Remembering that a leaf does not count as a recovery point when placing recovery points on trees, we get

rbt_p(V_G) = rbt_{p−1}(G). (8.9)