The relaxation method is a simple algorithmic framework for solving systems of linear inequalities; see, e.g., [11,22]. At every iteration, attention is focused on reducing the violation of a single inequality. Given a feasible system $ A{\bf x} \geq {\bf b}$ with $ A \in \mathbb{R}^{m \times n}$ and $ {\bf b} \in \mathbb{R}^m$, as well as a starting point $ {\bf x}_0\in\mathbb{R}^n$, such a method generates a sequence $ \bigl({\bf x}_i\bigr)_\mathbb{N}\subset\mathbb{R}^n$ of approximate solutions as follows. In iteration $ i$, an index $ k_i\in\{1,\dots,m\}$ is selected according to some specified rule, the corresponding inequality $ {\bf a}_{k_i}{\bf x}\geq b_{k_i}$ is considered, and the next iterate is determined via
$\displaystyle {\bf x}_{i+1} = \left\{\begin{array}{ll} {\bf x}_i + \eta_i\,{\bf a}_{k_i}^{\top}/\Vert{\bf a}_{k_i}\Vert & \;\;\mbox{with probability } p_i \mbox{ if } {\bf a}_{k_i}{\bf x}_i < b_{k_i}, \\ {\bf x}_i & \;\;\mbox{otherwise}, \end{array} \right.$ (1)
where $ \eta_i>0$ and $ p_i\in[0,1]$ may be chosen differently in every iteration. When ${\bf x}_{i+1}\neq {\bf x}_i$ we say that an update has occurred. Any specific variant of this approach can be characterized by
- a selection rule which specifies the way a single inequality is selected at each iteration,
- a step-length multiplier rule which specifies how the sequence of $ \eta_i$ is determined,
- and a decision rule characterized by the updating probability $ p_i$ .
Let us now briefly comment on some specific choices of these rules.
- In most classical versions of the relaxation method the inequalities are considered in cyclic order or according to the largest violation. In the randomized relaxation method, we assume that $ k_i$ is chosen uniformly at random from the set $ \{1,\dots,m\}$ . This introduces a first level of randomization.
- Classical choices for the step-length multiplier $ \eta_i$ include $ \eta_i\equiv 1$ in the perceptron method and $ \eta_{i}=(b_{k_i}-{\bf a}_{k_i}{\bf x}_i)/\Vert{\bf a}_{k_i}\Vert$ in the cyclic projection method. In the algorithms considered in this paper, we use $ \eta_i>0$ to prioritize the attention paid to the different inequalities according to their varying degrees of violation.
- The updating probability $ p_i$ with which an update is carried out introduces a second layer of randomization. When $ p_i<1$, we therefore speak of a doubly randomized method, while $ p_i\equiv 1$ corresponds to the standard (singly) randomized relaxation method. The use of double randomization is interesting because it allows replacing a large number of small updates by updates of moderate size that occur with small probability.
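To make the interplay of the three rules concrete, one iteration of the doubly randomized method can be sketched as follows. This is a minimal sketch, not a reference implementation: the function name and interface are ours, and we take the update direction in (1) to be the normalized row $ {\bf a}_{k_i}^{\top}/\Vert{\bf a}_{k_i}\Vert$, consistent with the projection step length quoted above.

```python
import numpy as np

def relaxation_step(A, b, x, eta, p, rng):
    """One iteration of the doubly randomized relaxation method (1).

    A, b : data of the system A x >= b
    x    : current iterate x_i
    eta  : step-length multiplier eta_i > 0
    p    : updating probability p_i in [0, 1]
    rng  : numpy random generator (supplies both levels of randomization)
    """
    k = rng.integers(A.shape[0])                   # selection rule: uniform over {1, ..., m}
    if A[k] @ x < b[k] and rng.random() < p:       # inequality k violated and decision rule fires
        x = x + eta * A[k] / np.linalg.norm(A[k])  # step along the normalized row a_k
    return x
```

With $ \eta=(b_{k}-{\bf a}_{k}{\bf x})/\Vert{\bf a}_{k}\Vert$ and $ p\equiv 1$ this reduces to a randomized projection step; with $ \eta\equiv 1$ and $ p\equiv 1$, to a randomized perceptron step.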
To adapt the relaxation method to the MAX FS framework and infeasible systems $ A{\bf x} \geq {\bf b}$, we make $ \eta_i$ and $ p_i$ dependent on a parameter $ t_i$ that changes over time. This leads to the so-called thermal variants of the randomized relaxation method, henceforth referred to as RTR. The motivation is that the choice of $ \eta_i$ should initially allow large corrections to the iterates $ {\bf x}_i$, in order for the algorithm to make good progress. However, once the iterates have moved to a region of the solution space containing good approximate solutions of MAX FS, $ {\bf x}_i$ should be further improved only by small local updates, as large updates risk destroying much of the progress previously achieved. Therefore, we want to force the method to favour updates generated by inequalities with progressively smaller violation $ v_i=\max\bigl\{0,b_{k_i}-{\bf a}_{k_i} {\bf x}_i\bigr\}$ . A natural way to achieve this is to introduce a decreasing temperature schedule $ (t_i)_{\mathbb{N}}\subset\mathbb{R}_+$ and to choose $(\eta_i)_{\mathbb{N}}$ as a function of $ t_i$ and $ v_i$, for example, by setting
$\displaystyle \eta_i=\frac{t_i}{t_0} \exp (- v_i/t_i).$ (2)
For large values of $ t_i$, any violated inequality yields a significant update, while for small $ t_i$ only inequalities with small violations $ v_i$ yield significant updates. In the doubly randomized framework one trades small step-length multipliers $ \eta_i$ for small updating probabilities $ p_i$ . For example, in the RTR variant analyzed in Section 3, we choose $ \eta_i=1$ for all $ i$ and
$\displaystyle p_i=\frac{t_i}{t_0} \exp (- v_i/t_i).$ (3)
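A minimal sketch of this RTR variant follows; it is a sketch under our assumptions, with the function name chosen by us, and with the update direction taken to be the normalized row $ {\bf a}_{k_i}^{\top}/\Vert{\bf a}_{k_i}\Vert$ as in (1).

```python
import math
import numpy as np

def rtr_step(A, b, x, t, t0, rng):
    """One doubly randomized RTR iteration with eta_i = 1 and p_i as in (3)."""
    k = rng.integers(A.shape[0])             # uniform selection of an inequality
    v = max(0.0, b[k] - A[k] @ x)            # violation v_i
    p = (t / t0) * math.exp(-v / t)          # updating probability (3)
    if v > 0 and rng.random() < p:
        x = x + A[k] / np.linalg.norm(A[k])  # unit step: eta_i = 1
    return x
```

The temperature $ t$ would be decreased along the schedule $ (t_i)_{\mathbb{N}}$ supplied by the caller; the schedule itself is not fixed by (3).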
As in simulated annealing (SA), $ t_i$ can be interpreted as a temperature that gradually “cools down” the activity level of the algorithm and forces it to pursue more localized searches. It is important to point out, however, that the similarity does not extend beyond this intuitive level. For example, in doubly randomized RTR methods the updating probability depends only on the violation $ v_i$ of a single inequality, while in SA it would depend on the objective function value (i.e., the number of satisfied inequalities) of the new iterate $ {\bf x}_{i+1}$ . The two approaches are thus significantly different. Our analysis of Section 3 is based on a technique that does not occur in the SA literature, and our probabilistic finite termination results are stronger than SA asymptotic convergence guarantees.
We conclude with two technical remarks:
- To complete the description of our RTR methods, we need to specify the termination criterion. Since these methods never terminate on their own for infeasible systems, the best iterate found up to iteration $ i$ is stored in a so-called ratchet vector $ {\bf z}_i$ . In practical implementations, a maximum number of iterations, maxit, is selected at the outset and the ratchet vector at iteration maxit is returned as an approximate solution of MAX FS.
- The inherent parallelism of the RTR methods can be exploited to tackle very large MAX FS instances such as those considered in Section 4. See also [6] for parallel versions of relaxation methods for linear (convex) feasible systems.
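The ratchet mechanism of the first remark can be sketched as follows. Helper names are ours; we measure the quality of an iterate by the number of satisfied inequalities, i.e., the MAX FS objective.

```python
import numpy as np

def num_satisfied(A, b, x, tol=1e-9):
    """Number of inequalities of A x >= b satisfied by x (up to tolerance)."""
    return int(np.sum(A @ x >= b - tol))

def run_with_ratchet(step, A, b, x0, maxit):
    """Iterate `step` maxit times; return the ratchet vector z, i.e. the best
    iterate seen so far, together with its objective value."""
    x, z, best = x0, x0, num_satisfied(A, b, x0)
    for _ in range(maxit):
        x = step(x)
        s = num_satisfied(A, b, x)
        if s > best:                 # the ratchet only ever moves forward
            z, best = x.copy(), s
    return z, best
```

Any of the relaxation variants above can be passed in as `step`; the ratchet vector at iteration maxit is what is returned as the approximate MAX FS solution.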