The MaxFS problem

In the MAX FS problem, given an infeasible linear system $ A{\bf x} \geq {\bf b}$ , one wishes to find a feasible subsystem containing a maximum number of inequalities. This NP-hard problem has interesting applications in a variety of fields. In some challenging applications in telecommunications and computational biology one faces very large MAX FS instances with up to millions of inequalities in thousands of variables.

We consider the following problem, referred to as MaxFS:

Given an infeasible linear system $A{\bf x} \geq {\bf b}$ , where $A\in\mathbb{R}^{m\times n}$ and $ {\bf b}\in\mathbb{R}^{m}$ , find a feasible subsystem containing as many inequalities as possible.

This problem has a number of interesting applications in various fields such as operations research [7,12], radiation therapy [13], image and signal processing [15], statistical discriminant analysis and machine learning [4,14]. There is growing interest in MAX FS due to the fact that many complex phenomena that can be well-approximated with linear models yield formulations involving large and generally infeasible linear systems for which approximate solutions in terms of $ l_1$ – or $ l_2$ -norms are not meaningful. Note that since linear system feasibility can be checked in polynomial time, the MAX FS structure differs substantially from that of MAX SAT, the well-known problem of satisfying a maximum number of Boolean clauses. Like MAX SAT though, MAX FS can be approximated within a factor of $ 2$ but it does not admit a polynomial-time approximation scheme, unless P = NP [2]. The reader is referred to [] for a survey of work on MAX FS up to 2003.

Apart from large-scale instances arising in classification problems [4,14] and in the context of infeasible linear programs [7,12], very large MAX FS instances with up to millions of inequalities and thousands of variables occur in several challenging applications in telecommunications (digital video broadcasting [21]) and computational biology (modelling the energy function that drives the folding of proteins [16]) that are of current interest. Such instances are beyond the reach of available computational approaches, which include state-of-the-art MIP solvers applied to big-M formulations, the best available heuristics [7], as well as the latest polyhedral techniques based on partial set covering formulations [3,19] and combinatorial Benders’ cuts [8].

To tackle large-scale instances of MAX FS, we propose to use randomized and thermal variants of the classical Agmon-Motzkin-Schoenberg (AMS) relaxation method for solving systems of linear inequalities. Deterministic AMS relaxation versions have been extensively studied in the mathematical programming literature, often as special cases of subgradient methods (see e.g. [11,22]), and in the machine learning literature under the name of perceptron procedures (see e.g. [5,17]). The algorithms we propose and investigate here extend the thermal perceptron heuristic [10]. It is worth pointing out that the power of randomization in relaxation/subgradient type methods, which was first exploited by the machine learning community, has also been recently studied in the context of feasible systems of convex inequalities, see [20], and in the context of the minimization of sums of convex component functions with a large number of summands, see [18].