Abstract

Fitting geometric shapes to observed data points (images) is a popular task in computer vision and modern statistics (errors-in-variables regression). We investigate the problem of existence of the best fit using geometric and probabilistic approaches.

1. Introduction

In many areas of human practice, one needs to approximate a set of points 𝑃1,…,π‘ƒπ‘›βˆˆβ„π‘‘ (representing experimental data or observations) by a simple geometric shape, such as a line, plane, circular arc, elliptic arc, or spherical surface. This problem is known as fitting a model object (line, circle, sphere, etc.) to observed data points.

The best fit is achieved when the geometric distances from the given points to the model object are minimized in the least squares sense. Finding the best fit reduces to the minimization of the objective function
\[
\mathcal{F}(S) = \sum_{i=1}^{n} \bigl[\operatorname{dist}(P_i, S)\bigr]^2, \tag{1.1}
\]
where $S \subset \mathbb{R}^d$ denotes the model object (line, circle, sphere, etc.).
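For concreteness, here is a minimal numerical sketch (in Python with NumPy; the helper name objective_circle is ours) of evaluating (1.1) in the special case where the model object $S$ is a circle, for which the geometric distance has the closed form $\operatorname{dist}(P, S) = \bigl|\,\|P - c\| - r\,\bigr|$:

    import numpy as np

    def objective_circle(points, center, radius):
        # Evaluate (1.1) for the circle S = {x : ||x - center|| = radius}.
        # For a circle, dist(P, S) = | ||P - center|| - radius |.
        d = np.linalg.norm(points - center, axis=1) - radius
        return np.sum(d ** 2)

    # Example: three points fitted by the unit circle centered at the origin.
    P = np.array([[1.1, 0.0], [0.0, 0.9], [-1.0, 0.1]])
    print(objective_circle(P, center=np.array([0.0, 0.0]), radius=1.0))

For most other model objects (e.g., ellipses and general conics) no such simple closed form exists, and $\operatorname{dist}(P_i, S)$ itself must be computed numerically, which is one reason the minimization of (1.1) is nontrivial in practice.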

While other fitting criteria are used as well, it is the minimum of (1.1) that is universally recognized as the most desirable solution of the fitting problem. It has been adopted as a standard in industrial applications [1, 2]. In statistics, the minimum of (1.1) corresponds to the maximum likelihood estimate under the usual assumptions known as the functional model; see [3, 4]. Minimization of (1.1) is also called orthogonal distance regression (ODR).

Geometric fitting in the sense of minimizing (1.1) has a long history. The first publications on fitting linear objects (lines, planes, etc.) to given points in 2D and 3D date back to the late nineteenth century [5]. The problem of fitting lines and planes was solved analytically in the 1870s [6], and the statistical properties of the resulting estimates were studied throughout the twentieth century [7–9], with the most important discoveries made in the 1970s [3, 10, 11]. These studies gave rise to a new branch of mathematical statistics now known as errors-in-variables (EIV) regression analysis [3, 4].

Since the 1950s, fitting circles and spheres to data points has been popular in archaeology, industry, and other fields [12, 13]. In the 1970s, researchers started fitting ellipses and hyperbolas to data points [14–17].

Interest in the problem of geometric fitting surged in the 1990s when it became an agenda issue for the rapidly growing computer science community. Indeed, fitting simple contours to digitized images is one of the basic tasks in pattern recognition and computer vision. In most cases, those contours are lines, circles, ellipses, and other conic sections (called simply conics, for brevity) in 2D, and planes, spheres, and ellipsoids in 3D. More complicated curves and surfaces are used occasionally too [1, 18], but it is more common to divide complex images into smaller segments and approximate each of those by a line or by a circular arc. This way one can approximate a complex contour by a polygonal line or a circular spline (see, e.g., [19, 20]).

Most publications on the geometric fitting problem are devoted to analytic solutions (which are only possible for lines and planes), to practical algorithms for finding the best-fitting object, or to statistical properties of the resulting estimates. Only rarely does one address fundamental issues such as the existence and uniqueness of the best fit. When these issues do come up, one either assumes that the best fit exists and is unique or just points out examples to the contrary without deeper investigation.

In this paper we address the issue of existence of the best fit in a general and comprehensive manner. The issue of uniqueness will be treated in a separate paper. These issues turn out to be quite nontrivial and lead to unexpected conclusions. As a glimpse of our results, we provide Table 1 summarizing the state of affairs in the problem of fitting 2D objects (here Yes means the best-fitting object exists or is unique in all respective cases; No means the existence/uniqueness fails in some of the respective cases).

We see that the existence and uniqueness of the best-fitting object cannot be taken for granted. Actually, 2/3 of the answers in Table 1 are negative. Uniqueness can never be guaranteed, and existence is guaranteed only for lines. In typical cases (i.e., for typical sets of data points, in the probabilistic sense; see the precise definition in Section 6), the best-fitting line and circle do exist, but the best-fitting ellipse does not.

The existence and uniqueness of the best fit are not only of theoretical interest but also practically relevant. For example, knowing under what conditions the problem does not have a solution might help us understand why the computer algorithm keeps diverging, or returns nonsense, or crashes altogether. While the cases where the best fit does not theoretically exist may be exceptional, nearby cases may be practically hard to handle, as the best-fitting object may be extremely difficult to find.

The nonuniqueness has its practical implications too. It means that the best-fitting object may not be stable under slight perturbations of the data points. An example is described by Nievergelt [21]; he presented a set of 𝑛=4 points that can be fitted by three different circles equally well. Then, by arbitrarily small changes in the coordinates of the points, one can make any of these three circles fit the points a bit better than the other two; thus the best-fitting circle changes abruptly. See also a similar example in [4, Section 2.2].

Here we develop a general approach to the studies of existence of the best fit. Our motivation primarily comes from popular applications where one fits lines, circles, ellipses, and other conics, but our methods and ideas can be applied to much more general models. Our approach works in any dimension; that is, we can treat data points 𝑃1,…,π‘ƒπ‘›βˆˆβ„π‘‘ and model objects π‘†βŠ‚β„π‘‘ for any 𝑑β‰₯2, but for the sake of notational simplicity and ease of illustrations we mostly restrict the exposition to the planar case 𝑑=2.

2. W-Convergence and the Induced Topology

A crucial notion in our analysis is that of convergence (of model objects).

Motivating Example
Consider a sequence of lines 𝐿𝑛={𝑦=π‘₯/𝑛}. They all pass through the origin (0,0) and their slopes 1/𝑛 decrease as 𝑛 grows (Figure 1). Naturally, we would consider this sequence as convergent; it converges to the π‘₯-axis, that is, to the line 𝐿={𝑦=0}, as π‘›β†’βˆž.

However, it is not easy to define convergence so that the above sequence of lines would satisfy it. For example, we may try to use the Hausdorff distance to measure how close two objects $S_1, S_2 \subset \mathbb{R}^2$ are. It is defined by
\[
\operatorname{dist}_H(S_1, S_2) = \max\Bigl\{\, \sup_{P_1 \in S_1} \operatorname{dist}(P_1, S_2),\ \sup_{P_2 \in S_2} \operatorname{dist}(P_2, S_1) \Bigr\}. \tag{2.1}
\]
If the two sets $S_1$ and $S_2$ are closed and the Hausdorff distance between them is zero, that is, $\operatorname{dist}_H(S_1, S_2) = 0$, then they coincide: $S_1 = S_2$. If the Hausdorff distance is small, the two sets $S_1$ and $S_2$ nearly coincide with each other. So the Hausdorff distance seems appropriate for our purposes.

But it turns out that the Hausdorff distance between the lines 𝐿𝑛 and 𝐿 in our example is infinite, that is, distH(𝐿𝑛,𝐿)=∞ for every 𝑛. Thus the Hausdorff distance cannot be used to characterize convergence of lines as we would like to see it.

Windows
So why do we think that the line $L_n$ converges to the line $L$, despite an infinite Hausdorff distance between them? It is because we do not really see an infinite line; we only “see” objects in a certain finite area, like in Figure 1. Suppose we see objects in some rectangle
\[
R = \{-A \le x \le A,\ -B \le y \le B\}, \tag{2.2}
\]
which for the moment will play the role of our “window” through which we look at the plane. Then we see segments of our lines within $R$; that is, we see the intersections $L_n \cap R$ and $L \cap R$. Now clearly the segment $L_n \cap R$ gets closer to $L \cap R$ as $n$ grows, and in the limit $n \to \infty$ they become identical. This is why we see the lines $L_n$ converging to $L$. We see this convergence no matter how large the window $R$ is. Note that the Hausdorff distance between $L_n \cap R$ and $L \cap R$ indeed converges to zero: $\operatorname{dist}_H(L_n \cap R, L \cap R) \to 0$ as $n \to \infty$.

Window-Restricted Hausdorff Distance
Taking a cue from the above example, we define the Hausdorff distance between sets $S_1$ and $S_2$ within a finite window $R$ as follows:
\[
\operatorname{dist}_H(S_1, S_2; R) = \max\Bigl\{\, \sup_{P \in S_1 \cap R} \operatorname{dist}(P, S_2),\ \sup_{Q \in S_2 \cap R} \operatorname{dist}(Q, S_1) \Bigr\}. \tag{2.3}
\]
The formula (2.3) applies whenever both sets $S_1$ and $S_2$ intersect the window $R$. If only one set, say $S_1$, intersects $R$, we modify (2.3) as follows:
\[
\operatorname{dist}_H(S_1, S_2; R) = \sup_{P \in S_1 \cap R} \operatorname{dist}(P, S_2). \tag{2.4}
\]
A similar modification is used if only $S_2$ intersects $R$. If neither set intersects the window $R$, we simply set $\operatorname{dist}_H(S_1, S_2; R) = 0$ (because we “see” two empty sets, which are indistinguishable).
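The definitions (2.3)–(2.4) are easy to approximate numerically when each object is represented by a sufficiently dense finite sample of its points. The following Python sketch (the helper names are ours, and the point-sample representation is an approximation we are assuming) reproduces the motivating example: the window-restricted distance between $L_n$ and $L$ shrinks as $n$ grows, even though the unrestricted Hausdorff distance is infinite.

    import numpy as np

    def dist_point_to_set(p, S):
        # Euclidean distance from a point p to a finite point sample S (an (m, 2) array).
        return np.min(np.linalg.norm(S - p, axis=1))

    def hausdorff_in_window(S1, S2, A, B):
        # Approximate dist_H(S1, S2; R) of (2.3)-(2.4) for the window
        # R = {|x| <= A, |y| <= B}; S1, S2 are given as (m, 2) point samples.
        in_R = lambda S: S[(np.abs(S[:, 0]) <= A) & (np.abs(S[:, 1]) <= B)]
        S1R, S2R = in_R(S1), in_R(S2)
        if len(S1R) == 0 and len(S2R) == 0:
            return 0.0  # neither set is visible in the window
        sup1 = max((dist_point_to_set(p, S2) for p in S1R), default=0.0)
        sup2 = max((dist_point_to_set(q, S1) for q in S2R), default=0.0)
        return max(sup1, sup2)

    # The motivating example: L_n = {y = x/n} vs. L = {y = 0} in R = {|x| <= 5, |y| <= 5}.
    x = np.linspace(-100, 100, 4001)
    L = np.column_stack([x, np.zeros_like(x)])
    for n in (1, 10, 100):
        Ln = np.column_stack([x, x / n])
        print(n, hausdorff_in_window(Ln, L, A=5.0, B=5.0))

The printed values decrease roughly like $5/n$, in agreement with the geometric picture of Figure 1.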
We now define our notion of convergence for a sequence of sets.

Definition 2.1. Let $S_n \subset \mathbb{R}^2$ be some sets and $S \subset \mathbb{R}^2$ another set. We say that the sequence $S_n$ converges to $S$ if for any finite window $R$ we have
\[
\operatorname{dist}_H(S_n, S; R) \to 0 \quad \text{as } n \to \infty. \tag{2.5}
\]

In this definition the use of finite windows is crucial; hence we will call the resulting notion Window convergence, or W-convergence, for short. According to this definition, the sequence of lines $L_n$ in the above example indeed converges to the limit line $L$.

For finite-size (bounded) objects, like circles or ellipses, the W-convergence is equivalent to the convergence with respect to the Hausdorff distance. However for unbounded objects, such as lines and hyperbolas, we have to use our window-restricted Hausdorff distance and formula (2.5).

Our definition of convergence (i.e., W-convergence) is intuitively clear, but some complications may arise if it is used too widely. For example, if the limit object $S$ is not required to be closed, then the same sequence $\{S_n\}$ may have more than one limit $S$ (though the closure $\bar{S}$ of every limit set $S$ will be the same). Thus, from now on, to avoid pathological situations, we will assume that all our sets $S \subset \mathbb{R}^2$ are closed. All the standard model objects (lines, circles, ellipses, hyperbolas) are closed.

Convergence induces topology in the collection of all closed sets π‘†βŠ‚β„2, which we denote by 𝕏. A set π‘ŒβŠ‚π• is closed if for any sequence of sets π‘†π‘›βˆˆπ‘Œ converging to a limit set 𝑆 the limit set also belongs to π‘Œ, that is, π‘†βˆˆπ‘Œ. A set π‘ˆβŠ‚π• is open if its complement π•β§΅π‘ˆ is closed.

The above topology on 𝕏 is metrizable. This means we can define a metric on 𝕏, that is, a distance between closed sets in ℝ2, in such a way that the W-convergence 𝑆𝑛→𝑆 means exactly that the distance between 𝑆𝑛 and 𝑆 goes down to zero. Such a distance can be defined in many ways; here is one of them.

Definition 2.2. The W-distance (or Window distance) between two closed sets $S_1, S_2 \subset \mathbb{R}^2$ is defined as follows:
\[
\operatorname{dist}_W(S_1, S_2) = \sum_{k=1}^{\infty} 2^{-k} \operatorname{dist}_H(S_1, S_2; R_k), \tag{2.6}
\]
where $R_k$ is the square window of size $2k \times 2k$, that is, $R_k = \{|x| \le k,\ |y| \le k\}$.

In this formula, we use a growing sequence of nested windows, and the Hausdorff distances between $S_1$ and $S_2$ within those windows are balanced by the factors $2^{-k}$. The first nonzero term in (2.6) corresponds to the smallest window $R_k$ that intersects at least one of the two sets $S_1$ or $S_2$. The sum in (2.6) is always finite. Indeed, let us suppose, for simplicity, that both $S_1$ and $S_2$ intersect each window $R_k$. Then, since the distance between any two points in $R_k$ is at most $2\sqrt{2}\,k$, the above sum is bounded by $2\sqrt{2} \sum_{k=1}^{\infty} k\, 2^{-k} < 6$.
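In practice the series (2.6) can be truncated, since the factors $2^{-k}$ make the tail negligible. A self-contained Python sketch (a compact variant of the window-restricted helper above, this time using SciPy's cKDTree for the nearest-point queries; the truncation level and point-sample resolution are our choices) confirming that $\operatorname{dist}_W(L_n, L) \to 0$ for the lines of the motivating example:

    import numpy as np
    from scipy.spatial import cKDTree

    def hausdorff_in_window(S1, S2, k):
        # dist_H(S1, S2; R_k) of (2.3)-(2.4) for R_k = {|x| <= k, |y| <= k},
        # with S1, S2 given as (m, 2) point samples.
        w = lambda S: S[np.max(np.abs(S), axis=1) <= k]
        S1R, S2R = w(S1), w(S2)
        if len(S1R) == 0 and len(S2R) == 0:
            return 0.0
        sup1 = cKDTree(S2).query(S1R)[0].max() if len(S1R) else 0.0
        sup2 = cKDTree(S1).query(S2R)[0].max() if len(S2R) else 0.0
        return max(sup1, sup2)

    def w_distance(S1, S2, k_max=20):
        # Truncated series (2.6).
        return sum(2.0 ** -k * hausdorff_in_window(S1, S2, k)
                   for k in range(1, k_max + 1))

    x = np.linspace(-100, 100, 4001)
    L = np.column_stack([x, np.zeros_like(x)])
    for n in (1, 10, 100):
        print(n, w_distance(np.column_stack([x, x / n]), L))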

3. Continuity of Objective Function

Recall that finding the best-fitting object $S$ for a given set of points $P_1, \ldots, P_n \in \mathbb{R}^2$ consists of minimizing the objective function (1.1).

Let 𝕄 denote the collection of model objects. We put no restrictions on 𝕄 other than all π‘†βˆˆπ•„ are assumed to be closed sets. The collection 𝕄 is then a subset of the space 𝕏 of all closed subsets π‘†βŠ‚β„2. The topology and metric on 𝕏 defined above now induce a topology and metric on 𝕄.

Redundancy Principle
If an object $S' \in \mathbb{M}$ is a subset of another object $S \in \mathbb{M}$, that is, $S' \subset S$, then for any point $P$ we have $\operatorname{dist}(P, S) \le \operatorname{dist}(P, S')$; thus $S'$ cannot fit any set of data points better than $S$ does. So for the purpose of minimizing $\mathcal{F}$, that is, finding the best-fitting object, we may ignore all model objects that are proper subsets of other model objects (they are redundant). This may reduce the collection $\mathbb{M}$ somewhat. Such a reduction is not necessary; it is just a matter of convenience, and we will occasionally apply it below.
Conversely, there is no harm in considering any subset $S' \subset S$ of an object $S \in \mathbb{M}$ as a (smaller) object, too. Indeed, if $S'$ provides a best fit (i.e., minimizes the objective function $\mathcal{F}$), then so does $S$, because $\mathcal{F}(S) \le \mathcal{F}(S')$. Hence including $S'$ in the collection $\mathbb{M}$ is not a real extension of $\mathbb{M}$; its inclusion will not change the best fit.

Theorem 3.1 (continuity of $\mathcal{F}$). For any given points $P_1, \ldots, P_n$ and any collection $\mathbb{M}$ of model objects, the function $\mathcal{F}$ defined by (1.1) is continuous on $\mathbb{M}$. This means that if a sequence of objects $S_m \in \mathbb{M}$ converges (i.e., W-converges) to another object $S \in \mathbb{M}$, then $\mathcal{F}(S_m) \to \mathcal{F}(S)$.

Proof. Since β„±(𝑆) is the sum of squares of distances dist(𝑃𝑖,𝑆) to individual points 𝑃𝑖, see (1.1), it is enough to verify that the distance dist(𝑃,𝑆) is a continuous function of 𝑆 for any given point 𝑃.
Suppose $P \in \mathbb{R}^2$ is a point and $S_m$ is a sequence of closed sets W-converging to a closed set $S$. We denote by $Q \in S$ the point in $S$ closest to $P$, that is, such that $\operatorname{dist}(P, Q) = \operatorname{dist}(P, S)$; such a point $Q$ exists because $S$ is closed. Denote by $D$ the disk centered at $P$ of radius $1 + \operatorname{dist}(P, Q)$; it contains $Q$. Let $R$ be a window containing the disk $D$.
Since $R$ contains $Q$, it intersects $S$, that is, $R \cap S \ne \emptyset$. This guarantees that $\operatorname{dist}_H(S_m, S; R) \to 0$. Thus, there are points $Q_m \in S_m$ such that $Q_m \to Q$. Since $\operatorname{dist}(P, S_m) \le \operatorname{dist}(P, Q_m)$, we conclude that the upper limit of the sequence $\operatorname{dist}(P, S_m)$ does not exceed $\operatorname{dist}(P, S)$, that is,
\[
\limsup_{m \to \infty} \operatorname{dist}(P, S_m) \le \operatorname{dist}(P, S). \tag{3.1}
\]
On the other hand, we will show that the lower limit of the sequence $\operatorname{dist}(P, S_m)$ cannot be smaller than $\operatorname{dist}(P, S)$, that is,
\[
\liminf_{m \to \infty} \operatorname{dist}(P, S_m) \ge \operatorname{dist}(P, S). \tag{3.2}
\]
The estimates (3.1) and (3.2) together imply that $\operatorname{dist}(P, S_m) \to \operatorname{dist}(P, S)$, as desired, and hence the distance function $\operatorname{dist}(P, S)$ is continuous on $\mathbb{M}$. It remains to prove (3.2).
To prove (3.2), we assume that it is false. Then there is a subsequence $S_{m_k}$ of our sequence of sets $S_m$ such that
\[
\lim_{k \to \infty} \operatorname{dist}(P, S_{m_k}) = \liminf_{m \to \infty} \operatorname{dist}(P, S_m) < \operatorname{dist}(P, S). \tag{3.3}
\]
Denote by $Q_m \in S_m$ the point in $S_m$ closest to $P$, that is, such that $\operatorname{dist}(P, Q_m) = \operatorname{dist}(P, S_m)$. Then we have
\[
\lim_{k \to \infty} \operatorname{dist}(P, Q_{m_k}) = \lim_{k \to \infty} \operatorname{dist}(P, S_{m_k}) < \operatorname{dist}(P, S) = \operatorname{dist}(P, Q). \tag{3.4}
\]
Since the points $Q_{m_k}$ are closer to $P$ than the point $Q$ is, we have $Q_{m_k} \in D \subset R$. Recall that $\operatorname{dist}_H(S_m, S; R) \to 0$; hence
\[
\operatorname{dist}(Q_{m_k}, S) \to 0 \quad \text{as } k \to \infty. \tag{3.5}
\]
Denote by $H_{m_k} \in S$ the point in $S$ closest to $Q_{m_k}$, that is, such that $\operatorname{dist}(Q_{m_k}, H_{m_k}) = \operatorname{dist}(Q_{m_k}, S)$. Now we have, by the triangle inequality,
\[
\operatorname{dist}(P, S) \le \operatorname{dist}(P, H_{m_k}) \le \operatorname{dist}(P, Q_{m_k}) + \operatorname{dist}(Q_{m_k}, H_{m_k}) = \operatorname{dist}(P, Q_{m_k}) + \operatorname{dist}(Q_{m_k}, S). \tag{3.6}
\]
The limit of the first term on the right-hand side of (3.6) is $< \operatorname{dist}(P, S)$ by (3.4), and the limit of the second term is zero by (3.5). This implies $\operatorname{dist}(P, S) < \operatorname{dist}(P, S)$, which is absurd. The contradiction proves (3.2), and the proof of (3.2) completes the proof of the theorem.

4. Existence of the Best Fit

The best-fitting object $S_{\text{best}} \in \mathbb{M}$ corresponds to the (global) minimum of the objective function $\mathcal{F}$, that is, $S_{\text{best}} = \operatorname{argmin}_{S \in \mathbb{M}} \mathcal{F}(S)$. The function $\mathcal{F}$ defined by (1.1) cannot be negative; thus, it always has an infimum
\[
\mathcal{F}_0 = \inf_{S \in \mathbb{M}} \mathcal{F}(S). \tag{4.1}
\]
This means that one cannot find objects $S \in \mathbb{M}$ such that $\mathcal{F}(S) < \mathcal{F}_0$, but one can find objects $S \in \mathbb{M}$ such that $\mathcal{F}(S)$ is arbitrarily close to $\mathcal{F}_0$. More precisely, there is a sequence of objects $S_m$ such that $\mathcal{F}(S_m) \ge \mathcal{F}_0$ for each $m$ and $\mathcal{F}(S_m) \to \mathcal{F}_0$ as $m \to \infty$.

In practical terms, one usually runs a computer algorithm that executes a certain iterative procedure. It produces a sequence of objects π‘†π‘š (here π‘š denotes the iteration number) such that β„±(π‘†π‘š)<β„±(π‘†π‘šβˆ’1); that is, the quality of approximations improves at every step. If the procedure is successful, the value β„±(π‘†π‘š) converges to the minimal possible value, β„±0, and the sequence of objects π‘†π‘š converges (i.e., W-converges) to some limit object 𝑆0. Then the continuity of the objective function β„± (proven earlier) guarantees that β„±(𝑆0)=β„±0; that is, 𝑆0 indeed provides the global minimum of the objective function, so it is the best-fitting object 𝑆0=𝑆best.

A problem arises if the limit object $S_0$ does not belong to the given collection $\mathbb{M}$; hence it is not admissible. Then we end up with a sequence of objects $S_m$, each of which fits (approximates) the given points better than the previous one but not as well as the next one. None of them qualifies as the best fit; in fact, the best fit does not exist, and the fitting problem has no solution.

Illustrative Example
Suppose that our model objects are circles and our given points are $P_1 = (-1, 0)$, $P_2 = (0, 0)$, and $P_3 = (1, 0)$. Then the sequence of circles $S_m$ defined by $x^2 + (y - m)^2 = m^2$ fits the points progressively better (tighter) as $m$ grows, so that $\mathcal{F}(S_m) \to 0$ as $m \to \infty$ (Figure 2). On the other hand, no circle can pass through three collinear points; hence no circle $S$ satisfies $\mathcal{F}(S) = 0$. Thus the circle-fitting problem has no solution.
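This is easy to verify numerically: the distance from $(\pm 1, 0)$ to the circle $x^2 + (y - m)^2 = m^2$ is $\sqrt{1 + m^2} - m = 1/(\sqrt{1 + m^2} + m)$, which tends to zero, while $(0, 0)$ lies on every circle of the sequence. A short sketch:

    import numpy as np

    P3 = np.array([[-1.0, 0.0], [0.0, 0.0], [1.0, 0.0]])
    for m in (1.0, 10.0, 100.0, 1000.0):
        center = np.array([0.0, m])
        d = np.linalg.norm(P3 - center, axis=1) - m   # signed distances to the circle of radius m
        print(m, np.sum(d ** 2))                      # F(S_m) decreases to 0 as m grows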

The above sequence of circles $S_m$ converges (in our terms, W-converges) to the $x$-axis, which is a line, so it is natural to declare that line “the best fit.” This may not be satisfactory in some practical applications, where one really needs to produce an estimate of the circle’s center and radius. But if we want to present the best-fitting object here, it is clearly and undeniably the line $y = 0$.

Thus, in order to guarantee the existence of the best-fitting object in all cases, we need to include in our collection $\mathbb{M}$ all objects that can be obtained as limits of sequences of objects from $\mathbb{M}$. Such “limit objects” are called limit points of $\mathbb{M}$, in the language of topology.

Definition 4.1. A collection $\mathbb{M}$ which already contains all its “limit points” is said to be closed. If the collection $\mathbb{M}$ is not closed, then the extended collection $\overline{\mathbb{M}}$ that includes all the limit points of $\mathbb{M}$ is called the closure of $\mathbb{M}$.

For example, the collection $\mathbb{M}_L$ of all lines in $\mathbb{R}^2$ is closed, as a sequence of lines can only converge to a line. The collection $\mathbb{M}_C$ of all circles in $\mathbb{R}^2$ is not closed, as a sequence of circles may converge to a line. The closure of the collection of circles $\mathbb{M}_C$ includes all circles and all lines, that is,
\[
\overline{\mathbb{M}}_C = \mathbb{M}_C \cup \mathbb{M}_L. \tag{4.2}
\]
(Strictly speaking, a sequence of circles may also converge to a single point, see Section 5, so singletons need to be included too.)

We see that the collection 𝕄 of model objects must be closed if we want the best-fitting object to exist in all cases. If 𝕄 is not closed, we have to extend it by adding all its limit points and thus make it closed.

Theorem 4.2 (existence of the best fit). Suppose that the given collection 𝕄 of model objects is closed. Then for any given points 𝑃1,…,𝑃𝑛 there exists the best-fitting object 𝑆bestβˆˆπ•„, that is, the objective function β„± attains its global minimum on 𝕄.

Proof. The key ingredients of our proof are the continuity of the objective function $\mathcal{F}$ and the compactness of a restricted domain of that function. We recall that a continuous real-valued function on a compact set always attains its maximum value and its minimum value on that set. In metric spaces (like our $\mathbb{M}$), a subset $\mathbb{M}_0 \subset \mathbb{M}$ is compact if every sequence of its elements $S_m \in \mathbb{M}_0$ has a subsequence $S_{m_k}$ that converges to an element $S \in \mathbb{M}_0$, that is, $S_{m_k} \to S$ as $k \to \infty$.
We recall that our objective function $\mathcal{F}$ is continuous, and its domain $\mathbb{M}$ is now assumed to be closed. If $\mathbb{M}$ were compact, the above general fact would guarantee that $\mathcal{F}$ has a global minimum, as desired. But in most practical settings $\mathbb{M}$ is not compact. Indeed, if it were compact, the function $\mathcal{F}$ would not only have a minimum but also a maximum, and this is usually impossible: one can usually find model objects arbitrarily far from the given points, thus making the distances from the points to the object arbitrarily large. Thus we need to find a smaller (restricted) subcollection $\mathbb{M}_0 \subset \mathbb{M}$ which is compact, and then we will apply the above general fact.
Let $D_r = \{x^2 + y^2 \le r^2\}$ denote the disk of radius $r$ centered at the origin $(0, 0)$. We can find a disk $D_r$ large enough to satisfy two conditions: (i) $D_r$ covers all the given points $P_1, \ldots, P_n$, and (ii) $D_r$ intersects at least one object $S_0 \in \mathbb{M}$, that is, $D_r \cap S_0 \ne \emptyset$. The distances from the given points to $S_0$ cannot exceed the diameter of $D_r$, which is $2r$; hence $\mathcal{F}(S_0) \le (2r)^2 n$.
Now we define our subcollection 𝕄0βŠ‚π•„; it consists of all model objects π‘†βˆˆπ•„ that intersect the larger disk 𝐷3π‘Ÿ of radius 3π‘Ÿ. Objects that lie entirely outside 𝐷3π‘Ÿ are not included in 𝕄0. Note that the subcollection 𝕄0 contains at least one object (𝑆0 mentioned above); hence 𝕄0 is not empty.
Recall that all our given points $P_1, \ldots, P_n$ lie in $D_r$. They are separated from the region outside the larger disk $D_{3r}$ by a “no man’s land”: the ring $D_{3r} \setminus D_r$, which is $2r$ wide. Thus the distances from the given points to any object $S$ not included in $\mathbb{M}_0$ are greater than $2r$; hence for such objects we have $\mathcal{F}(S) > (2r)^2 n$. So objects not included in $\mathbb{M}_0$ cannot fit our points better than $S_0$ does, and they can be ignored in the process of minimizing $\mathcal{F}$. More precisely, if we find the best-fitting object $S_{\text{best}}$ within the subcollection $\mathbb{M}_0$, then for any other object $S \in \mathbb{M} \setminus \mathbb{M}_0$ we will have
\[
\mathcal{F}(S) > (2r)^2 n \ge \mathcal{F}(S_0) \ge \mathcal{F}(S_{\text{best}}), \tag{4.3}
\]
which shows that $S_{\text{best}}$ is also the best-fitting object within the entire collection $\mathbb{M}$.
It remains to verify that the subcollection $\mathbb{M}_0$ is compact; that is, every sequence of objects $S_m \in \mathbb{M}_0$ has a subsequence converging to an object $S^* \in \mathbb{M}_0$. We will use the following standard fact: in a compact metric space, any sequence of compact subsets has a subsequence that converges with respect to the Hausdorff metric.
Now let $j = \lfloor 3r \rfloor + 1$ be the smallest integer greater than $3r$. Recall that all the objects in $\mathbb{M}_0$ are required to intersect $D_{3r}$; thus, they all intersect $D_j$ as well. The sets $S_m \cap D_j$ are compact. By the above general fact, there is a subsequence $S_k^{(j)}$ of the sequence $S_m$ and a compact subset $S_j^* \subset D_j$ such that $\operatorname{dist}_H(S_k^{(j)} \cap D_j, S_j^*) \to 0$ as $k \to \infty$. Next, from the subsequence $S_k^{(j)}$ we extract a sub-subsequence, call it $S_k^{(j+1)}$, that converges in the larger disk $D_{j+1}$; that is, such that $\operatorname{dist}_H(S_k^{(j+1)} \cap D_{j+1}, S_{j+1}^*) \to 0$ as $k \to \infty$ for some compact subset $S_{j+1}^* \subset D_{j+1}$. Since $\operatorname{dist}_H(S_k^{(j+1)} \cap D_j, S_j^*) \to 0$, we see that $S_{j+1}^* \cap D_j = S_j^*$; that is, the limit sets $S_j^*$ and $S_{j+1}^*$ “agree” within $D_j$.
Then we continue this procedure inductively for the progressively larger disks $D_{j+2}, D_{j+3}, \ldots$. In the end we use the standard Cantor diagonal argument to construct a single subsequence $S_{m_k}$ such that for every $i \ge j$ we have $\operatorname{dist}_H(S_{m_k} \cap D_i, S_i^*) \to 0$ as $k \to \infty$, and the limit sets $S_i^* \subset D_i$ “agree” in the sense $S_{i+1}^* \cap D_i = S_i^*$ for every $i \ge j$. It follows that the sequence of objects $S_{m_k}$ converges (i.e., W-converges) to the closed set $S^* = \cup_{i \ge j} S_i^*$. The limit set $S^*$ must belong to our collection $\mathbb{M}$ because that collection was assumed to be closed. Lastly, $S^*$ intersects the disk $D_{3r}$, so it also belongs to the subcollection $\mathbb{M}_0$. Our proof is now complete.

5. Examples

Lines and Planes
The most basic (and oldest) fitting application is fitting lines in 2D. The model collection 𝕄𝐿 consists of all lines πΏβŠ‚β„2. It is easy to see that a sequence of lines πΏπ‘š can only converge to another line; hence, the collection 𝕄𝐿 of lines is closed. This guarantees that the fitting problem always has a solution.
Similarly, the collection of lines in 3D is closed. The collection of planes in 3D is closed, too. Thus the problems of fitting lines and planes always have a solution. The same is true for the problem of fitting π‘˜-dimensional planes in a 𝑑-dimensional space ℝ𝑑 for any 1β‰€π‘˜<𝑑.

Circles and Spheres
A less trivial task is fitting circles in 2D. Now the model collection $\mathbb{M}_C$ consists of all circles $C \subset \mathbb{R}^2$. A sequence of circles $C_m$ can converge to an object of three possible types: a circle $C_0 \subset \mathbb{R}^2$, a line $L_0 \subset \mathbb{R}^2$, or a single point (singleton) $P_0 \in \mathbb{R}^2$. Singletons can be regarded as degenerate circles (of zero radius), or they can be ignored based on the redundancy principle introduced in Section 3. But lines are objects of a different type; thus the collection $\mathbb{M}_C$ of circles is not closed.
As a result, the circle-fitting problem does not always have a solution; a simple example was given in Section 4. In order to guarantee the existence of the best-fitting object, one needs to extend the given collection $\mathbb{M}_C$ by adding all its “limit points,” in this case lines. The extended collection includes all circles and all lines, that is,
\[
\overline{\mathbb{M}}_C = \mathbb{M}_C \cup \mathbb{M}_L. \tag{5.1}
\]
Now the (extended) circle-fitting problem always has a solution: for any data points $P_1, \ldots, P_n$ there is a best-fitting object $S_{\text{best}} \in \overline{\mathbb{M}}_C$ that minimizes the sum of squares of the distances to $P_1, \ldots, P_n$. One has to keep in mind, though, that the best-fitting object may be a line rather than a circle.
The problem of fitting circles to data points has been around since the 1950s [13]. Occasional nonexistence of the best-fitting circle was first noticed empirically; see, for example, Berman and Culpin [22]. The first theoretical analysis of the nonexistence phenomenon was done by Nievergelt [21], who traced it to the noncompactness of the underlying model space and concluded that lines needed to be included in the model space to guarantee the existence of the best fit. He actually studied this phenomenon in many dimensions, so he dealt with hyperspheres and hyperplanes; he called hyperplanes “generalized hyperspheres.” Similar analyses and conclusions, for circles and lines, were published later, independently, by Chernov and Lesort [23] and by Zelniker and Clarkson [24]; see also a recent book [4].

Ellipses
A popular task in computer vision is fitting ellipses. Now the model collection $\mathbb{M}_E$ consists of all ellipses $E \subset \mathbb{R}^2$. A sequence of ellipses $E_m$ may converge to an object of several possible types: an ellipse, a parabola, a line, a pair of parallel lines, a segment of a line, or a half-line (a ray); see Figure 3.

Line segments can be regarded as degenerate ellipses whose minor axis is zero. Alternatively, line segments and half-lines can be ignored based on the redundancy principle (Section 3). Thus we have lines, pairs of parallel lines, and parabolas as limit objects of different types.

We see that the collection $\mathbb{M}_E$ of ellipses is not closed. As a result, the ellipse-fitting problem does not always have a solution. For example, this happens when the given points are placed on a line or on a parabola. (Actually, there are many more examples; see Section 6.)

In order to guarantee the existence of the best-fitting object, one needs to extend the collection $\mathbb{M}_E$ of ellipses by adding all its “limit points,” in this case lines, pairs of parallel lines, and parabolas. We denote the extended collection by $\overline{\mathbb{M}}_E$:
\[
\overline{\mathbb{M}}_E = \mathbb{M}_E \cup \mathbb{M}_L \cup \mathbb{M}_{\parallel} \cup \mathbb{M}_{\cup}, \tag{5.2}
\]
where $\mathbb{M}_{\parallel}$ denotes the collection of pairs of parallel lines and $\mathbb{M}_{\cup}$ the collection of parabolas (the symbol $\cup$ resembles a parabola).

Now the (extended) ellipse-fitting problem always has a solution: for any data points $P_1, \ldots, P_n$ there is a best-fitting object $S_{\text{best}} \in \overline{\mathbb{M}}_E$ that minimizes the sum of squares of the distances to $P_1, \ldots, P_n$. One has to keep in mind, though, that the best-fitting object may be a line, a pair of parallel lines, or a parabola, rather than an ellipse.

The ellipse-fitting problem has been around since the 1970s. The need to deal with limiting cases was first noticed by Bookstein [16], who wrote: “The fitting of a parabola is a limiting case, exactly transitional between ellipse and hyperbola. As the center of ellipse moves off toward infinity while its major axis and the curvature of one end are held constant….” The first theoretical analysis of the nonexistence of the best-fitting ellipse was done by Nievergelt [25], who traced it to the noncompactness of the underlying model space and concluded that parabolas needed to be included in the model space to guarantee the existence of the best fit.

Conics
We now turn to the more general task of fitting quadratic curves (also known as conic sections, or simply conics). Now the model collection $\mathbb{M}_Q$ consists of all nondegenerate quadratic curves, by which we mean ellipses, parabolas, and hyperbolas. A sequence of conics may converge to an object of many possible types: a conic, a line, a line segment, a ray, two opposite rays, a pair of parallel lines, or a pair of intersecting lines; see Figures 3 and 4.

Every line segment, ray, or pair of opposite rays is a part (subset) of a full line, so these objects can be ignored based on the redundancy principle (Section 3). But pairs of parallel lines and pairs of intersecting lines constitute limit objects of new types. We see that the collection $\mathbb{M}_Q$ of quadratic curves (conics) is not closed. As a result, the conic-fitting problem does not always have a solution.

In order to guarantee the existence of the best-fitting object, one needs to extend the collection $\mathbb{M}_Q$ of quadratic curves by adding all its “limit points”: pairs of lines (which may be intersecting, parallel, or coincident). Thus
\[
\overline{\mathbb{M}}_Q = \mathbb{M}_Q \cup \mathbb{M}_{LL}, \tag{5.3}
\]
where $\mathbb{M}_{LL}$ denotes the collection of pairs of lines.

Now the (extended) conic-fitting problem always has a solution: for any data points $P_1, \ldots, P_n$ there is a best-fitting object $S_{\text{best}} \in \overline{\mathbb{M}}_Q$ that minimizes the sum of squares of the distances to $P_1, \ldots, P_n$. One has to keep in mind, though, that the best-fitting object may be a pair of lines rather than a genuine conic.

6. Sufficient and Deficient Models

We see that if the collection $\mathbb{M}$ of adopted model objects is not closed, then in order to guarantee the existence of the best fit we have to close it up, that is, add to $\mathbb{M}$ all objects that are obtained as limits of existing objects. The new, extended collection $\overline{\mathbb{M}}$ will be closed and will always provide the best fit. Here we discuss practical implications of extending $\mathbb{M}$.

Definition 6.1. The original objects $S \in \mathbb{M}$ are called primary objects, and the objects that have been added, that is, $S \in \overline{\mathbb{M}} \setminus \mathbb{M}$, are called secondary objects.

The above extension of $\mathbb{M}$ by secondary objects, though necessary, may not be completely harmless. Suppose again that one needs to fit circles to observed points. Since the collection of circles $\mathbb{M}_C$ is not closed, one has to add lines to it. That is, one really has to operate with the extended collection $\overline{\mathbb{M}}_C = \mathbb{M}_C \cup \mathbb{M}_L$, in which circles are primary objects and lines are secondary objects. The best-fitting object for a given set of points will then belong to $\overline{\mathbb{M}}_C$; hence occasionally it will be a line rather than a circle.

In practical applications, though, a line may not be totally acceptable as the best-fitting object. For example, if one needs to produce an estimate of the circle’s radius and center, a line will be of little help; it has no radius or center. Thus, when a secondary object happens to be the best fit to the given data points, various unwanted complications may arise.

Next we investigate how frequently such unwanted events occur.

In most applications points are obtained (measured or observed) with some random noise. The noise has a probability distribution, which in statistical studies is usually assumed to be normal [3, 4]. For our purposes it is enough to just assume that given points 𝑃1,…,𝑃𝑛 have an absolutely continuous distribution:

Assumption 6.2. The points 𝑃1,…,𝑃𝑛 are obtained randomly with a probability distribution that has a joint probability density 𝜌(π‘₯1,𝑦1,…,π‘₯𝑛,𝑦𝑛)>0.

Let us first see how frequently the best-fitting circle fails to exist under this assumption. We begin with the simplest case of $n = 3$ points. If they are not collinear, then they can be interpolated by a circle. If they are collinear (and distinct), there is no interpolating circle, so the best fit is achieved by a line. The three points $(x_i, y_i)$, $1 \le i \le 3$, are collinear if and only if
\[
(x_1 - x_2)(y_1 - y_3) = (x_1 - x_3)(y_1 - y_2). \tag{6.1}
\]
This equation defines a hypersurface in $\mathbb{R}^6$. Under the above assumption, the probability of that hypersurface is zero; that is, collinearity occurs with probability zero. In simple practical terms, it is “impossible”; it “never happens.” If we generate three data points by using a computer random number generator, we will practically never see collinear points. All our practical experience tells us that the best-fitting circle always exists. Perhaps for this reason the circle-fitting problem is usually studied without including lines in the collection of model objects.
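In floating-point arithmetic (6.1) is the familiar cross-product collinearity test, and a quick simulation (a sketch; the sample size and seed are arbitrary) illustrates the point: randomly generated triples essentially never satisfy (6.1) exactly.

    import numpy as np

    rng = np.random.default_rng(1)

    def collinear(p1, p2, p3):
        # The exact test (6.1): (x1-x2)(y1-y3) == (x1-x3)(y1-y2).
        return (p1[0] - p2[0]) * (p1[1] - p3[1]) == (p1[0] - p3[0]) * (p1[1] - p2[1])

    hits = sum(collinear(*rng.standard_normal((3, 2))) for _ in range(100_000))
    print(hits)   # almost surely prints 0: collinearity "never happens"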

In the case of 𝑛>3 data points, the best-fitting circle fails to exist when the points are collinear, but there are other instances too. For example, let 𝑛=4 points be at (0,1), (0,βˆ’1), (𝐴,0), and (βˆ’π΅,0) for some large 𝐴,𝐡≫1. Then it is not hard to check, by direct inspection, that the best-fitting circle fails to exist, and the best fit is achieved by the line 𝑦=0.

It is not easy to describe all sets of points for which the best-fitting circle fails to exist, but such a description exists; it is given in [4] (see Theorem 8 on page 68 there). We skip the technical details and only state the final conclusion: for every $n > 3$, the best-fitting circle exists unless the $2n$-vector $(x_1, y_1, \ldots, x_n, y_n)$ belongs to a certain algebraic submanifold of $\mathbb{R}^{2n}$. So, under our assumption, the best-fitting circle exists with probability one.

In other words, no matter how large or small the data set is, the best-fitting circle will exist with probability one, so practically we never have to resort to secondary objects (lines); that is, we never have to worry about existence. The model of circles is adequate and sufficient; it does not really require an extension.

Definition 6.3. We say that a collection 𝕄 of model objects is sufficient (for fitting purposes) if the best-fitting object 𝑆best exists with probability one under the above assumption. Otherwise the model collection will be called deficient.

We note that under our assumption an event occurs with probability zero if and only if it occurs on a subset π΄βŠ‚β„2𝑛 of Lebesgue measure zero, that is, Leb(𝐴)=0. Therefore, our notion of sufficiency does not depend on the choice of the probability density 𝜌.

As we have just seen, the collection of circles is sufficient for fitting purposes. Next we examine the model collection 𝕄𝐸 of ellipses πΈβŠ‚β„2. As we know, this collection is not closed; its closure must include all lines, pairs of parallel lines, and parabolas.

Ellipses
Let us see how frequently the best-fitting ellipse fails to exist. It is particularly easy to deal with $n = 5$ data points. For any set of 5 distinct points in general linear position (which means that no three of them are collinear), there exists a unique quadratic curve (conic) that passes through all of them (i.e., interpolates them). That conic may be an ellipse, a parabola, or a hyperbola. If the 5 points are not in general linear position (i.e., at least three of them are collinear), then they can be interpolated by a degenerate conic (a pair of lines).
If the interpolating conic is an ellipse, then obviously that ellipse is the best fit (the objective function attains its absolute minimum value, zero). What if the interpolating conic is a parabola or a pair of parallel lines? Then it is a secondary object in the extended model (5.2), and we have an unwanted event: a secondary object provides the best fit. We remark, however, that this event occurs with probability zero, so it is not a real concern. But there are other ways in which unwanted events occur; see below.
Suppose that the interpolating conic is a hyperbola or a pair of intersecting lines. Now the situation is less clear, as such an interpolating conic does not belong to the extended collection $\overline{\mathbb{M}}_E$; see (5.2). Is it possible then that the best-fitting object $S_{\text{best}} \in \overline{\mathbb{M}}_E$ is an ellipse? The answer is “no,” and our argument is based on the following theorem, which is interesting in itself:

Theorem 6.4 (no local minima for $n = 5$). For any set of five points the objective function $\mathcal{F}$ has no local minima on conics. In other words, for any noninterpolating conic $S$ there exist other conics $S'$, arbitrarily close to $S$, which fit our data points better than $S$ does, that is, $\mathcal{F}(S') < \mathcal{F}(S)$.

The proof is given in the appendix; here we answer the previous question. Suppose $n = 5$ points are interpolated by a hyperbola or a pair of intersecting lines. If there were a best-fitting ellipse $E_{\text{best}} \in \mathbb{M}_E$, then no other ellipse could provide a better fit; that is, for any other ellipse $E \in \mathbb{M}_E$ we would have $\mathcal{F}(E) \ge \mathcal{F}(E_{\text{best}})$, which contradicts the above theorem. Thus the best-fitting object $S_{\text{best}} \in \overline{\mathbb{M}}_E$ is not an ellipse but a secondary object (a parabola, a line, or a pair of parallel lines); that is, an unwanted event occurs.

One can easily estimate the probability of the above unwanted event numerically. For a given probability distribution one can generate random samples of 𝑛=5 points and for each sample find the interpolating conic (by elementary geometry). Every time that conic happens to be other than an ellipse, an unwanted event occurs; that is, the best fit from within the collection (5.2) is provided by a secondary object.

We have run the above experiment with a standard normal distribution, where each coordinate π‘₯𝑖 and 𝑦𝑖 (for 𝑖=1,…,5) was generated independently according to a standard normal law 𝒩(0,1). We found that the random points were interpolated by an ellipse with probability 22% and by a hyperbola with probability 78%. (Other conics, including parabolas, never turned up; in fact, it is not hard to prove that they occur with probability zero.) This is a striking result; hyperbolas actually dominate over ellipses! Thus the best-fitting ellipse fails to exist with a probability as high as 78%.
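This experiment is easy to reproduce. The sketch below (Python with NumPy; sample size and seed arbitrary) finds the conic $Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0$ through five random points as the null vector of the $5 \times 6$ design matrix and classifies it by the discriminant $B^2 - 4AC$ (negative for an ellipse, positive for a hyperbola); it should reproduce percentages close to the 22%/78% split quoted above.

    import numpy as np

    rng = np.random.default_rng(0)

    def conic_type(pts):
        # Coefficients (A, B, C, D, E, F) of the conic through 5 points span the
        # null space of the design matrix; take the right singular vector that
        # corresponds to the smallest singular value.
        x, y = pts[:, 0], pts[:, 1]
        M = np.column_stack([x**2, x*y, y**2, x, y, np.ones(5)])
        A, B, C, D, E, F = np.linalg.svd(M)[2][-1]
        disc = B**2 - 4*A*C
        return "ellipse" if disc < 0 else ("hyperbola" if disc > 0 else "parabola")

    N = 20_000
    counts = {"ellipse": 0, "hyperbola": 0, "parabola": 0}
    for _ in range(N):
        counts[conic_type(rng.standard_normal((5, 2)))] += 1
    print({k: round(v / N, 3) for k, v in counts.items()})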

We note that this percentage remains the same if we generate points $(x_i, y_i)$ independently according to any other 2D normal distribution. Indeed, any 2D normal distribution can be transformed to the standard 2D normal distribution by an affine map of the plane $\mathbb{R}^2$. Under affine transformations conics are transformed to conics, and their types are preserved (i.e., ellipses are transformed to ellipses, hyperbolas to hyperbolas, etc.). Therefore the type of the interpolating conic cannot change.

In another experiment we sampled points from the unit square [0,1]Γ—[0,1] with a uniform distribution; that is, we selected each coordinate π‘₯𝑖 and 𝑦𝑖 independently from the unit interval [0,1]. In this experiment ellipses turned up with probability 28% and hyperbolas with probability 72%, again an overwhelming domination of hyperbolas!

And again, these percentages remain the same if we generate points $(x_i, y_i)$ according to any other rectangular (uniform) distribution. Indeed, any rectangle can be transformed to the square $[0,1] \times [0,1]$ by an affine map of the plane, and then the same argument as before applies.

We also conducted numerical tests for $n > 5$ points. In these tests we generated points $P_1, \ldots, P_n$ independently according to four predefined distributions: normal with covariance matrix $\mathbf{V}_1 = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$, normal with covariance matrix $\mathbf{V}_2 = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}$, uniform in the square $[0,1] \times [0,1]$, and uniform in the rectangle $[0,2] \times [0,1]$. Table 2 shows the percentage of randomly generated samples of $n$ points for which the best-fitting ellipse exists.

We see that in most cases the best-fitting ellipse fails to exist with a high probability. Only for the uniform distribution in the square $[0,1] \times [0,1]$ do the failures become rarer as $n \to \infty$, and apparently the probability of existence of the best-fitting ellipse grows to 100%. We will explain this in Section 7.

We note that for $n \ge 6$ points there is generally no interpolating conic, so one has to find the global minimum of $\mathcal{F}$ via an extensive search over the model space $\mathbb{M}_Q$ of conics. This is a very time-consuming procedure, and its results are never totally reliable. Thus our estimates, especially for larger $n$, are quite approximate, but we believe that they are accurate to within 5%.

Insufficiency of Ellipses
We conclude that the collection of ellipses is not sufficient for fitting purposes. This means that there is a real chance that for a given set of data points no ellipse can be selected as the best fit; that is, the ellipse-fitting problem has no solution. More precisely, for any ellipse $E$ there will be another ellipse $E'$ providing a better fit, in the sense $\mathcal{F}(E') < \mathcal{F}(E)$. If one constructs a sequence of ellipses that fit the given points progressively better and on which the objective function $\mathcal{F}$ converges to its infimum, then those ellipses will grow in size and converge to something other than an ellipse (most likely, to a parabola).
In practical terms, one usually runs a computer algorithm that executes an iterative procedure such as Gauss-Newton or Levenberg-Marquardt. It produces a sequence of ellipses $E_m$ (here $m$ denotes the iteration number) such that $\mathcal{F}(E_m) < \mathcal{F}(E_{m-1})$; that is, the quality of approximation improves at every step, but the ellipses keep growing in size and approach a parabola. This situation commonly occurs in practice [16, 25, 26], and one either accepts a parabola as the best fit or chooses an ellipse by an alternative procedure (e.g., by the direct ellipse fit [27]).
We see that whenever the best-fitting ellipse fails to exist, the ellipse-fitting procedure will attempt to move beyond the collection of ellipses, so it will end up on the border of that collection and then return a secondary object (a parabola or a pair of lines). In a sense, the scope of the collection of ellipses is too narrow for fitting purposes. In other words, this collection is deficient, or badly incomplete. Its deficiency indicates that it should be substantially extended for the fitting problem to have a reasonable (adequate) solution with probability one.

Conics
Such an extension is achieved by the collection of all conics (ellipses, hyperbolas, and parabolas), denoted by $\mathbb{M}_Q$; see Section 5. If one searches for the best-fitting object $S_{\text{best}}$ in the entire collection of conics, then $S_{\text{best}}$ exists with probability one. This was confirmed in numerous computer experiments that we have conducted. In fact, the best-fitting object has always been either an ellipse or a hyperbola, so parabolas apparently can be excluded. Thus our numerical tests strongly suggest that the collection $\mathbb{M}_{EH} = \mathbb{M}_E \cup \mathbb{M}_H$ of ellipses and hyperbolas is sufficient (though we do not have a proof).

7. Generalizations

In the basic formula (1.1) all $n$ points make equal contributions to $\mathcal{F}$. In some applications, though, one needs to minimize a weighted sum
\[
\mathcal{F}_w(S) = \sum_{i=1}^{n} w_i \bigl[\operatorname{dist}(P_i, S)\bigr]^2 \tag{7.1}
\]
for some fixed weights $w_i > 0$. This problem is known as weighted least squares. All our results on the existence of the best fit apply to it as well.

More generally, given a probability distribution $dP(x, y)$ on $\mathbb{R}^2$, one can “fit” an object $S$ to it (or approximate it) by minimizing
\[
\mathcal{F}_P(S) = \int_{\mathbb{R}^2} d^2(x, y)\, dP(x, y), \tag{7.2}
\]
where $d(x, y) = \operatorname{dist}[(x, y), S]$. Note that (7.1) is a particular version of (7.2) in which $dP$ is a discrete measure with finite support. In order to guarantee the finiteness of the integral in (7.2), one needs to assume that the distribution $dP$ has finite second moments. After that all our results apply.

The generalization (7.2) is relevant to the numerical tests described in Section 6. Suppose again that $n$ points are selected independently according to a probability distribution $dP(x, y)$ on $\mathbb{R}^2$. As $n$ grows, by the law of large numbers, the normalized objective function $(1/n)\mathcal{F}$ converges to the integral (7.2).

Now the global minimum of the function ℱ𝑃(𝑆) would give us the best-fitting object corresponding to the given probability distribution dP, which would be the best β€œasymptotic’’ fit to a random sample of size 𝑛 selected from the distribution dP, as π‘›β†’βˆž. This relation helps us explain some experimental results reported in Section 6.

Suppose that $dP$ is the uniform distribution in a rectangle $R = [0, L] \times [0, 1]$. Then the integral in (7.2) becomes
\[
\mathcal{F}_P(S) = L^{-1} \int_0^L \int_0^1 d^2(x, y)\, dy\, dx. \tag{7.3}
\]
We have computed it numerically in order to find the best-fitting conic $S$. The results are given below.

Perfect Square (𝑅=[0,1]Γ—[0,1])
One would naturally expect that the best fit to a mass uniformly distributed in a perfect square would be either a circle or a pair of lines (say, the diagonals of the square).

What we found was totally unexpected: the best-fitting conic is an ellipse! Its center coincides with the center of the square and its axes are aligned with the sides of the square, but the two axes have different lengths: the major axis is $2a = 0.9489$ and the minor axis is $2b = 0.6445$ (assuming that the square has unit side). In fact, there are exactly two such ellipses: one is oriented horizontally and the other vertically; see the left panel of Figure 5. These two ellipses beat any circle and any pair of lines (see the table in Figure 5); they provide the global minimum of the objective function (7.3).
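This claim can be checked, at least roughly, by approximating (7.3) on a grid over the unit square, with each candidate curve represented by a dense point sample (a sketch assuming SciPy; the grid and sample resolutions are our choices, and the ellipse semi-axes are the values quoted above):

    import numpy as np
    from scipy.spatial import cKDTree

    g = (np.arange(200) + 0.5) / 200                   # 200 x 200 grid over the unit square
    P = np.array(np.meshgrid(g, g)).reshape(2, -1).T

    def F(curve_pts):
        # Approximate F_P of (7.3): mean squared distance from the square to the curve.
        d, _ = cKDTree(curve_pts).query(P)
        return np.mean(d ** 2)

    t = np.linspace(0.0, 2.0 * np.pi, 4000)
    a, b = 0.9489 / 2, 0.6445 / 2                      # semi-axes quoted in the text
    ellipse = np.column_stack([0.5 + a * np.cos(t), 0.5 + b * np.sin(t)])

    s = np.linspace(0.0, 1.0, 2000)
    diagonals = np.vstack([np.column_stack([s, s]), np.column_stack([s, 1.0 - s])])

    circles = [np.column_stack([0.5 + r * np.cos(t), 0.5 + r * np.sin(t)])
               for r in np.linspace(0.2, 0.5, 31)]

    print("ellipse    :", F(ellipse))
    print("diagonals  :", F(diagonals))
    print("best circle:", min(F(c) for c in circles))

With these settings one should observe the ellipse yielding a smaller value of the objective than the best centered circle and the diagonals, consistent with the table in Figure 5.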

The fact that the best-fitting conic to a mass distributed in a square is an ellipse explains why for large samples generated from the uniform distribution in a square ellipses dominate over hyperbolas (Section 6).

As the rectangle $R$ is extended horizontally, that is, as $L > 1$ increases, the best-fitting conic changes. We found that for $L < L_1 \approx 1.2$ the best-fitting conic is still an ellipse (though it gets elongated compared to the one found for the square, when $L = 1$). But for $L > L_1$ the best-fitting conic abruptly changes from an ellipse to a pair of parallel lines. Those two lines run through the rectangle horizontally; their equations are $y = 0.25$ and $y = 0.75$ (see Figure 6).

This fact might explain why, for longer rectangles such as $R = [0, 2] \times [0, 1]$ used in Section 6, the percentage of samples for which the best-fitting ellipse exists does not grow much as $n$ increases. It remains unclear, though, which type of conic (ellipses or hyperbolas) dominates for large samples taken from longer rectangles $[0, L] \times [0, 1]$ with $L > L_1$.

8. Megaspace

Our conclusions can be illustrated by an interesting construction in a multidimensional space, the “megaspace.” It was first used by Malinvaud [28, Chapter 10] and then by Chernov [4, Sections 1.5 and 3.4].

Recall that, given $n$ data points $P_1 = (x_1, y_1), \ldots, P_n = (x_n, y_n)$ and a model object $S$ (a closed subset of $\mathbb{R}^2$), our objective function $\mathcal{F}(S)$ is defined by (1.1), and for the distances $\operatorname{dist}(P_i, S)$ we can write
\[
\bigl[\operatorname{dist}(P_i, S)\bigr]^2 = \min_{(x_i', y_i') \in S} \Bigl\{ (x_i - x_i')^2 + (y_i - y_i')^2 \Bigr\}. \tag{8.1}
\]
Thus we can express the objective function $\mathcal{F}(S)$ as follows:
\[
\mathcal{F}(S) = \min \Bigl\{ \sum_{i=1}^{n} \bigl[ (x_i - x_i')^2 + (y_i - y_i')^2 \bigr] :\ (x_i', y_i') \in S\ \ \forall i \Bigr\}. \tag{8.2}
\]

Now let us represent the $n$ data points $P_1 = (x_1, y_1), \ldots, P_n = (x_n, y_n)$ by one point (“megapoint”) $\mathcal{P}$ in the $2n$-dimensional space $\mathbb{R}^{2n}$ with coordinates $x_1, y_1, \ldots, x_n, y_n$. For the given object $S$, let us also define a multidimensional set (“megaset”) $\mathfrak{M}_S \subset \mathbb{R}^{2n}$ as follows:
\[
\mathcal{P}' = (x_1', y_1', \ldots, x_n', y_n') \in \mathfrak{M}_S \iff (x_i', y_i') \in S\ \ \forall i. \tag{8.3}
\]
Note that $\sum_{i=1}^{n} [(x_i - x_i')^2 + (y_i - y_i')^2]$ in (8.2) is the square of the distance from $\mathcal{P}$ to $\mathcal{P}' \in \mathfrak{M}_S$ in the megaspace $\mathbb{R}^{2n}$. Therefore
\[
\mathcal{F}(S) = \min_{\mathcal{P}' \in \mathfrak{M}_S} \bigl[\operatorname{dist}(\mathcal{P}, \mathcal{P}')\bigr]^2 = \bigl[\operatorname{dist}(\mathcal{P}, \mathfrak{M}_S)\bigr]^2. \tag{8.4}
\]
Next, given a collection $\mathbb{M}$ of model objects, we define a larger megaset $\mathfrak{M}(\mathbb{M}) \subset \mathbb{R}^{2n}$ as follows: $\mathfrak{M}(\mathbb{M}) = \cup_{S \in \mathbb{M}} \mathfrak{M}_S$. Alternatively, it can be defined as
\[
(x_1', y_1', \ldots, x_n', y_n') \in \mathfrak{M}(\mathbb{M}) \iff \exists\, S \in \mathbb{M} :\ (x_i', y_i') \in S\ \ \forall i. \tag{8.5}
\]
The best-fitting object $S_{\text{best}}$ minimizes the function $\mathcal{F}(S)$. Thus, due to (8.4), $S_{\text{best}}$ minimizes the distance from the megapoint $\mathcal{P}$ representing the data set $P_1, \ldots, P_n$ to the megaset $\mathfrak{M}(\mathbb{M})$ representing the collection $\mathbb{M}$.

Thus, the problem of finding the best-fitting object $S_{\text{best}}$ reduces to the problem of finding the megapoint $\mathcal{P}' \in \mathfrak{M}(\mathbb{M})$ that is closest to the given megapoint $\mathcal{P}$ (representing the given $n$ points). In other words, we need to project the megapoint $\mathcal{P}$ onto the megaset $\mathfrak{M}(\mathbb{M})$; the footpoint $\mathcal{P}'$ of the projection corresponds to the best-fitting object $S_{\text{best}}$.

We conclude that the best-fitting object $S_{\text{best}} \in \mathbb{M}$ exists if and only if there exists a megapoint $\mathcal{P}' \in \mathfrak{M}(\mathbb{M})$ that is closest to the given megapoint $\mathcal{P}$. It is a simple fact that, given a nonempty set $D \subset \mathbb{R}^d$ with $d \ge 1$, the closest point $X' \in D$ to a point $X \in \mathbb{R}^d$ exists for every $X \in \mathbb{R}^d$ if and only if the set $D$ is closed. Thus the existence of the best-fitting object $S_{\text{best}}$ requires the megaset $\mathfrak{M}(\mathbb{M})$ to be topologically closed. Again we see that the property of closedness is vital for the fitting problem to have a solution for every set of data points $P_1, \ldots, P_n$.

Theorem 8.1 (closedness of megasets). If the model collection 𝕄 is closed (in the sense of W convergence), then the megaset 𝔐(𝕄) is closed in the natural topology of ℝ2𝑛.

Clearly, this theorem provides an alternative proof of the existence of the best-fitting object, provided that the model collection 𝕄 is closed.

Proof. Suppose a sequence of megapoints
\[
\mathcal{P}^{(k)} = \bigl(x_1^{(k)}, y_1^{(k)}, \ldots, x_n^{(k)}, y_n^{(k)}\bigr), \tag{8.6}
\]
all belonging to the megaset $\mathfrak{M}(\mathbb{M})$, converges, as $k \to \infty$, to a megapoint
\[
\mathcal{P}^{(\infty)} = \bigl(x_1^{(\infty)}, y_1^{(\infty)}, \ldots, x_n^{(\infty)}, y_n^{(\infty)}\bigr) \tag{8.7}
\]
in the usual topology of $\mathbb{R}^{2n}$. Equivalently, for every $i = 1, \ldots, n$ the point $(x_i^{(k)}, y_i^{(k)})$ converges, as $k \to \infty$, to the point $(x_i^{(\infty)}, y_i^{(\infty)})$. We need to show that $\mathcal{P}^{(\infty)} \in \mathfrak{M}(\mathbb{M})$.
Now for every $k$ there exists a model object $S^{(k)} \in \mathbb{M}$ that contains the $n$ points $(x_1^{(k)}, y_1^{(k)}), \ldots, (x_n^{(k)}, y_n^{(k)})$. The sequence $\{S^{(k)}\}$ contains a convergent subsequence $S^{(k_r)}$, that is, one such that $S^{(k_r)} \to S$ as $r \to \infty$ for some closed set $S \subset \mathbb{R}^2$. (The existence of a convergent subsequence can be verified by the diagonal argument, as in the proof of the existence theorem in Section 4.)
As we assumed that the collection $\mathbb{M}$ is closed, it follows that $\mathbb{M}$ contains the limit object $S$, that is, $S \in \mathbb{M}$. It is intuitively clear (and can be verified by a routine calculus-type argument) that $S$ contains all the limit points $(x_1^{(\infty)}, y_1^{(\infty)}), \ldots, (x_n^{(\infty)}, y_n^{(\infty)})$. Therefore the limit megapoint $\mathcal{P}^{(\infty)}$ belongs to the megaset $\mathfrak{M}(\mathbb{M})$. This proves that the latter is closed.

Megaset for Lines
Let $\mathbb{M}_L$ consist of all lines in $\mathbb{R}^2$. The corresponding megaset $\mathfrak{M}(\mathbb{M}_L) \subset \mathbb{R}^{2n}$ is described in [28, Chapter 10]. A megapoint $(x_1, y_1, \ldots, x_n, y_n)$ belongs to $\mathfrak{M}(\mathbb{M}_L)$ if and only if all the $n$ planar points $(x_1, y_1), \ldots, (x_n, y_n)$ belong to one line (i.e., they are collinear). This condition can be expressed by $C_{n,3}$ algebraic relations as follows:
\[
\det \begin{bmatrix} x_i - x_j & y_i - y_j \\ x_i - x_k & y_i - y_k \end{bmatrix} = 0 \tag{8.8}
\]
for all $1 \le i < j < k \le n$. Each of these relations means that the three points $(x_i, y_i)$, $(x_j, y_j)$, and $(x_k, y_k)$ are collinear; cf. (6.1). All of these relations together mean that all the $n$ points $(x_1, y_1), \ldots, (x_n, y_n)$ are collinear.
Note that $\mathfrak{M}(\mathbb{M}_L)$ is specified by $n - 2$ independent relations; hence it is an $(n+2)$-dimensional manifold (algebraic variety) in $\mathbb{R}^{2n}$. The relations (8.8) are quadratic, so $\mathfrak{M}(\mathbb{M}_L)$ is a quadratic surface. It is closed in the topological sense; hence the problem of finding the best-fitting line always has a solution.
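In the megaspace picture, fitting a line amounts to orthogonally projecting $\mathcal{P}$ onto this quadratic surface, and the footpoint can be computed in closed form by what is essentially principal component analysis: the best-fitting (orthogonal-regression) line passes through the centroid of the data along the leading principal direction, and the footpoint $\mathcal{P}'$ collects the orthogonal projections of the data points onto that line. A sketch (the helper name is ours):

    import numpy as np

    def project_onto_line_megaset(pts):
        # Project the megapoint (x1, y1, ..., xn, yn) onto M(M_L): returns the foot
        # points on the best-fitting line and F(S_best) = dist(P, M(M_L))^2.
        c = pts.mean(axis=0)                    # the best line passes through the centroid
        u = np.linalg.svd(pts - c)[2][0]        # leading principal direction
        feet = c + np.outer((pts - c) @ u, u)   # orthogonal projections onto the line
        return feet, np.sum((pts - feet) ** 2)

    pts = np.array([[0.0, 0.1], [1.0, 0.9], [2.0, 2.1], [3.0, 2.9]])
    feet, F_best = project_onto_line_megaset(pts)
    print(F_best)
    # The foot points are exactly collinear, i.e., the projected megapoint lies in M(M_L):
    print(np.linalg.matrix_rank(feet - feet[0], tol=1e-9) == 1)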

Megaset for Circles
Let $\mathbb{M}_C$ consist of all circles in $\mathbb{R}^2$. The corresponding megaset $\mathfrak{M}(\mathbb{M}_C) \subset \mathbb{R}^{2n}$ is described in [4, Section 3.4], and we briefly repeat the description here. A megapoint $(x_1, y_1, \ldots, x_n, y_n)$ belongs to $\mathfrak{M}(\mathbb{M}_C)$ if and only if all the $n$ planar points $(x_1, y_1), \ldots, (x_n, y_n)$ belong to one circle (we say that they are “cocircular”). In that case all these points satisfy one quadratic equation of the special type
\[
A(x^2 + y^2) + Bx + Cy + D = 0. \tag{8.9}
\]
This condition can be expressed by $C_{n,4}$ algebraic relations as follows:
\[
\det \begin{bmatrix}
x_i - x_j & y_i - y_j & x_i^2 - x_j^2 + y_i^2 - y_j^2 \\
x_i - x_k & y_i - y_k & x_i^2 - x_k^2 + y_i^2 - y_k^2 \\
x_i - x_m & y_i - y_m & x_i^2 - x_m^2 + y_i^2 - y_m^2
\end{bmatrix} = 0 \tag{8.10}
\]
for $1 \le i < j < k < m \le n$. Note that the relations (8.10) also include megapoints satisfying (8.8); hence they actually describe the union $\mathfrak{M}(\mathbb{M}_C) \cup \mathfrak{M}(\mathbb{M}_L)$.
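A quick sketch of the determinant test (8.10) for four planar points (indices $i, j, k, m$); it vanishes exactly when the four points are cocircular or collinear:

    import numpy as np

    def cocircularity_det(pi, pj, pk, pm):
        # The determinant of (8.10): zero iff the four points lie on one circle or one line.
        def row(p):
            return [pi[0] - p[0], pi[1] - p[1],
                    pi[0]**2 - p[0]**2 + pi[1]**2 - p[1]**2]
        return np.linalg.det(np.array([row(pj), row(pk), row(pm)]))

    # Four points on the unit circle: the determinant vanishes (up to rounding).
    on_circle = [(np.cos(t), np.sin(t)) for t in (0.1, 1.0, 2.5, 4.0)]
    print(cocircularity_det(*on_circle))
    # Replacing the fourth point by a generic one breaks cocircularity:
    print(cocircularity_det(*on_circle[:3], (2.0, 3.0)))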
The determinant in (8.10) is a polynomial of degree four, and $\mathfrak{M}(\mathbb{M}_C) \cup \mathfrak{M}(\mathbb{M}_L)$ is an $(n+3)$-dimensional algebraic variety (manifold) in $\mathbb{R}^{2n}$ defined by quartic polynomial equations. Note that the dimension of $\mathfrak{M}(\mathbb{M}_C) \cup \mathfrak{M}(\mathbb{M}_L)$ is one higher than that of $\mathfrak{M}(\mathbb{M}_L)$, that is,
\[
\dim \bigl[ \mathfrak{M}(\mathbb{M}_C) \cup \mathfrak{M}(\mathbb{M}_L) \bigr] = \dim \mathfrak{M}(\mathbb{M}_L) + 1. \tag{8.11}
\]
A closer examination shows that $\mathfrak{M}(\mathbb{M}_L)$ plays the role of the boundary of $\mathfrak{M}(\mathbb{M}_C)$; that is, $\mathfrak{M}(\mathbb{M}_C)$ terminates on $\mathfrak{M}(\mathbb{M}_L)$. The megaset $\mathfrak{M}(\mathbb{M}_C)$ is not closed, but if we add its boundary $\mathfrak{M}(\mathbb{M}_L)$ to it, it becomes closed.

Megaset for Ellipses
Let $\mathbb{M}_E$ consist of all ellipses in $\mathbb{R}^2$. The corresponding megaset $\mathfrak{M}(\mathbb{M}_E) \subset \mathbb{R}^{2n}$ can be described in a similar manner. A megapoint $(x_1, y_1, \ldots, x_n, y_n)$ belongs to $\mathfrak{M}(\mathbb{M}_E)$ if and only if all the $n$ planar points $(x_1, y_1), \ldots, (x_n, y_n)$ belong to one ellipse (we say that they are “coelliptical”). In that case all these points satisfy one quadratic equation of the general type
\[
Ax^2 + 2Bxy + Cy^2 + 2Dx + 2Ey + F = 0. \tag{8.12}
\]
This equation actually means that the points belong to one conic (either regular or degenerate). This condition can be expressed by $C_{n,6}$ algebraic relations as follows:
\[
\det \begin{bmatrix}
x_i - x_j & y_i - y_j & x_i^2 - x_j^2 & y_i^2 - y_j^2 & x_i y_i - x_j y_j \\
x_i - x_k & y_i - y_k & x_i^2 - x_k^2 & y_i^2 - y_k^2 & x_i y_i - x_k y_k \\
x_i - x_m & y_i - y_m & x_i^2 - x_m^2 & y_i^2 - y_m^2 & x_i y_i - x_m y_m \\
x_i - x_l & y_i - y_l & x_i^2 - x_l^2 & y_i^2 - y_l^2 & x_i y_i - x_l y_l \\
x_i - x_r & y_i - y_r & x_i^2 - x_r^2 & y_i^2 - y_r^2 & x_i y_i - x_r y_r
\end{bmatrix} = 0 \tag{8.13}
\]
for $1 \le i < j < k < m < l < r \le n$. Each of these relations means that the six points $(x_i, y_i)$, $(x_j, y_j)$, $(x_k, y_k)$, $(x_m, y_m)$, $(x_l, y_l)$, and $(x_r, y_r)$ satisfy one quadratic equation of type (8.12); that is, they belong to one conic (either regular or degenerate). All of these relations together mean that all the $n$ points $(x_1, y_1), \ldots, (x_n, y_n)$ satisfy one quadratic equation of type (8.12); that is, they all belong to one conic (regular or degenerate). Therefore the relations (8.13) describe a much larger megaset $\mathfrak{M}(\mathbb{M}_{\text{Conics}})$ corresponding to the collection of all conics, regular and degenerate, that is,
\[
\mathbb{M}_{\text{Conics}} = \mathbb{M}_E \cup \mathbb{M}_H \cup \mathbb{M}_{\cup} \cup \mathbb{M}_{LL} \tag{8.14}
\]
in our previous notation.
The determinant in (8.13) is a polynomial of the eighth degree, and $\mathfrak{M}(\mathbb{M}_{\text{Conics}})$ is a closed $(n+5)$-dimensional algebraic manifold in $\mathbb{R}^{2n}$. It mostly consists of two big megasets, $\mathfrak{M}(\mathbb{M}_E)$ and $\mathfrak{M}(\mathbb{M}_H)$, both of which are $(n+5)$-dimensional. The other megasets (for parabolas and pairs of lines) have smaller dimensions and play the role of the boundaries of the bigger megasets $\mathfrak{M}(\mathbb{M}_E)$ and $\mathfrak{M}(\mathbb{M}_H)$.

Illustrations
The structure of the megaset $\mathfrak{M}(\mathbb{M}_{\text{Conics}})$ is schematically depicted in Figure 7, where it is shown as the $xy$ plane $\{z = 0\}$ in the 3D space (the latter plays the role of the megaspace $\mathbb{R}^{2n}$). The positive half-plane $H_+ = \{y > 0, z = 0\}$ represents the elliptic megaset $\mathfrak{M}(\mathbb{M}_E)$, and the negative half-plane $H_- = \{y < 0, z = 0\}$ represents the hyperbolic megaset $\mathfrak{M}(\mathbb{M}_H)$. The $x$-axis $\{y = z = 0\}$ separating these two half-planes represents the lower-dimensional megaset $\mathfrak{M}(\mathbb{M}_{\cup} \cup \mathbb{M}_{LL})$ in the decomposition (8.14). True, the real structure of $\mathfrak{M}(\mathbb{M}_{\text{Conics}})$ is much more complicated, but our simplified picture still shows its most basic features.

Now recall that finding the best-fitting conic corresponds to an orthogonal projection of the given megapoint $\mathcal{P}$ in the megaspace $\mathbb{R}^{2n}$ (in our illustration, a point $(x, y, z) \in \mathbb{R}^3$) onto the megaset $\mathfrak{M}(\mathbb{M}_{\text{Conics}})$ (in our illustration, onto the $xy$ plane). In Figure 7 the point $(x, y, z)$ is simply projected onto $(x, y, 0)$. What are the chances that the footpoint of the projection corresponds to a “boundary” object in $\mathbb{M}_{\cup} \cup \mathbb{M}_{LL}$ (i.e., to a secondary object)? Clearly, only the points of the $xz$ plane $\{y = 0\}$ are projected onto the line $\{y = z = 0\}$. If the point $(x, y, z) \in \mathbb{R}^3$ is selected randomly with an absolutely continuous distribution, then a point on the $xz$ plane is chosen with probability zero. This fact illustrates the sufficiency of the model collection of nondegenerate conics (even the sufficiency of ellipses and hyperbolas alone, without parabolas); the best-fitting object will be a primary object with probability one.

But what if our model collection consists of ellipses only, without hyperbolas? Then in our illustration, the corresponding megaset would be the positive half-plane 𝐻+={𝑦>0,𝑧=0}. Finding the best-fitting ellipse would correspond to an orthogonal projection of the given point (π‘₯,𝑦,𝑧)βˆˆβ„3 onto the positive half-plane 𝐻+={𝑦>0,𝑧=0}. Now if the given point (π‘₯,𝑦,𝑧) has a positive 𝑦-coordinate, then it is projected onto (π‘₯,𝑦,0), as before, and we get the desired best-fitting ellipse. But if it has a negative 𝑦-coordinate, then it is projected onto (π‘₯,0,0), which is on the boundary of the half-plane, so we get a boundary footpoint; that is, a secondary object will be the best fit. We see that all the points (π‘₯,𝑦,𝑧) with 𝑦<0 (making a whole half-space!) are projected onto the boundary line, hence for all those points the best-fitting ellipse would not exist! This fact clearly illustrates the deficiency of the model collection of ellipses.

One may wonder: how is it possible that the collection of circles is sufficient (as we saw in Section 6), while the larger collection of ellipses is not? Indeed, every circle is an ellipse; hence the collection of ellipses contains all the circles. So why does the sufficiency of circles not guarantee the sufficiency of the bigger, inclusive collection of ellipses? This seemingly counterintuitive fact can be illustrated too (see Figure 8).

Suppose that the megaset $\mathfrak{M}(\mathbb{M}_C)$ for the collection of circles is represented by the set $U = \{y = x^2, x \ne 0, z = 0\}$ in our illustration. Note that $U$ consists of two curves (the branches of a parabola in the $xy$ plane), both lying in the half-plane $H_+ = \{y > 0, z = 0\}$ that corresponds to the collection of ellipses. So the required inclusion $U \subset H_+$ does take place. The two curves making up $U$ terminate at the point $(0, 0, 0)$, which does not belong to $U$, so it plays the role of the boundary of $U$.

Now suppose a randomly selected point $(x, y, z) \in \mathbb{R}^3$ is to be projected onto the set $U$. What are the chances that its projection ends up on the boundary of $U$, that is, at the origin $(0, 0, 0)$? It is not hard to see (and prove by elementary geometry) that only points on the $yz$ plane may be projected onto the origin $(0, 0, 0)$ (and not even all of them; points with large positive $y$-coordinates are projected onto interior points of $U$). So the chance that the footpoint of the projection ends up at the boundary of $U$ is zero. This illustrates the sufficiency of the smaller model collection of circles, despite the deficiency of the larger model collection of ellipses (which we have seen in Figure 7).

Appendix

Here we prove the theorem on no local minima for 𝑛=5 (Section 6).

Proof. Suppose that $S$ is a conic on which $\mathcal{F}$ attains a local minimum. We will assume that $S$ is nondegenerate; that is, $S$ is an ellipse, a hyperbola, or a parabola. Degenerate conics are easier to handle, so we omit them.
Let $P_1, P_2, P_3, P_4, P_5$ denote the given points and $Q_1, Q_2, Q_3, Q_4, Q_5$ their projections onto $S$. Since $S$ does not interpolate all the given points simultaneously, at least one of them is not on $S$. Suppose it is $P_1$; hence $P_1 \ne Q_1$.
Let us first assume that $Q_1$ is different from the other projections $Q_2, Q_3, Q_4, Q_5$. Then let $Q_1'$ be a point near $Q_1$ such that $\operatorname{dist}(Q_1', P_1) < \operatorname{dist}(Q_1, P_1)$. In other words, we perturb $Q_1$ slightly by moving it in the direction of $P_1$. Then there exists another nondegenerate conic $S'$ that interpolates the points $Q_1', Q_2, Q_3, Q_4, Q_5$. It is easy to see that
\[
\operatorname{dist}(P_1, S') \le \operatorname{dist}(P_1, Q_1') < \operatorname{dist}(P_1, Q_1) = \operatorname{dist}(P_1, S),
\]
\[
\operatorname{dist}(P_i, S') \le \operatorname{dist}(P_i, Q_i) = \operatorname{dist}(P_i, S) \quad \text{for every } i = 2, \ldots, 5. \tag{A.1}
\]
As a result,
\[
\mathcal{F}(S') = \sum_{i=1}^{5} \bigl[\operatorname{dist}(P_i, S')\bigr]^2 < \sum_{i=1}^{5} \bigl[\operatorname{dist}(P_i, S)\bigr]^2 = \mathcal{F}(S), \tag{A.2}
\]
as desired. Since $Q_1'$ can be placed arbitrarily close to $Q_1$, the new conic $S'$ can be made arbitrarily close to $S$.
Now suppose that the projection point $Q_1$ coincides with some other projection point(s). In that case there are at most four distinct projection points. We will not move the point $Q_1$; instead, we will slightly rotate the tangent line to the conic $S$ at the point $Q_1$. Denote that tangent line by $T$. Note that $T$ is perpendicular to the line passing through $P_1$ and $Q_1$.
Now let $T'$ be another line passing through the point $Q_1$ and making an arbitrarily small angle with $T$. It is not hard to show, by elementary geometry, that there exists another nondegenerate conic $S'$ passing through the same ($\le 4$) projection points and having the tangent line $T'$ at $Q_1$.
Since the tangent line $T'$ to the conic $S'$ at the point $Q_1$ is not orthogonal to the line $P_1 Q_1$, we easily have
\[
\operatorname{dist}(P_1, S') < \operatorname{dist}(P_1, Q_1) = \operatorname{dist}(P_1, S). \tag{A.3}
\]
At the same time we have
\[
\operatorname{dist}(P_i, S') \le \operatorname{dist}(P_i, Q_i) = \operatorname{dist}(P_i, S) \quad \text{for every } i = 2, \ldots, 5. \tag{A.4}
\]
As a result,
\[
\mathcal{F}(S') = \sum_{i=1}^{5} \bigl[\operatorname{dist}(P_i, S')\bigr]^2 < \sum_{i=1}^{5} \bigl[\operatorname{dist}(P_i, S)\bigr]^2 = \mathcal{F}(S), \tag{A.5}
\]
as desired. Since the line $T'$ can be selected arbitrarily close to $T$, the new conic $S'$ will be arbitrarily close to $S$.

Acknowledgment

N. Chernov was partially supported by National Science Foundation, Grant DMS-0969187.