Monday, December 17, 2012

Pop Quiz Answer, and Don't Trust the Program

Thursday I wrote a "pop quiz" post about a simple optimization that appears to be convex but isn't. I like this example because it's a clear example of why you shouldn't just trust your encoding of the program.

So in review, the problem is:

Given a set of points $x_i$, find a linear discriminator $w$ and a bias $b$ so that all of the points $x_i$ are on the "positive" side of the hyperplane described by $w$ and $b$. Easy enough, but also make sure that $b$ is as large as possible (in the signed sense). In other words,
and the program is:
$$\max b \quad\textrm{s.t. }\|w\|^2 \leq 1\textrm{ and }w'x_i \geq b.$$
So let's break down the program. Basically it's a dualization of the original problem. We're treating hyperplanes as points and points as linear operators. Let's assume that the $x_i$ are two-dimensional, so the $w$ and $b$ space is three-dimensional. Let's stand on the $b=0$ plane and let $b$ be our up-down direction.

The $\|w\|^2 \leq 1$ constraint basically tells us that the solutions are in a cylinder around the origin that extends up and down infinitely. Let's also assume that the cylinder is made of an easily-cuttable material like bamboo or cheese (bamboo will probably hold its shape better).

Let's start with one point, $x_1$. The $w'x_1 \geq b$ constraint basically says, "turn to look at the point through the bamboo and cut the stalk with an upward motion, keeping everything below." What you're left with is a shape much like a hypodermic needle. The program says to find the highest $b$, so the point of the whittled stalk will be the point we care about. The corresponding $w$ points in the direction of $x_1$.

Every point we add means another bit at the top that we lop off. Also, farther points will yield steeper cuts while closer points will yield shallower cuts. A shallow cut beats a steep cut (because it will cut into the part left by the steep cut), so closer points to the origin will be more likely to have a hyperplane up against them. This makes sense.

If you do this for all the points, then the highest point left on the stalk will correspond to the hyperplane solution. Easy, especially when all your points are on one side of the stalk. Now what if your points surround the stalk? Then your stalk looks more like a pencil and less like a hypodermic needle.

What does the pencil point look like though? One important observation to make is that all of the cuts go through the origin. That is, $\textbf{0}'x_i \geq 0$ is true for every $x_i$. That means that the pencil point has to be at the origin when the points surround the origin. So the solution in that case is the hyperplane $w = 0$ and $b=0$. The only problem is that that isn't a hyperplane. It's a degenerate solution.

The parameter space has a hole drilled through it, along the line $w = 0$. This makes that nice, convex cylinder non-convex. If all of your points are on one side of a hyperplane going through the origin, then you're fine; you can cut the parameter space on the other side away, leaving a convex search space. The solution is guaranteed to lie on the $\|w\|^2 = 1$ surface too, so you don't have to play any games with scaling.

If the points don't lie on one side of the origin though, then you simply don't have a convex program anymore, despite your steadfast insistence that you do. Take-home lesson: make sure you know where all the holes in your space are. Convex domains don't like holes.