Saturday, December 22, 2012

Data Courses on Coursera

The desire to gain expertise in big data doesn't show any signs of going away, nor is there any slowdown in the flow of training on the topic: there are three courses on Coursera all starting at about the same time:

They all start next month, on January 2nd, 7th, and 22nd respectively. The first looks like an intro course covering basic R programming and analysis, and lasts 4 weeks. The other two look like more in-depth analysis training and last 10 and 8 weeks.

Is anyone planning to take any of these courses? Which one?

Monday, December 17, 2012

Adam Coates: Demystifying UFL

I thought this was a really great talk by Adam Coates on unsupervised feature learning (UFL). I really like the interpretation of finding features as just another form of clustering (a rough sketch of what I mean is below).


H/T Derk Mus (G+ post).
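
To make "features as clustering" concrete, here is a minimal sketch of the k-means feature-learning recipe I associate with Coates' line of work. It's my own paraphrase of that interpretation, not code from the talk; the function name, the scikit-learn usage, and the "triangle" activation are all my assumptions.

```python
# Sketch: features via clustering (k-means). My paraphrase of the
# "feature learning as clustering" idea, not code from the talk.
import numpy as np
from sklearn.cluster import KMeans

def kmeans_features(X_train, X, n_centroids=64, seed=0):
    """Map inputs to soft distances from k-means centroids."""
    km = KMeans(n_clusters=n_centroids, random_state=seed).fit(X_train)
    centroids = km.cluster_centers_
    # Euclidean distance from every point in X to every centroid.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    # "Triangle" activation: a centroid only fires if the point is closer
    # to it than the point's average distance to all centroids.
    mean_dist = dists.mean(axis=1, keepdims=True)
    return np.maximum(0.0, mean_dist - dists)

# Usage sketch: learn features on unlabeled data, then hand the resulting
# feature matrix to a linear classifier.
X_unlabeled = np.random.randn(500, 20)
features = kmeans_features(X_unlabeled, X_unlabeled, n_centroids=32)
```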

Pop Quiz Answer, and Don't Trust the Program

Thursday I wrote a "pop quiz" post about a simple optimization that appears to be convex but isn't. I like this example because it's a clear illustration of why you shouldn't just trust your encoding of the program.

So in review, the problem is:

Given a set of points $x_i$, find a linear discriminator $w$ and a bias $b$ so that all of the points $x_i$ are on the "positive" side of the hyperplane described by $w$ and $b$. Easy enough, but also make sure that $b$ is as large as possible (in the signed sense). In other words, the program is:
$$\max b \quad\textrm{s.t. }\|w\|^2 \leq 1\textrm{ and }w'x_i \geq b.$$
So let's break down the program. Basically it's a dualization of the original problem. We're treating hyperplanes as points and points as linear operators. Let's assume that the $x_i$ are two-dimensional, so the $w$ and $b$ space is three-dimensional. Let's stand on the $b=0$ plane and let $b$ be our up-down direction.

The $\|w\|^2 \leq 1$ constraint basically tells us that the solutions are in a cylinder around the $b$-axis (the line $w = 0$) that extends up and down infinitely. Let's also assume that the cylinder is made of an easily-cuttable material like bamboo or cheese (bamboo will probably hold its shape better).

Let's start with one point, $x_1$. The $w'x_1 \geq b$ constraint basically says, "turn to look at the point through the bamboo and cut the stalk with an upward motion, keeping everything below." What you're left with is a shape much like a hypodermic needle. The program says to find the highest $b$, so the point of the whittled stalk will be the point we care about. The corresponding $w$ points in the direction of $x_1$.
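
As a quick sanity check on that picture, the single-point case can be solved by hand: maximizing $b$ subject to $\|w\|^2 \leq 1$ and $w'x_1 \geq b$ amounts to maximizing $w'x_1$ over the unit ball, and Cauchy-Schwarz gives

$$ b^* = \max_{\|w\| \leq 1} w'x_1 = \|x_1\|, \qquad w^* = \frac{x_1}{\|x_1\|}, $$

so the tip of the needle sits at height $\|x_1\|$, directly above the unit vector that points at $x_1$.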

Every point we add means another bit at the top that we lop off. Also, farther points will yield steeper cuts while closer points will yield shallower cuts. A shallow cut beats a steep cut (because it will cut into the part left by the steep cut), so closer points to the origin will be more likely to have a hyperplane up against them. This makes sense.

If you do this for all the points, then the highest point left on the stalk will correspond to the hyperplane solution. Easy, especially when all your points are on one side of the stalk. Now what if your points surround the stalk? Then your stalk looks more like a pencil and less like a hypodermic needle.

What does the pencil point look like though? One important observation to make is that all of the cuts go through the origin. That is, $\textbf{0}'x_i \geq 0$ is true for every $x_i$, so the origin of the $(w, b)$ space survives every cut. That means that the pencil point has to be at the origin when the points surround the origin. So the solution in that case is the hyperplane $w = 0$ and $b = 0$. The only problem is that that isn't a hyperplane at all. It's a degenerate solution.
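
If you'd rather see the degeneracy come out of a solver than out of a whittled stalk, here's a minimal sketch, using cvxpy (my choice of tool, not something from the post): with points on one side of the origin you get a genuine unit-norm $w$, and with points surrounding the origin the solver cheerfully returns $w \approx 0$, $b \approx 0$.

```python
# Sketch of the program  max b  s.t.  ||w||^2 <= 1,  w'x_i >= b,
# written with cvxpy (my choice, not from the original post).
import numpy as np
import cvxpy as cp

def solve_program(X):
    n, d = X.shape
    w = cp.Variable(d)
    b = cp.Variable()
    constraints = [cp.sum_squares(w) <= 1, X @ w >= b]
    prob = cp.Problem(cp.Maximize(b), constraints)
    prob.solve()
    return w.value, b.value

# Points on one side of the origin: a real hyperplane with ||w|| = 1.
print(solve_program(np.array([[2.0, 1.0], [3.0, -1.0], [1.5, 0.5]])))

# Points surrounding the origin: the "solution" is w ~ 0, b ~ 0,
# which isn't a hyperplane at all.
print(solve_program(np.array([[2.0, 0.0], [-2.0, 0.0],
                              [0.0, 2.0], [0.0, -2.0]])))
```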

The parameter space has a hole drilled through it, along the line $w = 0$. This makes that nice, convex cylinder non-convex. If all of your points are on one side of a hyperplane going through the origin, then you're fine; you can cut the parameter space on the other side away, leaving a convex search space. The solution is guaranteed to lie on the $\|w\|^2 = 1$ surface too, so you don't have to play any games with scaling.

If the points don't all lie on one side of some hyperplane through the origin, though (that is, if the origin sits inside their convex hull), then you simply don't have a convex program anymore, despite your steadfast insistence that you do. Take-home lesson: make sure you know where all the holes in your space are. Convex domains don't like holes.
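
For a concrete version of the hole (my example, not from the original post): take $x_1 = (2, 0)$ and $x_2 = (-2, 0)$. Both $(w, b) = \big((1, 0), -3\big)$ and $(w, b) = \big((-1, 0), -3\big)$ are feasible and describe genuine hyperplanes with both points on the positive side, but their midpoint is

$$ \tfrac{1}{2}\big((1, 0), -3\big) + \tfrac{1}{2}\big((-1, 0), -3\big) = \big((0, 0), -3\big), $$

which has $w = 0$ and so describes no hyperplane at all. The set of parameters that correspond to actual hyperplanes has the hole running right between two perfectly good solutions.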

Thursday, December 13, 2012

Pop Quiz

Yesterday I was chatting with Professor 1 about some research when Professor 2 came in very excitedly and described a problem that he wanted to solve with P1's help. I'm simplifying this a bit, but the problem is the same:

Given a set of points $x_i$, find a linear discriminator $w$ and a bias $b$ so that all of the points $x_i$ are on the "positive" side of the hyperplane described by $w$ and $b$. Easy enough, but also make sure that $b$ is as large as possible (in the signed sense). In other words,

$$\max b \quad\textrm{s.t. }\|w\|^2 \leq 1\textrm{ and }w'x_i \geq b.$$

P2 and I were convinced by the formulation that this is a convex program and should be straightforward to solve.

P1 says nope. Not a convex problem in general. P1 is correct. What's the exception, and why isn't the problem convex outside of it?

P.S. P1, you are not allowed to spoil the solution!

Testing MathJax

$$ \log\sum_{i=1}^n\exp(f(x_i)) $$ \[ \log\sum_{i=1}^n\exp(f(x_i)) \] And inline: \( \log\sum_{i=1}^n\exp(f(x_i)) \) and $ \log\sum_{i=1}^n\exp(f(x_i)) $

Wednesday, December 12, 2012

Book: Category Theory by Steve Awodey

Programming languages researchers already know a bunch of basic category theory, but the rest of us computer scientists tend to be out of the loop when it comes to this basic language of math. I've been reluctant to take any more math courses because of my near-complete lack of knowledge of the topic.

A colleague of mine recently introduced me to Category Theory by Steve Awodey (at CMU). My colleague, a computer scientist like me, was frustrated with Categories for the Working Mathematician by Saunders Mac Lane (I also found CftWM to be impenetrable), and found Category Theory to be a much more digestible reference. So far it looks pretty good and is very readable. It's aimed at non-mathematicians, which certainly helps.

Has anyone else read this book and found it useful?

Tuesday, December 11, 2012

The Big NIPS Post

2012 was my first year at NIPS, and I'm still finding my legs when it comes to ML, so my impressions are very general. For other takes on NIPS, see the post from Hal Daume (a veteran NIPS-goer) and the post from my advisor Suresh Venkatasubramanian (a newbie to NIPS like me).

Monday

Monday is tutorial day at NIPS, and I really only got what I wanted out of one tutorial, the one by Joel Tropp. Joel gave a great presentation on random matrices and distilled the very confusing body of work out there down to a nice, clean idea. Basically, break a random matrix down into a sum of simpler, independent random matrices, then apply concentration bounds to the sum. He worked through several examples and the ideas were easy to understand.
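
To give a flavor of the kind of statement Joel was advocating (quoted from memory, so treat the constants as approximate): one workhorse result is the matrix Bernstein inequality. For independent, zero-mean, self-adjoint random $d \times d$ matrices $X_k$ with $\|X_k\| \leq R$ almost surely,

$$ \Pr\left[\lambda_{\max}\Big(\sum_k X_k\Big) \geq t\right] \;\leq\; d \cdot \exp\!\left(\frac{-t^2/2}{\sigma^2 + Rt/3}\right), \qquad \sigma^2 = \Big\|\sum_k \mathbb{E}\big[X_k^2\big]\Big\|. $$

It reads exactly like the scalar Bernstein bound, with the dimension $d$ showing up as the only extra price for working with matrices.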

Tuesday

Scott Aaronson's talk was great, not surprisingly. Scott talked about "Quantum Information and the Brain": what QCs will be good at, and whether the human brain is likely to be a quantum computer. Scott thinks that the most likely place QCs will be employed, once we get them working at a large enough scale to be useful, is in simulating QM, and probably not so much in other applications. He underscored the fact that most people have a fundamental misunderstanding of how QCs work, and that most QC theorists think that NP is not a subset of BQP (the class of problems a quantum computer can solve efficiently).

Scott doesn't believe that the human brain is a QC, and gave one of the most compelling arguments in my mind that it isn't. That is, we know what QCs are good at: factoring integers, discrete logarithms, and simulating QM. There isn't any evidence to date that humans are any good at any of these things. He says that the idea that the brain might be a QC is usually the product of thinking, "brain is mysterious, QM is mysterious, therefore brain = QC."

I tend to agree with him, and it seems that the human brain is an exceptionally good pattern recognizer and identity factory (read "self-deluder"), but not really a great general-purpose computer, especially a quantum one.

Wednesday & Thursday

I'm going to freely admit that I didn't get much out of these days. As I get more accustomed to the material, I'll probably have a lot more to explore. There were a lot of posters to look at, and it was cool to see some kernel methods, metric learning, and even some Riemannian techniques.

Friday

Friday and Saturday are the workshop days, and I felt like I got a lot out of these. I visited the two social network workshops, and there was a lot of fun stuff to look at. The invited speakers were all great, and it was interesting to see more algorithms-related work going on alongside the ML.

Saturday

Saturday was all about the optimization workshop for me. Pablo Parrilo and Francis Bach both gave great talks, and this is where I presented our work (arXiv) and poster. We got a lot of great feedback from the audience and a lot of directions to follow.

General Impressions

Primary impression: NIPS is exhausting! There's tons of stuff to look at and do. This year was definitely all about deep learning, so I'm going to make sure I know the basics before next year. 

Venue: They can definitely do better than S. Lake Tahoe CA/Stateline NV. The conference was at Harvey's and Harrah's on the NV side, and you're basically stuck at these casinos and the surrounding blocks if you don't want to fork out a bunch for a cab. 

There aren't a whole lot of options around for food, but definitely go try Super Taco about a block from Harvey's. Great tortas and a really, really nice change from the casino options -- the owner is a great guy too (only outdoor seating, but a quick walk back to Harvey's). Don't go to the Cabo Wabo in Harvey's. It's a very pathetic tribute to Sammy Hagar and has overpriced, mediocre-to-crappy food. If you're into the crappy-coffee-steak-and-eggs thing, there's a diner in the basement of Harrah's underneath the conference floor.

I didn't ski, so you'll have to ask someone else about that. I was surprised that anyone was skiing on the rocks and ice I saw on the mountains, but I'm from Utah, so I guess I have a higher standard. 

Biggest gripe about the venue: you're stuck. It's really not worth it to rent a car, because you'd have to chain your tires to get over the mountains, and I have no idea what options Reno is going to have for rentals. It's not a big city. So you have to take a shuttle to get between Reno and Tahoe, and that takes as long as or longer than the flight between Salt Lake and Reno. You also have to make that extra reservation, and account for about 2 extra hours to get to the airport.

Facilities: Nice rooms, but jeez, gimme more outlets! My con roomie and I were stuck in weird spots in the room to get power. Bring an extension cord or power strip. They had an old buffet room converted into the "Internet Cafe." The wifi reception was best in this area and they had tables with lots of power strips, which was nice; it was the best alternative to the spotty reception in the hotel rooms. We stayed at Harvey's, and the beds, chairs, bathroom, and shower were all good. Comfy place to crash. HD televisions but no HD cable.

Like Hal said in his post, the walls were like paper. This made it really hard to listen to a quiet invited speaker while there was conversation in the hallways or a loud speaker in the next room. Also not a lot of ventilation for crowded rooms.

Conclusion

Overall, I definitely wouldn't mind coming back to NIPS, but I look forward to when they move the venue.  I also don't know if I have the energy for the whole conference. It's big!

Monday, December 10, 2012

Blog Intro

Hi, everyone. This is a new blog, and as of right now I plan to write about topics in machine learning and algorithms having to do with kernels and related techniques (multiple kernel learning (MKL), for example, or other methods that touch on kernels, like representation learning). I certainly don't know everything about these technologies, and as I learn about them I'll try to write a post here. This will mostly be a technical blog, but I may post every now and again on topics that relate to the job market, to science policy, or other loosely related items.

Now that that's out of the way, I want to write a quick introduction about myself. I'm currently a PhD student in computing at the University of Utah. My advisor, Suresh Venkatasubramanian, is part of the Data Group at the School of Computing, where we meet weekly to talk about big data, databases, algorithms, high-dimensional geometry, machine learning, and other topics related to data. My interests were initially in geometry and algorithms, but they've slowly moved over to machine learning and optimization. Because of my training though, I'm still interested in geometric interpretations of machine learning.

That's the dry version. Really, I love matrices and matrix math, so I like machine learning and optimization (arXiv version) a lot. When I read about a graph, I'm always a lot more interested in the adjacency matrix or the Laplacian or its spectrum than I am in the combinatorics. When I was working on the space of positive definite matrices, I was way more interested in the algebraic structure than I was in the points.  I like computing Lagrangians for some weird reason and I like hearing about neat matrix tricks, especially if they're easy to understand.

Subscribe to the feed; I'll try to keep you interested in the new stuff I learn about.