Monday, May 13, 2013

Things to Remember about That Thing You’re Trying to Create

(My view of the creative process)
  1. The perfect is the enemy of the good.
  2. For a work to be good you have to start it.
  3. Everything has parts. To create a whole you must identify its parts.
  4. Write down the parts, even if you don't know about them.
  5. If you don't know about a part, research it. Then revise that part.
  6. Parts have parts. Go back to 4.
  7. No more parts? Are you sure? Go back to 4.
  8. Why this work? That's a part, go back to 4.
  9. Who else did work like this? That's a part, go back to 4.
  10. Does your work "work"? How do you prove it? That's a part, go back to 4.
  11. Pick the next part and start making it.
  12. If you hate the current part, go back to 11.
  13. You hate all the parts? Too bad. Finish.
  14. Go back to 11 until you're done with all the parts.
  15. Ok, now you can make it good.
  16. You can't skip to 15. Go back to 11.
  17. All done? Make it awesome. Make it tell a story.
  18. You think it's awesome? No. It sucks. Show it to someone else.
  19. Did you listen to them? Good. Fix it. Go back to 18.
  20. Do you hate it yet? Yes? It's ready.

Tuesday, February 19, 2013

Research Retrospective

I've been quiet for a while on this blog simply because of work and other things. I tried to keep up with a couple of online courses, like the Data Analysis course by Jeff Leek, and I've tried to keep an eye on John Langford & Yann LeCun's class, but Suresh's Computational Geometry course has kept me as busy as I'd like while working on research.

Research-wise, I've been prepping our submission for ICML Cycle III. Now that that's over, I have some time to worry about other things, like applying and interviewing for internships.

With the breathing room, I got to thinking about something I now enjoy doing that I found tedious at the beginning of my research career. So here's a little retrospective from my 4th year, looking back at my first:

1st year: No clue how to form a research question
4th year: Able to weed out most of the bad questions myself

I started out thinking about research questions just as "stuff that I thought would be neat if it were true." That isn't the wording I'd have used at the time, but since then I've gained more experience in formulating a research question. Thoughts like

  • "would anyone give a crap?"
  • "someone surely thought of this before -- yep, they sure did," 
  • "OK, so someone thought of this, did they think about this aspect?"
  • "so they didn't think of that, how trivial is it for me to test/find out?"
are all thoughts that I wasn't able to form myself a few years ago. I've still got a ways to go on this front, but at least I'm now able to ignore my own dumb questions.

1st year: Tracking down a body of work is tedious but necessary
4th year: Tracking down a body of work is rewarding and enjoyable

By "tracking down a body of work" I mean taking the research question that you've vetted and finding all the prior relevant work, digging through citations, doing web searches, etc. I was quite surprised to realize yesterday that I enjoy it. When I first really started research I would avoid papers in favor of presentations (I know, I know). Then I started favoring the papers (reluctantly) and would plod through them.

Perhaps because I now know how to read papers, I can see the tendrils going through the papers and see how the authors, the conferences/pubs, and the work link together in time and space, like those slow-motion videos of a lightning bolt feeling its way to the ground. It sounds like I read too much sci-fi, but if you get it, you know what I mean. The endeavor is much more exciting and rewarding now. A researcher can only reach that point with experience.

1st year: Analyzing a body of work is rewarding and enjoyable
4th year: Analyzing a body of work is rewarding and enjoyable -- but I do it faster now

Analysis, and by that I mean poring over the work and figuring out what's going on, has been my strong suit. I was never a "hacker" in the sense of throwing code together and seeing if it'll work. I was always the kind of coder who figured out what needed to be done, nearly completely, and then coded it up.

I think this mostly came from my background of backporting software and writing cross-language code. Both of those activities require you to know every line of code that you work with and what those lines are supposed to do. That, and probably my bachelor's in math (which also helps with theory CS and ML), equipped me to really understand what I was reading, given enough time. And I always loved it. I still do.

The difference now, as I mentioned before, is that I know how to read papers better, so that process happens a lot faster. It takes me many more papers to become fatigued, which means that I understand the underlying context better.

1st year: Didn't know the process
4th year: Know the process

The rest of research is pretty much just practice (although I have a nagging feeling that I missed something). Any student can work on a problem; it's just a matter of how much padding you need. It's pretty clear in retrospect that a great deal of what your advisor does is to check the padding and take it off when you're ready. Writing, submitting, checking, revising, communicating, speaking, collaborating, all that good stuff comes with practice.

There are topics that I'm omitting on purpose, like grant writing, because I doubt that I'll ever have a lot of exposure to them. But feel free to talk about your experiences in the comments.

Monday, January 7, 2013

Large Scale Machine Learning Class

Yann LeCun and John Langford are teaching a Large Scale Machine Learning class this semester at NYU:

Yann LeCun and I are coteaching a class on Large Scale Machine Learning starting late January at NYU. This class will cover many tricks to get machine learning working well on datasets with many features, examples, and classes, along with several elements of deep learning and support systems enabling the previous.

John says that this isn't a MOOC, but that the lectures and slides are going to be made available online:

We plan to videotape lectures and put them (as well as slides) online, but this is not a MOOC in the sense of online grading and class certificates. I’d prefer that it was, but there are two obstacles: NYU is still figuring out what to do as a University here, and this is not a class that has ever been taught before. Turning previous tutorials and class fragments into coherent subject matter for the 50 students we can support at NYU will be pretty challenging as is. My preference, however, is to enable external participation where it’s easily possible. 
Suggestions or thoughts on the class are welcome :-)