Pop quiz, gentle readers. What do the following books have in common?

Getting started in value investing, by Mizrahi; Introduction to Java, by Wu;
and The LaTeX Companion, by Goossens, Mittelbach, and Samarin.

Answer: the fact that I am the biggest dork evah.

I know that blogging has been a little light (read: non-existent) recently, but I've been absorbed by a little side project that's almost done. To tell the story through those three books, top to bottom:

Getting started in value investing, by Charles S. Mizrahi

So, part of turning thirty-five was finally accepting that I was not going to found the next Microsoft, and that my best chance at untold riches was through good, old-fashioned investing. A get-rich-slowly scheme, if you will. And after a thorough research and literature survey (consisting mostly of "Hey, dad!") I've settled on something called 'value investing'. I'll leave the gory details of this method for another post, if there's interest, but the point here is that it involves a lot of research. For each company, you need to:

  1. Go get ten years' worth of balance sheets, income statements, and cash-flow statements. Best case scenario: this is scattered across at least six different web pages.
  2. Pull out some key items, and perform a few computations (ratios, growth rates, etc.). Use these to perform some litmus tests.
  3. If the company survives these tests, read the company's annual reports for the past few years. Try to get a sense of their business and their management.
  4. If you like the business and think that they have a 'durable competitive advantage,' then figure out how much their stock is objectively worth. That is: Try to estimate how a rational stock market would price their shares in five to ten years, mostly by extrapolating out from the present. Then, discount that price back to the present, using some discount rate of your choice.
  5. Quietly gather cash while you wait for the stock market to go crazy and price the company well below that objective value.
  6. Buy shares in that company.
  7. Wait for everyone to come back to their senses.
  8. Profit!
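For the curious, the arithmetic hiding in steps 2 and 4 is small enough to sketch. What follows is only an illustration under assumptions of my own choosing, not the full method: the class and method names are made up, and the earnings figures, the price-to-earnings multiple of 15, and the 10% discount rate are all hypothetical numbers picked to make the example readable.

```java
// A minimal sketch of the growth-rate and discounting arithmetic.
// All names and all numbers here are illustrative assumptions.
final class ValuationSketch {

    /** Compound annual growth rate that takes `first` to `last` in `years` years. */
    static double annualGrowthRate(double first, double last, int years) {
        if (first <= 0 || last <= 0 || years <= 0) {
            throw new IllegalArgumentException("values and years must be positive");
        }
        return Math.pow(last / first, 1.0 / years) - 1.0;
    }

    /** Value today of a price expected `years` from now, at the given discount rate. */
    static double presentValue(double futurePrice, double discountRate, int years) {
        return futurePrice / Math.pow(1.0 + discountRate, years);
    }

    public static void main(String[] args) {
        // Hypothetical company: earnings per share went from 2.00 to 4.00 in ten years.
        double growth = annualGrowthRate(2.00, 4.00, 10);       // about 7.2% a year

        // Extrapolate the same growth ten more years, then apply an assumed multiple.
        double futureEps = 4.00 * Math.pow(1.0 + growth, 10);   // doubles again, to about 8.00
        double futurePrice = futureEps * 15.0;                  // assumed P/E of 15

        // Discount back to today at an assumed 10% a year.
        double worthToday = presentValue(futurePrice, 0.10, 10);

        System.out.printf("growth %.4f, future price %.2f, worth today %.2f%n",
                growth, futurePrice, worthToday);
    }
}
```

The discount rate is the knob that matters most here: the higher you set it, the less a far-off price is worth today, and the cheaper the stock has to get before you buy.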

Most of this is harder and more nebulous than I make it sound, of course, and I'm sure there's lots of good debate over the details. But my point here is that steps 1 and 2 above are really, really monotonous. There's nothing interesting going on here, just purely mechanical information processing. And you have to do this a lot: you can easily go through a hundred or so companies before you find one that passes the tests.

The curse of being a computer-type person is that you have absolutely no patience for this kind of rote work. The blessing of being a computer-type person, on the other hand, is that you never need to put up with this kind of drudgery ever again. Which leads us to...

An Introduction To Object-Oriented Programming With Java, by C. Thomas Wu

Speaking as a former computer-science professor, let me be the first to admit: 'Introduction to Programming' classes never explicitly teach you how to do anything useful. Any useful program will (by definition, I assert) interact with the real world. The real world is messy, and we teachers have very good reasons for avoiding such mess in Programming 101.

But the reality is that one semester of programming is truly all it takes. You might think you only know the basics, and that's not false. But the little secret no one bothered to tell you is this: all the hard stuff has already been done for you. You just need to poke around to find what you need.

Take steps 1 and 2 from that list up there. What would it take to automate that? Given the ticker for a company, you would need to:

  1. Construct the URLs for the web pages you need,
  2. Open those URLs and get their HTML contents,
  3. Scan that HTML for text matching a given pattern,
  4. Munge that text into numbers,
  5. Use those numbers for some very simple calculations, and
  6. Make a test: are your final values 'good enough'?

Of this list, only two steps are not 'Programming 101' easy: steps two and three (connecting to a URL and pattern-matching against text). And because these two steps are hard to code, the code you need has already been written and is available to you. More generally: if your programming language is halfway appropriate for the problem at hand, whatever it is, then the really tricky parts of your program will come pre-solved.1 All you need to do is invoke the solutions.
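To make that concrete, here is roughly what the six steps look like in Java. This is a sketch under stated assumptions, not my actual program: the URL template, the table layout the pattern expects, the row label, and the 'rises every year' test are all hypothetical stand-ins. The two 'hard' steps are handled by the standard library's java.net.http.HttpClient (Java 11 and later) and java.util.regex.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// A minimal sketch of the six steps. The site, the HTML layout, and the
// litmus test are assumptions made up for illustration.
final class ScreenSketch {

    /** Step 1: build the URL (hypothetical site and query format). */
    static String buildUrl(String ticker) {
        return "https://finance.example.com/income-statement?symbol=" + ticker;
    }

    /** Step 2: open the URL and return its HTML. The library does the hard part. */
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    /** Steps 3 and 4: find the table row with the given label, turn its cells into numbers. */
    static List<Double> extractRow(String html, String label) {
        // Assumes rows shaped like <tr><td>Label</td><td>1,000</td>...</tr>.
        Pattern row = Pattern.compile(
                "<tr>\\s*<td>" + Pattern.quote(label) + "</td>(.*?)</tr>", Pattern.DOTALL);
        Matcher rowMatch = row.matcher(html);
        List<Double> values = new ArrayList<>();
        if (!rowMatch.find()) {
            return values; // label not on this page
        }
        Matcher cell = Pattern
                .compile("<td>\\s*(-?[0-9][0-9,]*(?:\\.[0-9]+)?)\\s*</td>")
                .matcher(rowMatch.group(1));
        while (cell.find()) {
            // "1,250.5" becomes 1250.5
            values.add(Double.parseDouble(cell.group(1).replace(",", "")));
        }
        return values;
    }

    /** Steps 5 and 6: a hypothetical litmus test, true if every year beats the last. */
    static boolean risesEveryYear(List<Double> values) {
        if (values.size() < 2) {
            return false;
        }
        for (int i = 1; i < values.size(); i++) {
            if (values.get(i) <= values.get(i - 1)) {
                return false;
            }
        }
        return true;
    }
}
```

Stringing the pieces together is then one line per step: `risesEveryYear(extractRow(fetch(buildUrl(ticker)), "Net income"))`. Everything I had to write myself is loops, string concatenation, and a comparison.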

...Which, obviously, is how I wrote a little Java program to 'research' and test companies for me. And once you get that far, it's pretty simple to have the program go one step further: take an on-line list of tickers (found by opening and pattern-matching a URL, which we already know how to do) and 'research' each of them. End result? 2000+ companies scanned, 80 or so kept. Later winnowed by hand down to 67. More than enough to get started.

But upon doing so, I almost immediately ran into a problem: I needed to do more than I could do by looking at a computer screen. I wanted a checklist to help me complete all the research I needed to do on a given company. I wanted to be able to take notes on the financial data so that I could remember to find answers in the annual reports. I wanted to record why I decided to keep or reject a particular company. And I wanted to be able to come back to those reasons next year (along with the data I had available right now) to see how well I did. In short, I wanted to print this information out so that I could write on it and store it. This leads to...

The LaTeX Companion, by Goossens, Mittelbach, and Samarin

No, this has nothing to do with the polymer, you pervert. LaTeX is a typesetting language originally designed for mathematical papers. I've been using it for years. And when I need to write anything longer or more technical than a typical email, it's what I reach for first.

It also has an advantage over things like Microsoft Word in that a LaTeX file is, very much like HTML, plain text. You can read it. You can type it up from scratch. And it is easy to make a Java program produce LaTeX through Programming-101-level string-manipulation. Knowing what LaTeX to produce is another story, unfortunately, and requires a lot of experience. But assuming you do know what LaTeX code to produce, it's easy to make Java write it for you. And the payoff? Being able to make sixty-odd worksheets like this one in under five minutes (and knowing that each and every one of them represents a company that may be worth the time to research). Isn't it pretty? And those sparklines!2 I can stare at them all day.3
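Here is the flavor of that string-manipulation, as a minimal sketch. The worksheet layout (a heading and a two-column table) is invented for illustration and is much plainer than my real worksheets; the class and method names are hypothetical too. The one non-obvious chore is escaping characters that LaTeX treats specially, since company names are fond of ampersands.

```java
import java.util.Locale;

// A minimal sketch of generating LaTeX from Java with plain string building.
// The document layout below is an assumption made up for illustration.
final class WorksheetSketch {

    /** Escape the characters LaTeX treats specially in ordinary text. */
    static String escapeLatex(String text) {
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            switch (c) {
                case '&': case '%': case '$': case '#': case '_': case '{': case '}':
                    out.append('\\').append(c);
                    break;
                case '\\':
                    out.append("\\textbackslash{}");
                    break;
                case '~':
                    out.append("\\textasciitilde{}");
                    break;
                case '^':
                    out.append("\\textasciicircum{}");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    /** Build a complete, tiny LaTeX document: a heading plus a year/sales table. */
    static String worksheet(String company, String[] years, double[] sales) {
        if (years.length != sales.length) {
            throw new IllegalArgumentException("need one sales figure per year");
        }
        StringBuilder tex = new StringBuilder();
        tex.append("\\documentclass{article}\n");
        tex.append("\\begin{document}\n");
        tex.append("\\section*{").append(escapeLatex(company)).append("}\n");
        tex.append("\\begin{tabular}{lr}\n");
        tex.append("Year & Sales \\\\ \\hline\n");
        for (int i = 0; i < years.length; i++) {
            tex.append(escapeLatex(years[i]))
               .append(" & ")
               .append(String.format(Locale.ROOT, "%.1f", sales[i]))
               .append(" \\\\\n");
        }
        tex.append("\\end{tabular}\n");
        tex.append("\\end{document}\n");
        return tex.toString();
    }
}
```

Write the returned string to a .tex file, run it through LaTeX, and out comes a printable page. Doing that for sixty-odd companies is just a loop.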

So where is this all going? Nowhere, really. I don't have a real point to this post, like I usually do. Instead, let me end with some observations:

  • As I mentioned above, a lot of programming is just using Programming 101 techniques to hook together these much larger bits of code that come for free with the language (called 'libraries'). This isn't always true, of course. Someone has to make those libraries, for example. Also, I've been on teams doing heavy-duty algorithm design. When you're the first people to try to solve a given problem, there may not be much in the way of pre-existing code to use. But I still assert that most day-to-day programming comes down to intelligently stringing together library-calls.
  • On a related note: because so much programming is library re-use, the quality and variety of these libraries matters tremendously. As much as I admire OCaml, for example, the paucity of its standard library pretty much rules out widespread acceptance. Also, this is one of those places where there is a big difference between adequate design and superb design. An okay language with superb libraries beats a superb language with okay libraries.
  • Sparklines are great for quickly getting the 'shape' of data and making quick comparisons / correlations. (Take that Netflix PDF, for example. Sales are going up! Income is going up! Operating cash is going up! Why is equity going down?) But sparklines only tell you shape, not magnitude. That 'sales' sparkline only tells you that sales are going up, roughly linearly. But from what to what? From 20 to 40? 200 to 4000? 2 to 4? For that, you need to look at the numbers as well as the sparkline. (Or I could put more thought into my sparkline-generating code... but I'd still need to read the numbers anyway.)
  • No program is ever done. Right now, my program does everything I describe above. But that's a lot more than I wanted when I started writing, and it's a lot less than I want now. Now that I've got this much working, I'm beginning to think about graphical interfaces, maybe a database to record the companies I've already scanned and what decision I made, perhaps trying to find a better way of getting the data4 and so on. No program is ever 'finished,' just finished enough to use or ship. This is the core problem of software engineering, and why it is so hard: how do you write code so that (1) it can be extended in ways you can't imagine now by people who you may never meet, while (2) getting done what you need to get done now? We don't actually have good answers to that question yet,5 which is why Your Favorite Program still doesn't do that one thing you've really wanted it to do for years now...
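A footnote-in-code to the sparkline observation above: the reason a sparkline shows shape but not magnitude falls straight out of how you draw one. The sketch below is not my real sparkline code (which draws lines in LaTeX); it is a hypothetical text-only version that maps each value onto one of eight Unicode block characters. The key step is rescaling every series to its own minimum and maximum, which is exactly where the magnitude gets thrown away.

```java
// A minimal text sparkline, assumed for illustration only.
// Each series is rescaled to its own min and max, so magnitude is lost.
final class SparklineSketch {

    // Eight block characters, shortest (U+2581) to tallest (U+2588).
    private static final char[] BARS =
            "\u2581\u2582\u2583\u2584\u2585\u2586\u2587\u2588".toCharArray();

    static String sparkline(double[] values) {
        double min = Double.POSITIVE_INFINITY;
        double max = Double.NEGATIVE_INFINITY;
        for (double v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        double range = max - min;
        StringBuilder out = new StringBuilder();
        for (double v : values) {
            // A flat series has no range; draw it along the bottom.
            int level = (range == 0)
                    ? 0
                    : (int) Math.round((v - min) / range * (BARS.length - 1));
            out.append(BARS[level]);
        }
        return out.toString();
    }
}
```

Feed it sales of 20, 30, 40 and then sales of 2, 3, 4, and you get the same three characters both times. Hence the need to print the numbers next to the picture.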

Not that I followed my own advice, mind you. True, Java did just fine for this domain. But given that most of my program was text-processing, I probably 'should' have learned Perl. Without getting into a religious war, however, if the option is Perl then I'll just stick with Java thankyouverymuch.

  1. There is some important context here that I am eliding. There is no one perfect programming language which is suitable for every task under the sun. Instead, every (non-research) language has one or more 'sweet spots' where it excels. Furthermore, it's usually pretty easy to determine the sweet spot of a language by skimming its standard library. The key to being a programmer, as I tried to tell my students, is to truly learn a handful of basic programming paradigms. Then, when you decide that a program needs to be written, go find a language with that program in its sweet spot. It's much easier to learn the idiosyncrasies of a new but appropriate language than to wrestle a familiar language into an inappropriate domain.

  2. I probably should have included a Tufte book in that stack, come to think of it, but I left all my copies at work. 

  3. Which is good. The next step of this is to read all these print-outs. I'll be staring at sparklines a lot in the coming weeks... 

  4. Any suggestions on this are welcome, by the way. Currently, I'm pattern matching against HTML served by financial websites. This seems... fragile. Does anyone know of a site that will serve XML- or JSON-formatted financial data? 

  5. Though we do know some good ways to prevent this kind of extension. See my previous footnote about Java vs. Perl.