After much tinkering and even more frustration, I recently concluded my personal tour of Hierarchical Temporal Memory. I report on what I found, and I hope others find this useful or at least interesting.
First, a couple of preliminary notes:
Ben Goertzel also has a great review on HTM. Simply googling “Ben Goertzel on intelligence” should bring up his review. It’s from some years back, but still relevant – Jeff Hawkins’ book On Intelligence came out in 2004, I believe.
My work with HTM was cut short by an IP scare that recently cropped up among developers after Numenta changed its stance on experimentation with the algorithms. It is extremely unfortunate that they’ve taken such a strict proprietary route. Much more progress could be made in the context of an open source development process. You can’t claim to be starting a new paradigm and then completely lock that very paradigm down. Revolutions in technology don’t develop in the vacuum of proprietary-land. That’s all I’ll say on that.
And now for the review proper. If you’ve heard about HTM, or you’ve been tempted to try out an implementation, my advice is that it’s not worth getting too involved. Here’s why.
1. Legal concerns, which follow directly from Numenta’s shift in policy mentioned above.
2. The algorithms are computationally expensive. Numenta’s whitepaper describes a couple of shortcuts, which provide a little relief, but there’s still a lot of iterating through huge lists (vectors) of data. That’s still better than evaluating elaborate statistical functions just to get the state of a single neuron, but the advantage is small. After all, one can simply accept a certain level of accuracy and use lookup tables in more traditional neural networks, circumventing the cost of computing exponentials, square roots, and so on. Numenta’s algorithms, by contrast, already operate at such a low level (binary activation values, logical OR functions on distal dendrites, etcetera) that there isn’t a great deal of optimization opportunity left.
3. You don’t get much mathematical backing for HTM. In fact, you get none at all. There are some basic results you can check – the probability of a particular set of neurons being active at a given time, for example – but these don’t open up much additional analysis. The underlying mechanics of HTM are not particularly amenable to methods from optimization, which is exactly what the theory desperately needs. The theory of sparse distributed representations is nice and all, but losing touch with the mathematics of the problem is simply a bad move. And given the decision to go completely proprietary, I for one don’t know of any mathematicians who will want to fill in the gaps in HTM theory specifically. It’s simply not worth the time.
4. It’s not amenable to parallel processing. Those words came from Hawkins himself, describing how a researcher tried the framework on a GPU and saw little or no benefit. For me, that’s a red flag. If an algorithm doesn’t parallelize well, it doesn’t scale well, and if it doesn’t scale well, it’s not for the 21st century. In artificial intelligence especially, parallel performance is a top priority. A proof of concept on a dual-core desktop should be nothing short of a marvel on a Blue Gene, if your algorithm is all it’s talked up to be.
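To make the lookup-table remark in point 2 concrete, here’s a minimal sketch of the general trick (this is my own illustration, not anyone’s production code; the table size and clamping range are arbitrary choices):

```python
import math

# Precompute sigmoid values over a fixed input range. Outside this
# range the sigmoid saturates, so clamping costs almost no accuracy.
TABLE_SIZE = 2048
X_MIN, X_MAX = -8.0, 8.0
STEP = (X_MAX - X_MIN) / (TABLE_SIZE - 1)
SIGMOID_TABLE = [1.0 / (1.0 + math.exp(-(X_MIN + i * STEP)))
                 for i in range(TABLE_SIZE)]

def sigmoid_lut(x):
    """Approximate sigmoid via table lookup instead of calling exp()."""
    if x <= X_MIN:
        return 0.0
    if x >= X_MAX:
        return 1.0
    return SIGMOID_TABLE[int((x - X_MIN) / STEP)]
```

One indexed read replaces an exponential and a division per neuron, at the cost of a small quantization error – exactly the kind of cheap win that HTM’s binary, low-level operations leave no room for.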
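As an example of the kind of basic, checkable result I mean in point 3: the probability that a random sparse pattern matches a stored one follows from simple hypergeometric counting. The sketch below is my own back-of-the-envelope version (the function and parameter names are mine, not from Numenta’s whitepaper):

```python
from math import comb

def prob_exact_match(n, w):
    """Probability that a uniformly random pattern with w active bits
    out of n equals one particular stored pattern: 1 / C(n, w)."""
    return 1 / comb(n, w)

def prob_false_match(n, w, theta):
    """Probability that a random w-of-n pattern shares at least theta
    active bits with a fixed w-of-n pattern (hypergeometric tail)."""
    total = comb(n, w)
    overlaps = sum(comb(w, b) * comb(n - w, w - b)
                   for b in range(theta, w + 1))
    return overlaps / total
```

With realistic sizes these probabilities become astronomically small, which is the selling point of sparse distributed representations – but, as I said above, counting arguments like this are about as far as the available analysis goes.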
Ray Kurzweil’s upcoming book, How to Create a Mind, is said to build on the more general ideas behind HTM and expand them greatly. Those expansions would be welcome: Kurzweil is known for his keen eye for detail, and detail is exactly what HTM lacks. His improvements may be just what the theory needs.
As an aside: I promised some C++ code, and it’s still coming.
EDIT 03/13/13: I’ve decided not to release the implementation I had going, due to the concerns mentioned above. Sorry!