The Octet System: A Way to Think About AI

You see countless headlines about AI these days, littered with references to “deep learning”, “neural networks”, “bots”, “Q&A systems”, “virtual assistant”, and all manner of other proxy terms. What’s missing from this entire discussion is a way to gauge what each system is really capable of.

In the spirit of the Kardashev Scale, I’ve put together my own ranking system for AIs, which we’ll be using at Machine Colony.

(Note: I’ll try to provide as much background and example information as I can without this post reading like a cog-sci textbook. For those savvy in AI these examples will no doubt seem pedestrian; however, I do try to illustrate concepts as much as possible, in a perhaps ill-fated effort to refrain from being too esoteric.)

Introducing the Octet System

I’m not much for fancy names, but in this case it was fitting: list the qualitative capabilities of an information system, and break it down into eight distinct ranks, or “classes”. They are as follows:

Class Null

The zeroth class is something which does not qualify as an intelligent system whatsoever. While this can cover any manner of programs – be they in software or manifested in processes emerging from hardware – I choose to focus on software for this example.

Programs that fall into class null have the following characteristics:

  • They are only able to follow explicit, predetermined / deterministic logic.
  • They follow simple rules – “if this then that” – with no capacity to ever learn anything.
  • They have no capacity to make nuanced decisions, i.e. based on probability and/or data.
  • They have no internal model of the world (this is related to learning).
  • They do not have their own agency.
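To make the contrast concrete, here is a minimal sketch of a Class Null program (a hypothetical thermostat controller): purely explicit, deterministic “if this then that” logic, with no model, no data, and no capacity to learn. Run it a million times and its behavior never changes.

```python
def thermostat(temp_f: float) -> str:
    """A Class Null program: fixed, predetermined rules and nothing else."""
    if temp_f < 65:
        return "heat_on"
    elif temp_f > 75:
        return "cool_on"
    else:
        return "idle"
```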

This is by far the broadest class, as it covers the vast majority of our software systems today. Most of the world’s software is programmed for a specific task and does not really need to be a learning, decision-making system.

Class I

Programs of this class have the following characteristics that are different from Class Null:

  • They have the ability to make rudimentary decisions based on data, using some trained model.
  • They have the ability to learn from the outcomes of their decisions, and thus to update their core model.
  • As such, their behavior may vary over time, as the data changes and their model changes.
  • They are trained for a small number of narrow tasks, and do not have the capability to go outside those tasks.

This covers things like fraud detection agents, decent spam detection, basic crawling bots (assuming they’re at least using decision trees or something similar). The decision could be a classification action – marking something as spam, for instance – or it could be deciding how well a website ranks in the universe of websites.
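As a hedged illustration of what “deciding and then updating the core model” could look like, here is a toy perceptron-style spam scorer (the class and feature names are invented for this sketch; real spam filters use far richer features and calibrated models). It makes a decision from its current weights, and when told the actual outcome, it adjusts those weights, so its behavior drifts as the data changes.

```python
class SpamScorer:
    """A toy Class I agent: decides from a model, learns from outcomes."""

    def __init__(self, features):
        self.weights = {f: 0.0 for f in features}
        self.bias = 0.0

    def decide(self, message_features) -> bool:
        """True means 'mark as spam'."""
        score = self.bias + sum(self.weights.get(f, 0.0) for f in message_features)
        return score > 0

    def learn(self, message_features, was_spam: bool):
        """Update the core model only when the decision was wrong."""
        if self.decide(message_features) != was_spam:
            delta = 1.0 if was_spam else -1.0
            for f in message_features:
                if f in self.weights:
                    self.weights[f] += delta
            self.bias += delta
```

Note how the agent never leaves its narrow task: it classifies messages and nothing else, which is exactly the Class I limitation.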

Class II

Classes I and II are the most similar on this scale, because the distinction is subtle: a Class II program has all of the capabilities of a Class I, but may have more than one core model and more than one domain of action. For instance, the new Google Translate app has natural language and vision capabilities, with different models for each. These models are linked and ‘cooperate’ to translate the words in the field of view of your smartphone’s camera.

Class I programs, by contrast, have only one area that they’re focused on, and make decisions only based on the model relevant to that domain.

Class III

Programs of Class III have two main distinctions from Class IIs:

  • They have a basic memory mechanism, and the ability to learn from their history in those memories. This is more advanced than simply referring to data; these programs are actually building up heuristics from their own behavioral patterns.
  • They persist some form of internal model of the world. This assists in creation of memories and new heuristics in their repertoire.
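A minimal sketch of what such a memory mechanism might look like (the design and names here are hypothetical): the agent records every episode of its own behavior, and distills that raw history into heuristics of the form “in this situation, this action has worked best,” rather than merely referring back to external data.

```python
from collections import defaultdict

class EpisodicAgent:
    """A toy Class III mechanism: raw memories plus derived heuristics."""

    def __init__(self):
        self.memory = []                     # raw history of episodes
        self.heuristics = defaultdict(dict)  # situation -> action -> [total, count]

    def record(self, situation, action, reward):
        """Store the episode and fold it into the running heuristics."""
        self.memory.append((situation, action, reward))
        stats = self.heuristics[situation].setdefault(action, [0.0, 0])
        stats[0] += reward
        stats[1] += 1

    def best_action(self, situation, default=None):
        """Heuristic lookup: the action with the best average outcome here."""
        options = self.heuristics.get(situation)
        if not options:
            return default
        return max(options, key=lambda a: options[a][0] / options[a][1])
```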

Class IV

This class starts to loosely resemble the intelligence level of insects. Class IVs not only have some kind of internal model of the world, but they gain an ability that was essential to the evolution of all complex life on earth: collaboration.

Thus, their characteristics are:

  • They have the ability to collaborate with other agents/programs. That is, they have mechanisms with which to become aware of other agents, and a medium through which to communicate signals. (Think of ants and bees leaving chemical traces, for instance.)
  • Like Class IIIs, they have an internal model of the world. However, a Class IV’s model is more closely linked to its goal structure, and not merely ad hoc / bound in one model. Its internal model may be distributed across several subsystems / mathematical models; representations of complex phenomena or experiences are encoded across various components in its cognitive architecture (vision systems, memory components, tactile systems, etc).
  • They have the ability to perform rudimentary planning, driven by fairly rigid heuristics but with a little flexibility for learning.
  • They have the ability to form basic concepts, schema, and prototypes.

While it is not a prerequisite that Class IVs have multiple distinct sensory modalities – optic, auditory, tactile, olfactory systems – that serves as a good example of the complexity level these programs start to achieve. In an AI setting, a program could have hundreds of different types of inputs, each with their own data type and respective subsystem for processing the input. The key distinction is that in Class IV programs, these subsystems have a high degree of connectivity, and thus generate more complex behavior.

Many robotics software systems could also be placed in Class IV.
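The ant-and-bee example above can be sketched as code: agents collaborating not by direct messaging but through a shared medium they all read and write, often called stigmergy. Everything here (the grid, the evaporation rate) is an invented toy, but it shows the essential Class IV ingredient: a mechanism for becoming aware of other agents’ signals.

```python
class PheromoneGrid:
    """A shared signaling medium, loosely inspired by ant chemical traces."""

    def __init__(self, width, height, evaporation=0.5):
        self.levels = [[0.0] * width for _ in range(height)]
        self.evaporation = evaporation

    def deposit(self, x, y, amount=1.0):
        """An agent leaves a trace at its location."""
        self.levels[y][x] += amount

    def read(self, x, y):
        """Any other agent can sense the trace later."""
        return self.levels[y][x]

    def step(self):
        """Signals fade over time, so stale information loses influence."""
        for row in self.levels:
            for x in range(len(row)):
                row[x] *= self.evaporation
```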

Class V

The capabilities of Class Vs begin to resemble more complex animal behavior, such as that of rats (though not as intelligent as many apes). The primary characteristics of note are:

  • They have the ability to reflect on their own ‘thoughts’. In an AI program setting, this would mean it has the ability to optimize its own metaheuristics. In rats, for example, this is manifested as a basic form of metacognition.
  • They have the ability to perform complex planning, especially in which they are able to simulate the world and themselves in it. Which leads to:
  • They have the ability, to some degree, to simulate the world in their minds. That is, they can perturb their internal model of the world without actually taking action, and play out the results of hypothetical actions. They can imagine scenarios based on their knowledge of the world, which is intimately related to their memories (recall the memory capability from Class III).
  • Related to their planning and internal simulation capabilities, they have the ability to set their own goals and take steps to achieve them. For instance, a rat may see two different pieces of food, decide that it likes the looks of one of them better than the other, set its goal to acquire the better-looking morsel, and subsequently plan a path to get it. The planning part relies on actions it knows it can do – how fast it can run, how far it can jump – and the terrain ahead of it, as well as memories of how it may have conquered that type of terrain before. Thus goal-setting and planning rely heavily on memory and the internal model.
  • They have a rudimentary awareness of their own agency in the environment. That is, when they are planning, they treat themselves as a factor in the environment they are simulating.

By this time, you have a program which is able to reflect, plan, collaborate with other agents, set goals, learn new behaviors and strategies for achieving its goals, and simulate hypothetical scenarios.
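The simulation capability in particular can be sketched in a few lines (the grid world and function names below are invented for illustration): the agent rolls hypothetical action sequences forward on a *copy* of its internal model, never touching the real state, and then commits to whichever imagined plan scored best.

```python
import copy

def simulate(world_model, actions):
    """Play out a plan on a copy of the internal model; the original is untouched."""
    model = copy.deepcopy(world_model)
    reward = 0.0
    for action in actions:
        x, y = model["position"]
        dx, dy = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}[action]
        model["position"] = (x + dx, y + dy)
        if model["position"] == model["food"]:
            reward += 1.0  # the imagined self reaches the imagined food
    return reward

def choose_plan(world_model, candidate_plans):
    """Goal-directed planning: pick the plan whose simulation scores best."""
    return max(candidate_plans, key=lambda plan: simulate(world_model, plan))
```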

Class VI

You’re a class VI. So am I. Almost every human being is a Class VI – ‘almost’ because, well, this designation is questionable when applied to some politicians.

Programs of this class will start to resemble human-level intelligence and capability, though not necessarily human-like in nature. While in humans a major difference is more complex emotions, this scale does not consider emotions directly.

Artificial intelligence has not yet reached this level, and there are varying predictions as to when it will. The good news is that expert opinion largely agrees that it will happen eventually; it’s just that no one knows exactly when.

Key components of Class VI agents/AIs are:

  • Full self-awareness. The agent is fully aware of itself, its history, where the environment ends and it begins, and so on. This is related to consciousness, though Class VIs need not necessarily be conscious in a strict sense.
  • They have the ability to plan in the extremely long term, thinking ahead in ways that more basic systems cannot. Specific timescales are relative to its natural domain: for a person, decades; for an AI program, perhaps, seconds or hours.
  • Class VIs are able to invent new behaviors, processes, and even create other ‘programs’. In the case of a human, this is obviously an inventor creating a new way of solving a problem, or a software developer programming AIs somewhere in Brooklyn…

Class VII

This is what might well be referred to as ‘superintelligence’. While some AI experts are skeptical that it can be achieved at all, there is broad agreement among the rest that it is coming. Nick Bostrom writes elegantly about the subject in his book of the same name.

While nobody knows exactly what this may look like, there are two major distinguishing factors which would almost certainly be present:

  • They have the ability to systematically control their own evolution.
  • They have the ability to recursively improve themselves, perhaps even at alarmingly minuscule timescales.

Their first ability is perhaps their most profound. While humans do in some sense control our own fate, we do not yet have fine-grained control over the evolution of our brains and hence our cognitive abilities (though CRISPR may soon change that). With an artificial superintelligence, many limitations are removed. They can arbitrarily copy-paste themselves, ad infinitum, and perform risk-free simulations of their new versions. They will also be essentially immortal, so long as their hardware persists and has a supply of energy.

With respect to the second ability, one might imagine an ASI (artificial superintelligence) making multiple clones of itself, each clone independently applying a self-improving strategy, and then each one in turn performing a set of benchmark tests to determine which one improved the most from the original copy. Whichever agent performed the best would become the new master copy, while the others would be taken out of the running.

This is a supercharged evolutionary algorithm, essentially. The tests would be agreed upon in advance, and even perhaps written as a cryptographically secure contract (blockchain-based or otherwise) to prevent cheating. In doing so, the agent would keep improving up to hardware limits or some theoretical asymptote.
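The clone/benchmark/select loop described above can be sketched as a simple evolutionary algorithm. In this toy (everything here is invented for illustration), the ‘agent’ is just a parameter vector, ‘self-improvement’ is a random mutation, and the benchmark rewards parameters near zero; a real ASI would of course apply a directed strategy, not noise.

```python
import random

def benchmark(agent):
    """Stand-in benchmark suite: higher is better, optimum at all-zero parameters."""
    return -sum(p * p for p in agent)

def improve(master, clones=8, generations=20, rng=None):
    """Clone the master, let each clone mutate, keep whichever scores best."""
    rng = rng or random.Random(0)
    for _ in range(generations):
        candidates = [master] + [
            [p + rng.gauss(0, 0.1) for p in master] for _ in range(clones)
        ]
        # The best performer becomes the new master copy; the rest are retired.
        master = max(candidates, key=benchmark)
    return master
```

Because the current master is always among the candidates, the benchmark score can never regress, which is the whole point of the agreed-upon test suite.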

The kind of scenario above is not at all unlikely in the near future.



AI capabilities currently exist somewhere between the Class IV and Class V marks, but are quickly marching toward Class VI. DeepMind and Facebook are leading the way in this direction, though other notable players are making important contributions. Certainly the brand-new OpenAI will have some interesting insights as well.

My hope is that this type of classification system, and others like it, will help bring some structure to the conversation around fast-emerging AI. With deeper clarity in our common language, we can have more meaningful and productive conversations about how we wish for this technology to advance and how it ought to be used. We owe it to ourselves to have the linguistic tools to accurately describe our progress.

A World Inside the Mind

Short post today, but a few things occurred to me as I was reading the paper on Bayesian Program Learning:

  • This form of recursive program induction starts to look suspiciously like simulation – something we do in our minds all the time.
  • Simulation may be a better framing for concept formation than the classification route.
  • Mapping the ‘inner world’ to the ‘outer world’ seems a more sensible approach to understanding what’s going on. If you look at the paper, you also see some thought-provoking examples of new concept generation, such as the single-wheel motorbike example (in the images). This is the most exciting point of all.

A final design?

Combine all the elements together, along with ideas from my last post, and you get something that:

  1. Simulates an internal version of the world
  2. Is able to synthesize concepts and simulate the results, or literally ‘imagine’ the results – much like we do
  3. Is able to learn concepts from few examples
  4. Has memories of events in its lifetime / runtime, and can reference those events to recall the specific context of what else was happening at that time. That is, memories have deep linkage to one another.
  5. Is able to act of its own volition, i.e. in the absence of external stimulus. It may choose to kick off imagination routines – ‘dreaming’, if you will – optimize its internal connections, or do some other maintenance work in its downtime. Again, similar to how our brains do it while we sleep.
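One way the five requirements above might hang together in code is sketched below. This is an entirely hypothetical skeleton, not a working architecture; the point is only to show how an internal world model, imagination, context-linked memories, and unsupervised downtime behavior could interconnect.

```python
class CognitiveAgent:
    """Skeleton mapping to requirements (1)-(5) above."""

    def __init__(self):
        self.world_model = {}      # (1) internal version of the world
        self.concepts = {}         # (3) concepts, ideally learned from few examples
        self.episodic_memory = []  # (4) events linked by shared context

    def imagine(self, scenario):
        """(2) Simulate a hypothetical by perturbing a copy of the world model."""
        model = dict(self.world_model)
        model.update(scenario)
        return model

    def remember(self, event, context):
        """(4) Memories carry their context, so events can be cross-referenced."""
        self.episodic_memory.append({"event": event, "context": context})

    def recall_context(self, event):
        """(4) Recall what else was happening when an event occurred."""
        return [m["context"] for m in self.episodic_memory if m["event"] == event]

    def idle(self):
        """(5) Downtime behavior chosen without external stimulus: 'dreaming'
        over a recent memory rather than waiting for input."""
        if self.episodic_memory:
            return self.imagine(self.episodic_memory[-1]["context"])
        return self.world_model
```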

This starts to look like a pretty solid recipe for a complete cognitive architecture. Every requirement has been covered in some way or another, though in different models and in different situations. To really put the pieces together into a robust architecture will require many years of work, but it is worth exploring multi-model cognitive approaches.

If it results in a useful AI, then I’m all in.


Context & Permutations

In the pursuit of Artificial General Intelligence, one of the challenges that comes up again and again is how to deal with context.  To illustrate: telling a robot to cross the street would seem simple enough.  But consider the context that five minutes ago somebody else told this robot not to cross the street because there was some kind of construction work happening on the other side.  What does the robot decide to do?  Whose instruction does it consider more important?

A robot whose ‘brain’ did not account for context properly would naively go crossing the street as soon as you told it to, ignoring whatever had come before.  This example is simple enough, but you can easily imagine other situations in which the consequences would be catastrophic.

The difficulty in modeling context in a mathematical sense is that the state space can quickly explode, meaning that the number of ways things can occur, and the sequences they can occur in, is essentially infinite.  Reducing these effective infinities down to manageable size is where the magic occurs.  The holy grail in this case is to have the computational cost of the main algorithm remain constant (or at least linear) even as the number of possible permutations of contextual state explodes.

How is this done?  Conceptually, one needs to represent things sparsely, and have the algorithm that traverses this representation only take into account a small subset of possibilities at a time.  In practice, this means representing the state space as transitions in a large graph, and only traversing small walks through the graph at any given time.  In this space-time tradeoff, space is favored heavily.
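A minimal sketch of that idea, using the street-crossing robot from earlier (the state and event names are invented): contextual state lives as transitions in a graph, and the algorithm only ever walks the short path of events it actually observed. The walk costs time proportional to the number of observed events, regardless of how many total states or permutations the graph could represent – space is spent so that time stays cheap.

```python
from collections import defaultdict

class ContextGraph:
    """Contextual state as sparse transitions; queries are short walks."""

    def __init__(self):
        self.edges = defaultdict(dict)  # state -> {event: next_state}

    def add_transition(self, state, event, next_state):
        self.edges[state][event] = next_state

    def walk(self, start, events):
        """Follow only the observed events; cost is O(len(events)),
        independent of the total size of the state space."""
        state = start
        for event in events:
            state = self.edges[state].get(event, state)  # unknown events: stay put
        return state
```

For the robot: a “construction warning” event moves it from `ready` to `hold`, so a later “cross the street” instruction is evaluated from the `hold` state rather than naively executed.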

The ability to adeptly handle context is of utmost importance for current and future AIs, especially as they take on more responsibility in our world.  I hope that AI developers can form a common set of idioms for dealing with context in intelligent systems, so that they can be collaboratively improved upon.

We’ve had it all wrong.

All this time, we’ve had it all wrong.

Artificial Intelligence (AI) has been a science for over 50 years now, and in that time has accomplished some amazing things – computers that beat human players at chess and Jeopardy, find the best routes for delivery trucks, optimize drug delivery, and many other feats.  Yet the elusive holy grail of “true AI”, or “sentient AI”, “artificial general intelligence” – by whatever name, the big problem – has remained out of our grasp.

Look at what the words actually say though – artificial intelligence.  Are we sure that intelligence is really the crucial aspect to creating a sentient machine?

I claim that we’ve had it wrong.  Think about it: intelligence is a mere mechanical form, a set of axioms that yield observations and outcomes.  Hypothesis, action, adjustment – ad infinitum.  The theory has been if we could just create the recursively self-optimizing intelligence kernel, BOOM! – instant singularity.  And we’d have our AGI to run our robots, our homes, our shipping lanes, and everything imaginable.

The problem with this picture is that it assumes intelligence is the key underlying factor.  It is not.

I claim the key factor is…

…wait for it…

…consciousness.

Consciousness might be defined as how ‘aware’ an entity is of itself and its environment.  It might be measured by how well the entity can distinguish where it ends and its environment begins, by its sense of agency with reference to past actions it performed, and by a unified experience of its surroundings that gives it a constantly evolving sense of ‘now’.  This may overlap with intelligence, but it is a different goal: looking in the mirror and thinking “that’s me” is different than being able to beat humans at chess.  A robot understanding “I broke the vase” is different than an intelligence calculating the Voronoi diagram of the pottery’s broken pieces lying on the floor.

Giulio Tononi’s work rings a note in harmony with these ideas.  Best of all, he and others discuss practically useful metrics of consciousness.  Whether Integrated Information Theory is the root of all consciousness or not is immaterial; the point is that this is solid work in a distinctly new direction, and approaches the fundamental problems of AI in a completely new way.

Tononi’s work may be a viable (if perhaps only approximate) solution to the binding problem, and in that way could be immensely useful in designing systems that have a persisting sense of their evolving environment, leading us to sentience.  It is believable that intelligence may be an emergent property of consciousness, but it seems unlikely that intelligence alone is the ingredient for consciousness itself, and that somehow a certain ‘amount’ of intelligence will yield sentience.  One necessarily takes precedence over the other.

Given this, from now on I’ll be focusing my work on Artificial Consciousness, which will differ from Artificial Intelligence namely in its goals and performance metrics: instead of how effectively an agent solved a problem, how aware it was of its position in the problem space; instead of how little error it can achieve, how little ambiguity it can achieve in understanding its own boundaries of existence (where the program ends and the OS begins, where the robot’s body ends and the environment begins).

I would urge you to read Tononi’s work and Adam Barrett’s work.  My Information Theory Toolkit has several of the functions you’ll need to start experimenting on systems with a few more lines of code (namely, use the Kullback-Leibler divergence).
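For reference, the Kullback-Leibler divergence itself fits in a few lines of plain Python (this is a generic sketch, not code from the toolkit). It measures, in bits when using base 2, how much a distribution `p` diverges from a reference distribution `q` over the same outcomes.

```python
import math

def kl_divergence(p, q):
    """D_KL(p || q) in bits, for distributions given as probability lists."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi == 0:
                return float("inf")  # q assigns zero mass where p does not
            total += pi * math.log2(pi / qi)
    return total
```

Note that it is asymmetric – D(p||q) generally differs from D(q||p) – which matters when choosing which distribution plays the role of the reference.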

In the coming months, I’ll be adding ways to calculate the Information Integration of abstracted systems, i.e. their Phi value.  This computation is NP-hard, so it will have to remain in the domain of small systems for now.  Nonetheless, I believe that if we start designing systems with the intent of maximizing their integration, it will yield some system topologies that have more beneficial properties than our usual ‘flat’ system design.

Artificial Intelligence will no doubt continue to give us great advances in many areas, but I for one am embarking on a quest for something subtly but powerfully different: Artificial Consciousness.

Note: If you have some programming skill and would like to contribute to the Information Theory Toolkit, please fork the repository and send me an email so we can discuss possibilities.  I’ll continue to work on this as I can.

“How to Create a Mind” Review

I’ve just finished reading Ray Kurzweil’s new book, “How to Create a Mind”.  In it I found a wealth of good information, especially in the form of thought experiments.

Kurzweil’s latest work ties in a mass of data about the brain, pattern recognition, and his own experiences, creating a sort of roadmap to creating strong AI.  The account is clearly written and concepts are well explained.  He includes some interesting research, perhaps most intriguing are the experiments with split-brain patients, illuminating more subtle aspects of consciousness.

The grand theory presented in the book, the Pattern Recognition Theory of Mind, has some nice features.  It promises completely asynchronous processing to emulate the brain’s same ability, as well as uniformity of elements.  This uniformity is part of what enables arbitrary regions to be configured to process different types of information, given sufficient exposure to their respective type of data.

Kurzweil’s formulation of hierarchical pattern recognition seems to stem almost exclusively from Hidden Markov Models, specifically of the hierarchical variety.  While these models are indeed useful for many applications, missing throughout the book is an explanation of any explicit role of time.  In Jeff Hawkins’ “On Intelligence”, temporal patterns take a central role in the theories presented, distinguishing it from most traditional machine learning designs.  By contrast, Kurzweil’s PRTM (Pattern Recognition Theory of Mind) does not take time directly into account.  We’re left to assume that temporal patterns are implicit in the changing of spatial patterns, though some definite remark on that would have been helpful.

Most of the book’s real value does not come from detailed algorithmics or mathematical ingenuity, but again from the deep and illuminating thought experiments presented.  Kurzweil has a way of exposing subtle relationships in concepts that no other author can, save Marvin Minsky (another personal favorite, who was a mentor of Kurzweil’s).  The book delivers a powerfully enlightening look into the intricate world of pattern recognition, and presents fascinating and viable avenues of exploration for making intelligent machines.  Anyone who is interested in the brain, AI, robotics, or just technology in general should definitely give this a read.

HTM Official Review

After much tinkering and even more frustration, I recently concluded my personal tour of Hierarchical Temporal Memory.  I report on what I found, and I hope others find this useful or at least interesting.

First, a couple of preliminary notes:

Ben Goertzel also has a great review on HTM.  Simply googling “Ben Goertzel on intelligence” should bring up his review.  It’s from some years back, but still relevant – Jeff Hawkins’ book On Intelligence came out in 2004, I believe.

My work with HTM was cut short by an IP scare that recently cropped up among developers, namely because Numenta has changed their stance on experimentation with the algorithms and so on.  It is extremely unfortunate that they’ve taken such a strict proprietary route.  Much more progress could be made in the context of an open source development process.  You can’t claim to be starting a new paradigm and then completely lock down that very paradigm.  Revolutions in technology don’t develop in the vacuum of proprietary-land.  That’s all I’ll say on that.

And now for a real review.  In case you’ve heard about HTM, or you’ve been tempted to try out an implementation, I’d say it’s not worth getting too involved in.  Here’s why.

1. Legal concerns, which clearly follow from the above mentioning of Numenta’s shift in policy.

2. The algorithms are rather computationally expensive.  Numenta’s whitepaper describes a couple of shortcuts to take, which provide a little relief, but generally there’s still a lot of iterating through huge lists (vectors) of data.  It’s still better than calculating all manner of ridiculous statistics functions just to get the state of one neuron, but the tradeoff is minute.  After all, one can simply accept a certain level of accuracy and just use lookup tables for more traditional neural networks, circumventing the problem of calculating exponentials, square roots, and so on.  With Numenta’s algorithms, because they are already operating on such a low level (binary activation values, logical OR functions on distal dendrites, etcetera), there isn’t a great deal of optimization opportunity available.

3. You don’t get a lot of mathematical backing for HTM.  In fact, you get none at all.  There are some basic results you can check – probability of a certain set of neurons being active at a given time, for example – but these don’t open up to much additional analysis.  The underlying mechanics of HTM are not particularly amenable to methods from optimization, something which it desperately needs.  The theory of sparse distributed representations is nice and all, but losing touch with the mathematics of the problem is simply a bad move.  And with their decision to go completely proprietary, I for one don’t know of any mathematicians who are going to want to fill in the gaps in HTM theory specifically.  It’s simply not worth the time.

4. It’s not amenable to parallel processing.  These very words came from Hawkins himself, who was talking about how a researcher tried the framework on a GPU, but it offered little or no benefit.  For me, that’s a red flag.  If an algorithm doesn’t parallelize well, it doesn’t scale well.  If it doesn’t scale well, it’s not for the 21st century.  When you’re talking about artificial intelligence especially, parallel performance is top priority.  A proof of concept on a desktop with a dual core should be nothing short of a marvel on a Blue Gene, if your algorithm is all it’s talked up to be.

Ray Kurzweil’s upcoming book, How to Create a Mind, is said to build on the more general ideas behind HTM and expand them greatly.  These expansions would be welcome, especially since Kurzweil is known to have a keen eye for detail, and those very details are needed in the case of HTM.  Kurzweil’s improvements may be just what HTM needs.

As an aside: I promised some C++ code, and it’s still coming.

EDIT 03/13/13: I’ve decided not to release the implementation I had going, due to the exact concerns mentioned.  Sorry!

21st Century Mathematics

What does mathematics look like in the 21st century?  I’m in no position to make any declarations, what with not being an expert on math history and all, but I’d like to offer up a couple of brief observations to think on.

If I had to name one candidate for the overall flavor of 21st century mathematics, I’d say complex adaptive systems.  Why?  Because it encapsulates the transition we’re seeing from the rigid, linear, and static to the complex, nonlinear, and dynamic.  I think a lot of this in particular has been motivated by a couple of things: 1. our great and ever-increasing numbers as humans, and 2. the increasing complexity of the technology we use to accomplish our tasks.  Given an exponential increase in population coupled with an exponential increase in the complexity of the technology being used virtually every second of every day, some new mathematics were due to emerge.  Among the more interesting examples you have things like fractal geometry, cellular automata, and ‘system of systems’.  New variations of these and other approaches are appearing daily in academic journals, and some make it to market.

The pace is increasing, to the degree that the landscape is changing faster than anybody can keep up with.  That’s technology as a whole.  For mathematics, an entirely new era is on its way in, motivated by society and the thirst for new technology.  The reigning paradigms of this century will likely be vastly complex networked systems, and how to describe them accurately.  This includes anything from social networks to artificial intelligence, transportation systems (including space traffic), economics, neuroscience, biology – almost anything you can think of.  What’s becoming apparent is that everything that was off limits to traditional mathematics is becoming accessible through new frameworks.

These are exciting times!  The future is bright, and there is surely no end to the amount of adventure a motivated person can have this century.