From Paper to Production: Shortening the Ramp


One of the things that strikes me about the current state of machine learning is how long it still takes to get a new algorithm or model into production. From the time that 1) a paper is published to when 2) its contents are evaluated by those doing machine learning in industry and 3) they subsequently commit to developing it, years have passed. It does not need to be this way.

Those doing machine learning are understandably wary of newer methods, and I can see why they might opt to give a method time for hidden problems to be discovered before committing. The long-term viability of a model is often judged by its very ability to remain on the scene after many years, which, though vaguely tautological, remains valid. The models that survive this process are deemed essential, timeless. The rest are disregarded.

There are two sides from which this can be viewed: the business risk side, and the development side. These are deeply intertwined.

The Risk Side

Product managers and tech leadership coming to grips with the pervasiveness of machine learning face an increasing number of considerations, many of them in technology areas outside their expertise. There may be a bit of silver lining, however: the discussion of which machine learning technology makes its way into new products is fundamentally one of risk. To the extent that they can work with those within their organization or its allies who have the expertise to accurately assess newer methods, they can quantify the level of risk they take on when things go wrong.

For instance, if a new method can drop the error rate of a certain type of prediction down to 3% (as opposed to a previous 5%), how does that affect the risk statistics of the business? Does it enable broader distribution or reach into a higher market segment? Does it enable new products entirely? These questions must be answered.
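
As a back-of-envelope illustration (a minimal sketch with made-up volumes and costs, not a real risk model), the expected cost of errors can be estimated directly from prediction volume and error rate, and the two methods compared:

```python
# Hypothetical numbers for comparing two error rates; every figure here is an
# illustrative assumption, not real data.
predictions_per_month = 1_000_000
cost_per_error = 0.25   # assumed average downstream cost (support, churn, refunds) per bad prediction

def expected_monthly_cost(error_rate):
    """Expected cost of erroneous predictions per month at a given error rate."""
    return predictions_per_month * error_rate * cost_per_error

old_cost = expected_monthly_cost(0.05)   # previous method: 5% error
new_cost = expected_monthly_cost(0.03)   # candidate method: 3% error
print(f"old: ${old_cost:,.0f}/mo   new: ${new_cost:,.0f}/mo   savings: ${old_cost - new_cost:,.0f}/mo")
```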

New qualitative capabilities may seem more difficult to judge, but that is not necessarily true. For instance, some newer capabilities involve an AI system describing what’s in a photograph, using completely natural language with understandable sentences. If a product manager or CTO is considering using this capability in a new product or feature, the error rate of the method can still be used to assess the risk to the business that the new feature exposes them to. The degree of risk will vary widely by the specific industry and application, but the process remains the same.

The Development Side

Even if all parties can agree that a new method is tempting enough to use in a feature, somebody still has to code the thing. This is where progress is sluggish. Developing new, unfamiliar models and validating them is a nontrivial effort, even for experienced ML programmers. Assuming you’re doing the first implementation in a given language or environment, it requires a degree of getting into the thought process of the researchers. Often, direct correspondence is needed to clarify details.

While many papers include pseudocode that can be readily translated into a programming language, just as many do not. From there, you are left to develop a deep understanding of the model’s description and translate its mathematical definition and data structures into a complete implementation. It’s hard work.

This is the part where things can slow down: without a clear understanding of the model and its behavior, it is not possible for a tech lead, data scientist, or ML developer to make accurate judgements about the level of risk or the likelihood of bugs or other surprise behavior. Beyond the error rate, one has to assume that the resulting implementation will have its own quirks and bugs. To assume otherwise would be both unrealistic and foolish.

Many companies may be slower to adopt “bleeding edge” methods, then, because it is simply too difficult to enumerate the capabilities they imply and to quantify the risk they impose. How can this be solved?

Shorten the Ramp

Consider the situation where there is some new deep learning model X that a company really wants to use in its products, but it may not have a good way of reasoning through the consequences of doing so. We can point out the main issues:

  • It can be a challenge to arrive at an exact error rate for the specific application before an implementation has been made. The paper will use test datasets, but the model will almost surely behave differently with the data specific to a feature.
  • There is often a break in communication between those gaining an understanding of the model and those assessing how it may affect the business overall. The outcome could be anything from a smash hit to a total disaster.
  • Even when a model is finished, it will need to land in an environment in which to run. Engineers should keep the infrastructure requirements in mind from the beginning.
  • In a waterfall or waterfall-like process, it is of course not possible to create requirements in the absence of understanding of the capabilities involved. This stalls progress.
  • Agile development is out the window, due to the high sensitivity of the relationship between model performance and feature risk or cost. These aren’t really the kinds of things you can just “ship first, iterate later”. Much needs to be worked out before it goes into the hands of feature developers.

All of this points toward two ways to shorten the ramp to deployment:

  1. If a company is genuinely interested in adopting new algorithms and models in their offerings, they need to provide representative data as soon as possible.
  2. Their engineering and/or data science team(s) need to have tools and infrastructure to support rapid prototyping of new models.

Only with datasets that are representative of what will occur in a production setting can a team judge their implementation and profile its performance. In the paper, a model may boast 90-something percent accuracy, but you may find that for your problem it is a little less, thereby affecting how risky the investment in developing the model is.

This can happen before the implementation phase, by looking at the test data used in a paper. A talented developer or data scientist can format the internal dataset to be similar to that used in a test dataset from the paper, thereby reducing opportunity for errors to arise from differences in data formatting.
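
To make that concrete, here is a minimal sketch of what such a reformatting step might look like; the column names, file paths, and target schema are all hypothetical stand-ins for whatever the paper and the internal systems actually use:

```python
# Reshape an internal export so it mirrors a paper's benchmark format.
# All field names and files here are hypothetical.
import pandas as pd

internal = pd.read_csv("internal_examples.csv")   # assumed internal export

# Suppose the paper's test set uses (image_path, label) pairs with integer class ids.
label_to_id = {name: i for i, name in enumerate(sorted(internal["category"].unique()))}

benchmark_style = pd.DataFrame({
    "image_path": internal["s3_key"],                # map the internal field to the paper's field
    "label": internal["category"].map(label_to_id),  # string categories -> integer ids
})

benchmark_style.to_csv("internal_as_benchmark.csv", index=False)
```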

An example process, then, might look like this:

  1. Product manager decides they need X new capability in their next phase of features.
  2. Their first order of business, then, is to gather and build datasets that are close to the actual problem. At the very least, they should ensure that whoever can build that dataset has access to all the tools and data sources needed to complete it quickly.
  3. Product manager hands over dataset, high-level requirements, and asks data science and/or engineering team(s) to begin investigating models.
  4. Technical team either begins profiling models they already know about or scouting for models that are known to enable the required capabilities.
  5. Development / prototyping begins with the selected model(s) and the datasets provided.
  6. Throughout development process, error rates and other important metrics are reported back to the product manager (or whomever is overseeing the process).
  7. Risk calculations are adjusted as this information flows in. For example, if it’s a photo auto-tagging feature in question, one can determine how many users are likely to experience incorrectly tagged photos and how often, based on the volume of photos and the error rate of the model. From that, one can determine how much of a risk it is to the business – are users likely to leave if they experience the error, or is it not a deal-breaker? (A back-of-envelope sketch of this calculation follows the list.)
  8. Once all models have been profiled and tested, a decision can be reached about whether or not to proceed with the feature.
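
As promised in step 7, here is a minimal sketch of that back-of-envelope risk calculation for the photo auto-tagging example; the user counts, volumes, and error rate are assumptions for illustration only:

```python
# Hypothetical volumes for a photo auto-tagging feature; swap in your own numbers.
monthly_active_users = 200_000
photos_per_user_per_month = 30
tag_error_rate = 0.04   # e.g. 4%, as measured on the representative dataset

expected_bad_tags = monthly_active_users * photos_per_user_per_month * tag_error_rate
bad_tags_per_user = photos_per_user_per_month * tag_error_rate

# Probability that a given user sees at least one mis-tagged photo in a month,
# assuming errors are independent across photos.
p_user_sees_error = 1 - (1 - tag_error_rate) ** photos_per_user_per_month

print(f"~{expected_bad_tags:,.0f} mis-tagged photos per month")
print(f"~{bad_tags_per_user:.1f} per user; P(user sees at least one error) ~ {p_user_sees_error:.0%}")
```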

Of course not every company has a product manager or entire teams for data science and engineering, but the overall structure can be applied by those filling the roles – even if it’s all the same person.

In summary, the best way I see to shorten the time from a published paper to a viable production implementation is to 1) provide data as early as possible and 2) ensure engineering has tools to quickly prototype and test models. Both are difficult, but both will pay off considerably for those willing to put in the effort.

I hope this has been helpful to you. Please ask questions in the comments.

 

 


The Octet System: A Way to Think About AI


You see countless headlines about AI these days, littered with references to “deep learning”, “neural networks”, “bots”, “Q&A systems”, “virtual assistant”, and all manner of other proxy terms. What’s missing from this entire discussion is a way to gauge what each system is really capable of.

In the spirit of the Kardashev Scale, I’ve put together my own ranking system for AIs, which we’ll be using at Machine Colony.

(Note: I’ll try to provide as much background and example information as I can without this post reading like a cog-sci textbook. For those savvy in AI these examples will no doubt seem pedestrian, however I do try to illustrate concepts as much as possible, in a perhaps ill-fated effort to refrain from being too esoteric.)

Introducing the Octet System

I’m not much for fancy names, but in this case it was fitting: list the qualitative capabilities of an information system, and break them down into eight distinct ranks, or “classes”. They are as follows:

Class Null

The zeroth class is something which does not qualify as an intelligent system whatsoever. While this can cover any manner of programs – be they in software or manifested in processes emerging from hardware – I choose to focus on software for this example.

Programs that fall into class null have the following characteristics:

  • They are only able to follow explicit, predetermined / deterministic logic.
  • They follow simple rules – “if this then that” – with no capacity to ever learn anything.
  • They have no capacity to make nuanced decisions, i.e. based on probability and/or data.
  • They have no internal model of the world (this is related to learning).
  • They do not have their own agency.

This is by far the broadest class, as it covers the vast majority of our software systems today. Most of the world’s software is programmed for a specific task and does not really need to be a learning, decision-making system.

Class I

Programs of this class have the following characteristics that are different from Class Null:

  • They have the ability to make rudimentary decisions based on data, using some trained model.
  • They have the ability to learn from the outcomes of their decisions, and thus to update their core model.
  • As such, their behavior may vary over time, as the data changes and their model changes.
  • They are trained for a small number of narrow tasks, and do not have the capability to go outside those tasks.

This covers things like fraud detection agents, decent spam detection, basic crawling bots (assuming they’re at least using decision trees or something similar). The decision could be a classification action – marking something as spam, for instance – or it could be deciding how well a website ranks in the universe of websites.
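
For illustration, a minimal Class I program might look something like the sketch below: a narrow spam filter that makes decisions from a trained model and folds user feedback back into that model. This is just one way to realize the characteristics above, with scikit-learn standing in for whatever learning machinery a real system would use.

```python
# A toy "Class I" program: one narrow task, a trained model, and the ability to
# update that model from the outcomes of its own decisions.
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**16)
model = SGDClassifier()   # a linear classifier that supports online updates

# Seed the model with a small labeled batch (1 = spam, 0 = ham).
seed_texts = ["win money now", "meeting at 3pm", "free prize claim", "lunch tomorrow?"]
seed_labels = [1, 0, 1, 0]
model.partial_fit(vectorizer.transform(seed_texts), seed_labels, classes=[0, 1])

def classify_and_learn(text, user_feedback=None):
    """Decide spam/ham; if the user corrects the decision, learn from that outcome."""
    decision = int(model.predict(vectorizer.transform([text]))[0])
    if user_feedback is not None and user_feedback != decision:
        model.partial_fit(vectorizer.transform([text]), [user_feedback])
    return decision
```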

Class II

Classes I and II are the most similar on this scale, because the distinction is subtle: a Class II program has all of the capabilities of a Class I, but may have more than one core model and more than one domain of action. For instance, the new Google Translate app has natural language and vision capabilities, with different models for each. These models are linked and ‘cooperate’ to translate the words in the field of view of your smartphone’s camera.

Class I programs, by contrast, are focused on only one area, and make decisions based only on the model relevant to that domain.

Class III

Programs of Class III have two main distinctions from Class IIs:

  • They have a basic memory mechanism, and the ability to learn from their history in those memories. This is more advanced than simply referring to data; these programs are actually building up heuristics from their own behavioral patterns.
  • They persist some form of internal model of the world. This assists in creation of memories and new heuristics in their repertoire.

Class IV

This class starts to loosely resemble the intelligence level of insects. Class IVs not only have some kind of internal model of the world, but they gain an ability that was essential to the evolution of all complex life on earth: collaboration.

Thus, their characteristics are:

  • They have the ability to collaborate with other agents/programs. That is, they have mechanisms with which to become aware of other agents, and a medium through which to communicate signals. (Think of ants and bees leaving chemical traces, for instance.)
  • Like Class IIIs, they have an internal model of the world. However, a Class IV’s model is more closely linked to its goal structure, and not merely ad hoc / bound in one model. Its internal model may be distributed across several subsystems / mathematical models; representations of complex phenomena or experiences are encoded across various components in its cognitive architecture (vision systems, memory components, tactile systems, etc).
  • They have the ability to perform rudimentary planning, driven by fairly rigid heuristics but with a little flexibility for learning.
  • They have the ability to form basic concepts, schema, and prototypes.

While it is not a prerequisite that Class IVs have multiple distinct sensory modalities – optic, auditory, tactile, olfactory systems – that serves as a good example of the complexity level these programs start to achieve. In an AI setting, a program could have hundreds of different types of inputs, each with their own data type and respective subsystem for processing the input. The key distinction is that in Class IV programs, these subsystems have a high degree of connectivity, and thus generate more complex behavior.

Many robotics software systems could also be placed in Class IV.

Class V

The capabilities of Class Vs begin to resemble more complex animal behavior, such as that of rats (though not as intelligent as many apes). The primary characteristics of note are:

  • They have the ability to reflect on their own ‘thoughts’. In an AI program setting, this would mean it has the ability to optimize its own metaheuristics. In rats, for example, this is manifested as a basic form of metacognition.
  • They have the ability to perform complex planning, especially planning in which they simulate the world and themselves in it. Which leads to:
  • They have the ability, to some degree, to simulate the world in their minds. That is, they can perturb their internal model of the world without actually taking action, and play out the results of hypothetical actions. They can imagine scenarios based on their knowledge of the world, which is intimately related to their memories (recall the memory capability from Class III).
  • Related to their planning and internal simulation capabilities, they have the ability to set their own goals and take steps to achieve them. For instance, a rat may see two different pieces of food, decide that it likes the looks of one of them better than the other, set its goal to acquire the better-looking morsel, and subsequently plan a path to get it. The planning part relies on actions it knows it can do – how fast it can run, how far it can jump – and the terrain ahead of it, as well as memories of how it may have conquered that type of terrain before. Thus goal-setting and planning rely heavily on memory and the internal model.
  • They have a rudimentary awareness of their own agency in the environment. That is, when they are planning, they treat themselves as a factor in the environment they are simulating.

By this time, you have a program which is able to reflect, plan, collaborate with other agents, set goals, learn new behaviors and strategies for achieving its goals, and simulate hypothetical scenarios.

Class VI

You’re a Class VI. So am I. Almost every human being is a Class VI – ‘almost’ because, well, this designation is questionable when applied to some politicians.

Programs of this class will start to resemble human-level intelligence and capability, though not necessarily human-like in nature. While in humans a major difference is more complex emotions, this scale does not consider emotions directly.

Artificial intelligence has not yet reached this level, and there are varying predictions as to when it will. The good news is that the expert consensus is clear on the idea that it will happen; it’s just that no one knows exactly when.

Key components of Class VI agents/AIs are:

  • Full self-awareness. The agent is fully aware of itself, its history, where the environment ends and it begins, and so on. This is related to consciousness, though Class VIs need not necessarily be conscious in a strict sense.
  • They have the ability to plan in the extremely long term, thinking ahead in ways that more basic systems cannot. Specific timescales are relative to its natural domain: for a person, decades; for an AI program, perhaps, seconds or hours.
  • Class VIs are able to invent new behaviors, processes, and even create other ‘programs’. In the case of a human, this is obviously an inventor creating a new way of solving a problem, or a software developer programming AIs somewhere in Brooklyn…

Class VII

This is what might well be referred to as ‘superintelligence’. While some AI experts are skeptical of whether it can be achieved at all, others regard it as all but inevitable. Nick Bostrom writes elegantly about the subject in his book of the same name.

While nobody knows exactly what this may look like, there are two major distinguishing factors which would almost certainly be present:

  • They have the ability to systematically control their own evolution.
  • They have the ability to recursively improve themselves, perhaps even at alarmingly minuscule timescales.

The first ability is perhaps the most profound. While we humans do in some sense control our own fate, we do not yet have fine-grained control over the evolution of our brains and hence our cognitive abilities (though CRISPR may soon change that). With an artificial superintelligence, many of these limitations are removed. It can copy itself arbitrarily, ad infinitum, and perform risk-free simulations of its new versions. It will also be essentially immortal, so long as its hardware persists and has a supply of energy.

With respect to the second ability, one might imagine an ASI (artificial superintelligence) making multiple clones of itself, each clone independently applying a self-improving strategy, and then each one in turn performing a set of benchmark tests to determine which one improved the most from the original copy. Whichever agent performed the best would become the new master copy, while the others would be taken out of the running.

This is a supercharged evolutionary algorithm, essentially. The tests would be agreed upon in advance, and even perhaps written as a cryptographically secure contract (blockchain-based or otherwise) to prevent cheating. In doing so, the agent would keep improving up to hardware limits or some theoretical asymptote.
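
A toy version of that select-the-best-clone loop looks something like the sketch below. The "self-improvement" step and the benchmark are deliberately trivial stand-ins; a real ASI's improvement strategy is exactly the part nobody knows how to write.

```python
import random

# Toy stand-ins: an "agent" is a parameter vector, "self-improvement" is a random
# perturbation, and the agreed-upon benchmark is a fixed scoring function.
def benchmark(agent):
    return -sum((x - 3.0) ** 2 for x in agent)          # higher is better

def self_improve(agent):
    return [x + random.gauss(0, 0.1) for x in agent]    # each clone tweaks its parameters

master = [0.0] * 5
for generation in range(200):
    clones = [self_improve(master) for _ in range(8)]   # spawn independent copies
    master = max(clones + [master], key=benchmark)      # best performer becomes the new master copy

print("final benchmark score:", round(benchmark(master), 4))
```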

The kind of scenario above is not at all unlikely in the near future.

 

Summary

AI capabilities currently exist somewhere between the Class IV and Class V marks, but are quickly marching toward Class VI. DeepMind and Facebook are leading the way in this direction, though other notable players are making important contributions. Certainly the brand-new OpenAI will have some interesting insights as well.

My hope is that this type of classification system, and others like it, will help bring some structure to the conversation around fast-emerging AI. With deeper clarity in our common language, we can have more meaningful and productive conversations about how we wish for this technology to advance and how it ought to be used. We owe it to ourselves to have the linguistic tools to accurately describe our progress.

A World Inside the Mind


Short post today, but a few things occurred to me as I was reading the paper on Bayesian Program Learning:

  • This form of recursive program induction starts to look suspiciously like simulation – something we do in our minds all the time.
  • Simulation may be a better framing for concept formation than via the classification route.
  • Mapping the ‘inner world’ to the ‘outer world’ seems a more sensible approach to understanding what’s going on. If you look at the paper, you also see some thought-provoking examples of new concept generation, such as the single-wheel motorbike example (in the images). This is the most exciting point of all.

A final design?

Combine all the elements together, along with ideas from my last post, and you get something that:

  1. Simulates an internal version of the world
  2. Is able to synthesize concepts and simulate the results, or literally ‘imagine’ the results – much like we do
  3. Is able to learn concepts from few examples
  4. Has memories of events in its lifetime / runtime, and can reference those events to recall the specific context of what else was happening at that time. That is, memories have deep linkage to one another.
  5. Is able to act of its own volition, i.e. in the absence of external stimulus. It may choose to kick off imagination routines – ‘dreaming’, if you will – optimize its internal connections, or do some other maintenance work in its downtime. Again, similar to how our brains do it while we sleep.

This starts to look like a pretty solid recipe for a complete cognitive architecture. Every requirement has been covered in some way or another, though in different models and in different situations. To really put the pieces together into a robust architecture will require many years of work, but it is worth exploring multi-model cognitive approaches.

If it results in a useful AI, then I’m all in.

 

Deep Learning & Temporal Modeling


In most discussions I’ve seen of deep learning, and certainly most of the models demonstrated, there is no discussion of temporal sequence modeling. I was curious where the state of the art was for this task, and thought to compare with some of my own intuitions about sequence modeling.

As a first pass, I found a handful of papers that discuss using stacked Restricted Boltzmann Machines in various configurations to achieve temporal learning – they are “Robust Generation of Dynamical Patterns in Human Motion by a Deep Belief Nets”, “Temporal Convolution Machines for Sequence Learning”, and “Sequential Deep Belief Networks”. The last of these three lays out their approach nicely in one sentence: “An L-layer SDBN is formed by stacking multiple layers of SRBMs.” It is, in essence, the stacking game continued.

While the approaches described in the three papers above would seem to yield decent results, I often wonder about extendability and scalability. RBMs / DBNs have lovely analytical properties, and cleverly get around intractable subproblems with use of sampling schemes, but their topology is nonetheless locked in all setups I’ve seen. This leaves no room for simulated neurogenesis.

Why would simulated neurogenesis be important? Moving the synapse weights around in a fixed topology yields nice results, but there may be situations where we would want the topology to grow as needed – namely, if the network did not have a representation for a given pattern, it could create one. This assumes that we are OK with being slightly inefficient: we do not insist that the existing representation space be fully used before generating new neurons. The tradeoff is a greater amount of space in memory.
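
A minimal sketch of the growth idea follows. It is far simpler than an RBM – inputs are fixed-length vectors, a "representation" is just a prototype unit, and novelty is judged by a distance threshold – but it shows the basic move: allocate a new unit when nothing existing represents the input well enough.

```python
import numpy as np

class GrowingPrototypeLayer:
    """Toy 'neurogenesis': if no existing unit represents an input well enough,
    allocate a new unit for it rather than only adjusting existing weights."""

    def __init__(self, novelty_threshold=1.0, learning_rate=0.1):
        self.prototypes = []              # each prototype is a weight vector
        self.threshold = novelty_threshold
        self.lr = learning_rate

    def present(self, x):
        x = np.asarray(x, dtype=float)
        if not self.prototypes:
            self.prototypes.append(x.copy())
            return 0
        distances = [np.linalg.norm(x - p) for p in self.prototypes]
        best = int(np.argmin(distances))
        if distances[best] > self.threshold:
            self.prototypes.append(x.copy())    # grow: a new unit for a novel pattern
            return len(self.prototypes) - 1
        self.prototypes[best] += self.lr * (x - self.prototypes[best])   # adapt the winning unit
        return best
```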

Lately I’ve been working on these types of networks, and it is most certainly difficult to do. First you have the basic foundation of any neural network – learning spatial patterns. That part is relatively easy: you can quickly build something that learns to model spatial patterns and produce labels for them.

The next step is creating the dynamics that model temporal sequences. For example, a sequence of words: if I say “four score and–”, many of you will instantly think “seven years ago”. This is a sequence. The spatial patterns are the specific combinations of letters that form words, and the temporal pattern is the sequence of those words. We learn this kind of thing with relative ease, but for machines it is a huge task.

Unsurprisingly, it is in this step that things get complicated. The noted Deep Learning approach is to specify a given time depth. In one paper T=3, meaning it can model sequences 3 time steps deep – in our example, up to 2 words in advance. When the network receives “four score and”, if it knows the sequence in question then it is thinking of “seven years”. When it gets to “seven”, it is thinking “years ago”.
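
A minimal sketch of that fixed-depth idea, using a plain order-T lookup table rather than an actual deep network (the corpus and the T = 3 window are assumptions for illustration):

```python
from collections import defaultdict, Counter

T = 3   # time depth: condition on the previous T tokens

def train(tokens):
    """Count which token follows each window of T tokens."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - T):
        model[tuple(tokens[i:i + T])][tokens[i + T]] += 1
    return model

def predict(model, context):
    """Most likely next token given the last T tokens, or None if the window is unseen."""
    counts = model.get(tuple(context[-T:]))
    return counts.most_common(1)[0][0] if counts else None

tokens = "four score and seven years ago our fathers brought forth".split()
model = train(tokens)
print(predict(model, ["four", "score", "and"]))   # -> "seven"
```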

Note that in one of the papers above, they use a different subnetwork entirely to model temporal constraints. Again, while this is nice from an analytical point of view, it likely has little to do with the way the brain works. The brain is essentially a hundred billion cells, each of which knows nothing but how to behave in response to electrical and chemical signals. Their emergent behavior gives rise to your consciousness. The brain, at least as far as anybody can tell, does not have a “temporal network”. Temporal information is learned as a natural consequence of the dynamics of the vast network of cells. Somehow, we need to figure out a way to model temporal information inline with the spatial information, and make it all fit together nicely.

That said, the approach I’ve come to borrows a bit from Deep Learning, a little from Jeff Hawkins’ and Dileep George’s work on spatiotemporal modeling, and a little from complex systems in general. I’ve been searching for the core information overlap between these different approaches, and have found some commonalities. From that, I’ve arrived at a few notes of practice.

First, it almost always becomes necessary to sacrifice analytical elegance for emergent behavior. In many ways, emergent behavior is innately chaotic, and therefore difficult to model mathematically. Stochastic methods yield some insight, but there are higher-level states of emergence that may not be obvious from analysis of a single equation or definition of a system. In this case, it is simpler in practice to use heuristic methods to find emergent complexity, and attempt to characterize it as it is discovered, rather than attempt to discover all possible states from the definition of the system. A characteristic of chaotic systems is that you have to actually advance/evolve them in order to derive their behavior, as opposed to knowing in advance via analytical methods.
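
A tiny illustration of that last point: even a one-line chaotic system like the logistic map has to be iterated to see what it does; its closed-form definition says almost nothing about its long-run behavior.

```python
def logistic_map(x, r=3.9):
    return r * x * (1 - x)   # a classic one-line chaotic system (for r near 4)

# Two nearly identical starting points diverge; the only way to find out is to run it.
a, b = 0.200000, 0.200001
for step in range(50):
    a, b = logistic_map(a), logistic_map(b)

print(f"after 50 steps: a={a:.4f}, b={b:.4f}, difference={abs(a - b):.4f}")
```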

Second, and further emphasizing the use of heuristic methods, tuning the model with genetic algorithms tends to yield better results than attempting to solve for optimal parameters explicitly. Perhaps this is merely a difference in style, and if I’d spent more time studying complex systems I might know of better ways to do this, but at my current level of understanding a simple genetic algorithm that swarms over parameter configurations yields better results than attempting to understand what the “perfect network” might look like. There is a philosophical difference in that with this method you’re using the machine to understand the machine, in some sense surrendering control and rendering of insights to the machine itself. Genetic algorithms may be able to find subtleties that my slightly-more-evolved-ape-brain will miss or otherwise fail to conceptualize merely from the definition of the system and associated intuitions.
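
For illustration, a bare-bones version of that kind of parameter search is sketched below. The parameter names and ranges are hypothetical, and the fitness function is a placeholder for "build the network with these parameters, train it, and return a validation score".

```python
import random

# Hypothetical parameter ranges; integer-valued parameters would be rounded before use.
PARAM_RANGES = {"learning_rate": (1e-4, 1e-1), "num_units": (16, 256), "decay": (0.0, 1.0)}

def random_config():
    return {k: random.uniform(lo, hi) for k, (lo, hi) in PARAM_RANGES.items()}

def mutate(config, scale=0.1):
    child = dict(config)
    k = random.choice(list(PARAM_RANGES))
    lo, hi = PARAM_RANGES[k]
    child[k] = min(hi, max(lo, child[k] + random.gauss(0, scale * (hi - lo))))
    return child

def fitness(config):
    # Placeholder: in practice, train the network with this configuration and
    # return its validation score.
    return -(config["learning_rate"] - 0.01) ** 2 - (config["decay"] - 0.5) ** 2

population = [random_config() for _ in range(20)]
for generation in range(50):
    population.sort(key=fitness, reverse=True)
    survivors = population[:5]                                        # keep the best few
    population = survivors + [mutate(random.choice(survivors)) for _ in range(15)]

print("best config found:", max(population, key=fitness))
```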

Deep Learning is advancing quickly, and while it offers some interesting food for thought when attempting to solve temporal modeling problems, I am not yet sold on the notion that it is the final answer to this more general problem. Choices of representation and methods of optimization may be trivial in some cases, but when they differ greatly from the norm they may yield some advantage. Not only that, but sticking closer to the way the brain represents information has done nothing but improve the performance and capabilities of the resulting systems. The path I’m on may all come to nothing, or it may shine light on some new ways to think about temporal modeling problems.

A closing note: A whitepaper describing my work is underway, so you can stare in awe at some formidable-looking equations and cryptic diagrams. I’ve been several years down this path now, and it’s high time to encapsulate all of the work done in a comprehensive overview.

Context & Permutations


In the pursuit of Artificial General Intelligence, one of the challenges that comes up again and again is how to deal with context.  To illustrate: telling a robot to cross the street would seem simple enough.  But consider the context that five minutes ago somebody else told this robot not to cross the street because there was some kind of construction work happening on the other side.  What does the robot decide to do?  Whose instruction does it consider more important?

A robot whose ‘brain’ did not account for context properly would naively go crossing the street as soon as you told it to, ignoring whatever had come before.  This example is simple enough, but you can easily imagine other situations in which the consequences would be catastrophic.

The difficulty in modeling context in a mathematical sense is that the state space can quickly explode, meaning that the number of ways that things can occur, and the sequences they can occur in, is essentially infinite.  Reducing these effective infinities down to a manageable size is where the magic occurs.  The holy grail in this case is to have the compute cost of the main algorithm remain constant (or at least linear) even as the number of possible permutations of contextual state explodes.

How is this done?  Conceptually, one needs to represent things sparsely, and have the algorithm that traverses this representation only take into account a small subset of possibilities at a time.  In practice, this means representing the state space as transitions in a large graph, and only traversing small walks through the graph at any given time.  In this space-time tradeoff, space is favored heavily.
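
A minimal sketch of that idea follows; the states, edges, and walk depth are illustrative assumptions. Context is stored as sparse transitions, and any query only ever examines a short walk from the current state rather than the full permutation space.

```python
from collections import defaultdict

class SparseContextGraph:
    """Context as sparse transitions between states; queries only walk a few steps."""

    def __init__(self):
        self.edges = defaultdict(dict)   # state -> {next_state: weight}

    def observe(self, state, next_state, weight=1.0):
        self.edges[state][next_state] = self.edges[state].get(next_state, 0.0) + weight

    def local_context(self, state, max_depth=3):
        """Return only the states reachable within max_depth hops of `state`.
        The work is bounded by the local branching factor, not the full state space."""
        frontier, seen = {state}, {state}
        for _ in range(max_depth):
            frontier = {n for s in frontier for n in self.edges[s] if n not in seen}
            seen |= frontier
        return seen

g = SparseContextGraph()
g.observe("instructed:cross_street", "warned:construction")
g.observe("warned:construction", "decision:wait")
print(g.local_context("instructed:cross_street"))
```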

The ability to adeptly handle context is of utmost importance for current and future AIs, especially as they take on more responsibility in our world.  I hope that AI developers can form a common set of idioms for dealing with context in intelligent systems, so that they can be collaboratively improved upon.

We’ve had it all wrong.


All this time, we’ve had it all wrong.

Artificial Intelligence (AI) has been a science for over 50 years now, and in that time has accomplished some amazing things – computers that beat human players at chess and Jeopardy, find the best routes for delivery trucks, optimize drug delivery, and many other feats.  Yet the elusive holy grail of “true AI”, or “sentient AI”, “artificial general intelligence” – by whatever name, the big problem – has remained out of our grasp.

Look at what the words actually say though – artificial intelligence.  Are we sure that intelligence is really the crucial aspect to creating a sentient machine?

I claim that we’ve had it wrong.  Think about it: intelligence is a mere mechanical form, a set of axioms that yield observations and outcomes.  Hypothesis, action, adjustment – ad infinitum.  The theory has been if we could just create the recursively self-optimizing intelligence kernel, BOOM! – instant singularity.  And we’d have our AGI to run our robots, our homes, our shipping lanes, and everything imaginable.

The problem with this picture is that it assumes intelligence is the key underlying factor.  It is not.

I claim the key factor is…

…wait for it…

Consciousness.

Consciousness might be defined as how ‘aware’ an entity is of itself and its environment, which might be measured by how well it is able to distinguish things like where it ends and its environment begins, a sense of agency with reference to past actions it performed, and a unified experience of its surroundings that gives it a constantly evolving sense of ‘now’.  This may overlap with intelligence, but it is a different goal: looking in the mirror and thinking “that’s me” is different than being able to beat humans at chess.  A robot understanding “I broke the vase” is different than an intelligence calculating the Voronoi diagram of the pottery’s broken pieces lying on the floor.

Giulio Tononi’s work rings a note in harmony with these ideas.  Best of all, he and others discuss practically useful metrics of consciousness.  Whether Integrated Information Theory is the root of all consciousness or not is immaterial; the point is that this is solid work in a distinctly new direction, and approaches the fundamental problems of AI in a completely new way.

Tononi’s work may be a viable (if perhaps only approximate) solution to the binding problem, and in that way could be immensely useful in designing systems that have a persisting sense of their evolving environment, leading us to sentience.  It is believable that intelligence may be an emergent property of consciousness, but it seems unlikely that intelligence alone is the ingredient for consciousness itself, and that somehow a certain ‘amount’ of intelligence will yield sentience.  One necessarily takes precedence over the other.

Given this, from now on I’ll be focusing my work on Artificial Consciousness, which will differ from Artificial Intelligence namely in its goals and performance metrics: instead of how effectively an agent solved a problem, how aware it was of its position in the problem space; instead of how little error it can achieve, how little ambiguity it can achieve in understanding its own boundaries of existence (where the program ends and the OS begins, where the robot’s body ends and the environment begins).

I would urge you to read Tononi’s work and Adam Barrett’s work here.  My Information Theory Toolkit (https://github.com/MaxwellRebo/ittk) has several of the functions you’ll need to start experimenting on systems with a few more lines of code (namely, use Kullback-Leibler divergence).
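
For instance, a self-contained Kullback-Leibler divergence between two discrete distributions looks like the sketch below; this is plain NumPy rather than any particular toolkit's API, and the example distributions are made up.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) in bits, for two discrete distributions over the same states."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p, q = p / p.sum(), q / q.sum()                        # normalize to valid distributions
    return float(np.sum(p * np.log2((p + eps) / (q + eps))))

# Example: how far a system's observed state distribution sits from a reference distribution.
observed  = [0.70, 0.20, 0.10]
reference = [0.50, 0.30, 0.20]
print(kl_divergence(observed, reference))
```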

In the coming months, I’ll be adding ways to calculate the Information Integration of abstracted systems – their Phi values.  This is NP-hard, so it will have to remain in the domain of small systems for now.  Nonetheless, I believe if we start designing systems with the intent of maximizing their integration, it will yield some system topologies that have more beneficial properties than our usual ‘flat’ system design.
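
To see why this stays restricted to small systems: a naive search has to score every bipartition of the system, and the number of bipartitions alone grows exponentially with the number of components. The scoring itself is left out of this sketch.

```python
from itertools import combinations

def bipartitions(elements):
    """Yield every way to split a system into two non-empty parts (unordered)."""
    n = len(elements)
    for r in range(1, n // 2 + 1):
        for part_a in combinations(elements, r):
            part_b = tuple(e for e in elements if e not in part_a)
            if r == n - r and part_a > part_b:
                continue   # skip the mirror image when both halves have equal size
            yield part_a, part_b

nodes = tuple(range(10))
print(sum(1 for _ in bipartitions(nodes)))   # 511 bipartitions for just 10 components (2**9 - 1)
```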

Artificial Intelligence will no doubt continue to give us great advances in many areas, but I for one am embarking on a quest for something subtly but powerfully different: Artificial Consciousness.

Note: If you have some programming skill and would like to contribute to the Information Theory Toolkit, please fork the repository and send me an email so we can discuss possibilities.  I’ll continue to work on this as I can.

The Best & Worst of Tech in 2013


Keeping with tradition, I’ll review some of the trends I noticed this past year, and remark on what they might mean for those of us working in technology.

THE BEST

JavaScript becomes a real language!

To some this might seem trivial, but there’s a lot to be said here.  With the massive growth of Node.js and many associated libraries, Google’s V8 engine has been stirring up the web world.  Write a Node.js program, and I guarantee you’ll never think of a web server the same way again.

Why is this good?  This isn’t an advertisement for Node.js, but I would posit that these developments are good because they open up entire new worlds of productivity – rapid prototyping, readable code, and entirely new ways of thinking about web servers.  Some folks are even running JavaScript on microcontrollers now, a la Arduino.  JavaScript has been unleashed from the confines of the browser, and is maturing into a powerful tool for creating production-quality systems with high scalability and developer productivity.  Exciting!

Cognitive Computing Begins to Take Form

Earlier this year I stated a belief that 2013 would be the year of cognitive systems.  Well that hasn’t been fulfilled completely, but we’ve nonetheless seen some intriguing developments in that direction.  IBM continues to chug away at their cognitive platforms, and Watson is now deployed working full time as an AI M.D. of sorts.  Siri has notably improved from earlier versions.  Vicarious used their algorithms to crack CAPTCHA.  Two rats communicated techepathically (I just made that word up) with each other from huge distances, and people have been controlling robots with their minds.  It’s been an amazing year.

The cognitive computing/cybernetics duo is going to change, well, everything.  I would argue that cybernetics may just top the list of most transformative technologies, but it has a ways to go before we go full Borg.

Wearables Start to Become a Thing

Ah, wearables.  We’ve waited for nifty sci-fi watches for so long – and lo!  They have come.  Sort of.  They’re on their way, and we’re starting to catch glimpses of what this will actually mean for technology.  I agree with Sergey Brin here: it’ll get the technology out of our hands and integrated into our environment.  Personally I envision tech becoming completely seamless and unnoticeable, nature-friendly and powerful, much like our own biological systems, but that’s another article entirely.

Wearable technology will combine with the “Internet of Things” in ways we can’t yet imagine, and will make life a little easier for some and much, much better for others.

Internet of Things

The long-awaited Internet of Things is finally starting to coalesce into something real.  Apple is filing patents left and right for connected home gear, General Electric is making their way into the space with new research, and plenty of startups are sprouting to address the challenges in the space (and presumably be acquired by one of the big players).

This development is so huge it’s almost difficult to say what it will bring.  One thing is for sure: the possibilities are only limited to one’s imagination.

21st Century Medicine is Shaping up to be AWESOME

Aside from the fact that we now have an artificial intelligence assisting in medical diagnosis, there have been myriad amazing developments in medicine.  From numerous prospects for cures to cancer, HIV, and many other diseases, to the advances in regenerative medicine and bionanotechnology, we’re on the fast track to a future wherein medical issues can be resolved quickly and with relatively little pain.  There’s also a different perspective at work: solve the issue at its deepest root, instead of treating symptoms with drugs.

THE WORST

Every Strategy is a Sell Strategy

This year, tech giants went acquisition-mad.  It seems like every day one of them has blown another few billion dollars on some startup somewhere.

Why is this bad?  It may be good for the little guy (startup) in the short term – they walk away with loads of cash – but in the long term I suspect it will have a curious effect.  It’s almost like business one-night-stand-ism.  You build a company knowing full well that you’re just going to sell it to Google or Facebook.  If not, you fold.

You can see where this goes.  People are often saying they look forward to ‘the next Google’, or ‘the next Facebook’, or whatever.  Well there might not be any.  That is, all the big fish are eating the little fish before they have the chance to become big fish.  Result?  Insanely huge fish.

It’s great that a couple of smart kids can run off, Macbook Pros in hand, and [potentially] make a few billion bucks in a few years, with or without revenue.  But who is going to outlast the barrage of acquisition offers and become the next generation of companies?

Big Data is Still not Clearly Defined

Big Data.  Big data.  BIG.  DATA.

What does it mean?

The buzzword and its many ilk have been floating around for a couple of years now, and still nobody can really define what it does.  Most seem to agree it goes something like: prop up a Hadoop cluster, mine a bunch of stale SQL records in a massive company/organization, cast the MapReduce spell and – Hadoopra cadabra!  Sparkling magical insights of pure profit glory appear, fundamentally altering life and the universe forever – and sending you home with bigger paychecks.

I’m all for data analysis.  In fact I believe that a society that makes decisions based on hard evidence and good data-crunching is a smart society indeed.  But the ‘Big Data’ hype has yet to form into anything definitive, and remains a source of noise.  (Big data fanboys, go ahead and flame in the comments.)

America’s Innovation Edge Dulls

It’s true.  I hate to admit it, but it is, undeniably, absolutely true.  America has dropped the ball when it comes to innovation.  That’s not to say we’re not innovating cool things, generating economic value and all of that – we are.  But that gloss has started to tarnish.  Specifically, America has a problem with denying talented people the right to be here and work.

It could be our hyper-paranoid foreign policy in the wake of 9/11, it could be the flawed immigration system, it could be Washington gridlock or a million other things.  It’s not particularly fruitful to pass the blame now.  We’re turning away the best and the brightest from around the world, and simultaneously continuing to outsource some of what used to be our core competencies.  The bright spot in all of this is that high-tech manufacturing would seem to be making a comeback, perhaps in part thanks to 3D printing, but it’s not quite enough.  We need more engineers, more inventors, and more people from outside our borders.  This has always been the place people come to plant the seeds of great ideas.  Let’s stay true to that.