How big data and AI will reshape the automotive industry

O'Reilly Radar - Thu, 2017/07/20 - 08:10

The O’Reilly Data Show Podcast: Evangelos Simoudis on next-generation mobility services.

In this episode of the Data Show, I spoke with Evangelos Simoudis, co-founder of Synapse Partners and a frequent contributor to O’Reilly. He recently published a book entitled The Big Data Opportunity in Our Driverless Future, and I wanted to get his thoughts on the transportation industry and the role of big data and analytics in its future. Simoudis is an entrepreneur, and he also advises and invests in many technology startups. He became interested in the automotive industry long before the current wave of autonomous vehicle startups was in the planning stages.

Continue reading How big data and AI will reshape the automotive industry.

Categories: Technology

Adopting AI in the Enterprise: Ford Motor Company

O'Reilly Radar - Thu, 2017/07/20 - 04:00

Dimitar Filev on bringing cutting-edge computational intelligence to cars and the factories that build them.

Driverless cars aren’t the only application for deep learning on the road: neural networks have begun to make their way into every corner of the automotive industry, from supply-chain management to engine controllers.

In this installment of our ongoing series on artificial intelligence (AI) and machine learning (ML) in the enterprise, we speak with Dimitar Filev, executive technical leader at Ford Research & Advanced Engineering, who leads the team focused on control methods and computational intelligence.

What was the first application of AI and ML at Ford?

Ford's research lab has been conducting systematic research on computational intelligence—one of the branches of AI—for more than 20 years. About 15 years ago, Ford Motor Company introduced one of the first large-scale industrial applications of neural networks. Ford researchers developed and implemented, in mass-produced cars, an innovative misfire detection system—a neural-net-based classifier of crankshaft acceleration patterns for diagnosing engine misfire (undesirable combustion failure that has a negative impact on performance and emissions). Multiple other AI applications to Ford products and manufacturing followed this success.

How do you leverage AI and ML today to create a better product?

We can think of two categories of ML and AI applications in our vehicles. In addition to the obvious applications in driverless cars, Ford has also developed AI-based technologies that enable different functions in support of vehicle engineering. These are not always visible to the driver.

As I mentioned before, we used recurrent-neural-net-based classifiers for misfire detection in V10 engines; we also use them for intruder detection when the driver is away from the vehicle. We also use fuzzy logic-type rule-based gain scheduling controllers integrated with the battery control systems of hybrid-electric vehicles.

In our supply chain, neural networks are the main drivers behind the inventory management system recommending specific vehicle configurations to dealers, and evolutionary computing algorithms (in conjunction with dynamic semantic network-based expert systems) are deployed in support of resource management in assembly plants.

Are there other use cases within Ford today?

Another group of AI applications is driven by the fact that current vehicles have evolved into complex mobile cyber systems with increasing computational power and resources generating gigabytes of information per hour, continuously connected, with information flowing to, from, and through the platform. Increased capability of vehicle systems, along with the growing customer demand for new features, product improvement, personalization, rich information utilization, etc., are some of the drivers for introducing machine learning techniques in modern vehicles.

The most common AI applications involve direct driver interaction, including advisory systems that monitor acceleration and braking patterns to provide on-board evaluations of a driver’s preferences and intentions for different purposes—characterization of the driver, advice for fuel-efficient driving and safe driving, auto-selecting the optimal suspension and steering modes, simplifying the human-machine interface by estimating the most likely next destination, and preferred settings of the climate control, etc. These systems use traditional AI methods—rule-based, Markov models, clustering; they do not require special hardware. One of their distinctive features is to be intelligent enough to identify the level of acceptance of provided recommendations, and avoid drivers’ annoyance.

Recent extensive development of autonomous vehicles is the driver for deep learning applications to vehicle localization, object detection, classification, and tracking. We can expect in the near future a wide range of novel deep-learning-based features and user experiences in our cars and trucks, innovative mobility solutions, and intelligent automation systems in our manufacturing plants.

What steps have you needed to take to build a team that can grasp and apply recent advances in AI and ML?

We have several centers of excellence in machine learning and AI, with focus on robotics, next-generation autonomous driving, and data analytics. Our goal is to expand the AI-based methodology and development tools throughout the company and to make them part of the commonly used engineering tools, similar to Matlab and Simulink.

Building centers of excellence in AI and ML was not too challenging since, as I mentioned earlier, we had engineers and researchers with backgrounds and experience in conventional neural networks, fuzzy logic, expert systems, Markov decision processes, evolutionary computing, and other main areas of computational intelligence. This created the foundation that we are now upgrading with a state-of-the-art expertise in deep learning methods and tools. We continue to expand this critical mass of experienced engineers by hiring more computer specialists with strong educational backgrounds in AI and ML.

Where does Ford hope to gain a competitive advantage in applying AI and ML?

AI provides an opportunity to better use available information for creating new features and driver-aware, personalized vehicles that would better fit the specific customer. In addition, machine learning is an irreplaceable enabler for creating smart driver-assist systems and fully autonomous vehicles. Increased connectivity is one of the major drivers expanding the capability of the on-board infotainment and control systems by incorporating cloud resources. In the near future, we can envision seamless integration of vehicle on-board systems with cloud-based intelligent agents, self-organizing algorithms, and other AI tools that would broaden the range of user experiences offered by our mobility solutions.

Are there any areas where you've considered leveraging AI/ML but found that the technology isn't ready yet?

I don’t think so—just the opposite. It seems that the AI/ML toolbox is growing exponentially and ahead of mass applications. We are witnessing an interesting reality—while many research areas (e.g., control engineering, computer programming, cybernetics) were driven by the need for new technical solutions, the AI revolution that is happening now is inspired by the advances in machine learning research. Besides the stimulating effect of some remarkable successes (e.g., Google DeepMind), I would like to mention two important and unique enablers of this rapid development—first, the quick proliferation of research ideas and results that are made immediately available by posting recent publications on or other public websites; and second, the wide accessibility to open source AI software development tools—TensorFlow, Neon, Torch, Digits, Theano, just to mention a few. The challenge now is to mature the most effective and innovative AI solutions, and to integrate them within new features and customer experiences.

Is Ford interested in partnering with other Silicon Valley companies and startups? Are there any initiatives you'd like to see the community focus on?

Ford Motor Company has partnerships with a number of high-tech companies and startups around the world, and, of course, in Silicon Valley. We are an active member of the innovative Silicon Valley community through our Research & Innovation Center in Palo Alto and are always interested in working with new companies and startups.

What's the most promising or interesting advancement you've seen in AI recently, and how do you think it will impact Ford?

It is hard to outline the most interesting one, for the progress in AI is enormous. The number of publications, patents, and software products in the AI area is exponentially growing, and almost every day we are witnessing new accomplishments, novel approaches, and smart applications. I am specifically interested in the developments in reinforcement learning, intelligent agents, game theory, and Markov decision processes since they open the door to new advancements in the field of automated reasoning, decision-making and optimal control, and their automotive and mobility applications.

Continue reading Adopting AI in the Enterprise: Ford Motor Company.

Categories: Technology

Chris Stetson on system migrations and defining a microservices reference architecture

O'Reilly Radar - Thu, 2017/07/20 - 03:45

The O’Reilly Podcast: Helping developers improve performance, security, and service discoverability.

In this podcast episode, O’Reilly’s Jeff Bleiel talks with Chris Stetson, chief architect and head of engineering at NGINX. They discuss Stetson’s experiences working on microservices-based systems and how a microservices reference architecture can ease a development team’s pain when shifting from a monolithic application to many individualized microservices.

Like most developers, Stetson started off writing monolithic applications before moving over to a service-oriented architecture, where he broke apart different components of the application. “So many developers will approach building an application as a monolith because they don’t have to build out the infrastructure, orchestration tools, networking capabilities, and contracts between the different components,” he said. “However, many developers and teams today will approach their application as a monolith with the idea that there will be a clear separation of concerns for different parts that can easily be broken out.”

According to Stetson, the benefits for developers in adopting microservices are similar to those of the Agile movement, in that you may only have a couple of weeks to work on a feature or piece of functionality. “Microservices encapsulate a set of functions and services that are constrained to a single set of concerns,” he said. “As a result, developers need to optimize around those concerns, which will help them build a really powerful and complete system that is harder to accomplish with a large monolithic application.”

Stetson’s passion for optimizing systems around microservices led him to spearhead the creation of NGINX’s Microservices Reference Architecture. “This reference architecture was our attempt to understand how we could help our customers build a microservice application while helping them improve aspects of their architecture related to performance, security, service discovery, and circuit breaker pattern functionality within the environment,” he said. “The reference architecture is an actual photo-sharing application, similar to Flickr or Shutterfly. We chose that application idea since it’s one everyone is familiar with, and it showcases powerful asymmetric computing requirements.”

This reference architecture includes three different networking models: the Proxy Model, the Router Mesh, and the Fabric Model. Stetson mentioned how the Proxy Model is similar in function to, and complements, Kubernetes’ Ingress Controller. (Kubernetes is an open source orchestration tool for managing the deployment and instances of containerized applications.) “Kubernetes has a very powerful framework for organizing microservices, allowing effective communication between services, and providing network segmentation,” he said. “It offers a lot of great services for systems to take advantage of in order to perform traffic management within a microservice application.”

This post and podcast is a collaboration between O'Reilly and NGINX. See our statement of editorial independence.

Continue reading Chris Stetson on system migrations and defining a microservices reference architecture.

Categories: Technology

Edward Callahan on reactive microservice deployments

O'Reilly Radar - Thu, 2017/07/20 - 03:30

The O’Reilly Podcast: Modify your existing pipeline to embrace failure in isolation.

In this podcast episode, I talk about reactive microservice deployments with Edward Callahan, a senior engineer with Lightbend. We discuss the difference between a normal deployment pipeline and one that’s fully reactive, as well as the impact reactive deployments have on software teams.

Callahan mentioned how a deployment platform must be developer and operator friendly in order to enable the highly productive, iterative development being sought by enterprises undergoing software-led transformations. However, it can be very easy for software teams to get frustrated with the operational tooling generally available. “Integration is often cumbersome on the development process,” he said. “Development and operations teams are demanding more from the operational machinery they depend on for the success of their applications and services.”

For enterprises already developing reactive applications, these development teams are starting to realize their applications should be deployed to an equally reactive deployment platform. “With the complexity of managing state in a distributed deployment being handled reactively, the deployment workflow becomes a simplified and reliable pipeline,” Callahan said. “This frees developers to address business needs instead of the many details of delivering clustered services.”

Callahan said the same core reactive principles that define an application’s design (being responsive, resilient, elastic, and message driven) can also be applied to a reactive deployment pipeline. The characteristics of such a pipeline include:

  • Developer and operations friendly—The deployment pipeline should support ease of testing, continuous delivery, cluster conveniences, and composability.
  • Application-centric logging, telemetry, and monitoring—Meaningful, actionable data is far more valuable than petabytes of raw telemetry. How many messages are in a given queue and how long it is taking to service those events is far more indicative of the service response times that are tied to your service-level agreements.
  • Application-centric process monitoring—A fundamental aspect of monitoring is that the supervisory system automatically restarts services if they terminate unexpectedly.
  • Elastic and scalable—Scaling the number of instances of a service and scaling the resources of a cluster. Clusters need some amount of spare capacity or headroom.

According to Callahan, the main difference between a normal deployment pipeline and a reactive one is the ability for the system to embrace failure in isolation. “Failure cannot be avoided,” he said. “You must embrace failure and seek to keep your services available despite it, even if this requires operating in a degraded manner. Let it crash! Instead of attempting to repair nodes when they fail, you replace the failing resources with new ones.”

When making the move to a reactive deployment pipeline, software teams need to remain flexible in the face of change. They also must stay mindful of any potential entrapments resulting from vendor lock-in around a new platform. “Standards really do help here,” Callahan said. “Watch for the standards as you move through the journey of building your own reactive deployment pipeline.”

This post is a collaboration between O'Reilly and Lightbend. See our statement of editorial independence.

Continue reading Edward Callahan on reactive microservice deployments.

Categories: Technology

Four short links: 20 July 2017

O'Reilly Radar - Thu, 2017/07/20 - 03:15

SQL Equivalence, Streaming Royalties, Open Source Publishing, and Serial Entitlement

  1. Introducing Cosette -- a SQL solver for automatically checking semantic equivalences of SQL queries. With Cosette, one can easily verify the correctness of SQL rewrite rules, find errors in buggy SQL rewrites, build auto-graders for SQL assignments, develop SQL optimizers, bust “fake SQLs,” etc. Open source, from the University of Washington.
  2. Streaming Services Royalty Rates Compared (Information is Beautiful) -- the lesson is that it's more profitable to work for a streaming service than to be an artist hosted on it.
  3. Editoria -- open source web-based, end-to-end, authoring, editing, and workflow tool that presses and library publishers can leverage to create modern, format-flexible, standards-compliant, book-length works. Funded by the Mellon Foundation, Editoria is a project of the University of California Press and the California Digital Library.
  4. The Al Capone Theory of Sexual Harassment (Val Aurora) -- The U.S. government recognized a pattern in the Al Capone case: smuggling goods was a crime often paired with failing to pay taxes on the proceeds of the smuggling. We noticed a similar pattern in reports of sexual harassment and assault: often people who engage in sexually predatory behavior also faked expense reports, plagiarized writing, or stole credit for other people’s work.

Continue reading Four short links: 20 July 2017.

Categories: Technology

Katie Moussouris on how organizations should and shouldn’t respond to reported vulnerabilities

O'Reilly Radar - Wed, 2017/07/19 - 06:45

The O’Reilly Security Podcast: Why legal responses to bug reports are an unhealthy reflex, thinking through first steps for a vulnerability disclosure policy, and the value of learning by doing.

In this episode, O’Reilly’s Courtney Nash talks with Katie Moussouris, founder and CEO of Luta Security. They discuss why many organizations have a knee-jerk legal response to a bug report (and why your organization shouldn’t), the first steps organizations should take in formulating a vulnerability disclosure program, and how learning through experience and sharing knowledge benefits all.

Continue reading Katie Moussouris on how organizations should and shouldn’t respond to reported vulnerabilities.

Categories: Technology

Growth hacking in SEO

O'Reilly Radar - Wed, 2017/07/19 - 03:00

A look at the successes and failures of a company using experimental SEO practices.

Classic Growth Hackin'

This project was one of those during which we kept high-fiving constantly because the money was rolling in steadily. When the chemistry is right for both agency and client, magical things can happen. Strong communication between marketing and product development will only benefit sales. It behooves the growth hacker to stay as granular as possible with sales processes (to identify opportunities). The growth challenges we faced were related to the consumer's trust. Previous regimes at the company tried and failed at paid search, paid social, and old-fashioned PR (smiling and dialing).

This project was one of the fastest increases in growth for both program development and revenue that I've ever seen. We quadrupled the company's sales in less than two months by leveraging organic, social, and paid traffic channels. We were promoting a business-to-consumer widget that was a cool modern twist on an old established household product. They'd raised a plethora of money from a crowdfunding site. A slew of high-priced consultants and CMOs had been in and out of the organization, which was apparent by their use of four distinct analytics suites of tools. To reach for the stars is a noble goal, but to build momentum teams have to be unified.

Continue reading Growth hacking in SEO.

Categories: Technology

Four short links: 19 July 2017

O'Reilly Radar - Wed, 2017/07/19 - 02:05

Open Source Car Code, Glass for Business, Videogame Narrative Skills, and Gmail Leverage

  1. Apollo -- open source autonomous auto platform.
  2. Glass for Enterprise -- Google X has relaunched Glass for businesses. See blog post and Steven Levy. A HUD for assembly operators in factories with marked results. “We knew the value of wearable technology when we first put it on the floor,” Gulick says. “In our first test in quality, our numbers were so high in the value it was adding that we actually retested and retested and retested. Some of the numbers we couldn't even publish because the leadership said they looked way too high.” I've been telling people for years that the killer app is boring business. Interesting that G has a new sales model: they make the hardware but sell to partners who will create specific applications and sell to customers.
  3. Game Writing: Narrative Skills for Videogames -- as VR and AR introduce branching stories into our lives (never mind Westworld-like hotels), the art of flexible narrative is a useful one to study. This is a review of a 2008 book with contributions from greats in the field. Of the books on professional games writing I’ve encountered, this is possibly the best, and definitely in the top three.
  4. Google Hire -- recruitment tool, but notable because they're integrating their enterprise tools into Gmail (note: you can't integrate your enterprise tool into Gmail). Most workflows touch email at some point, and that's the precise point where systems like Salesforce chafe. (I still twitch remembering the Chrome plugin for Gmail-Salesforce integration that we used at a previous startup.) Owning the mail client gives them huge opportunity here.

Continue reading Four short links: 19 July 2017.

Categories: Technology

Four short links: 18 July 2017

O'Reilly Radar - Tue, 2017/07/18 - 03:00

Fooling Image Recognition, Electronics Text, Zero-Knowledge Proofs, and Massively Parallel Protein Design

  1. Robust Adversarial Inputs -- tricking deep learning image recognition models. We’ve created images that reliably fool neural network classifiers when viewed from varied scales and perspectives.
  2. CircuitLab Textbook -- free introductory electronics textbook, work in progress.
  3. The Hunting of the Snark -- a treasure hunt consisting of cryptographic challenges that will guide you through a zero-knowledge proof (ZKP) learning experience. As a reminder, zero-knowledge proofs, invented decades ago, allow verifiers to validate a computation on private data by allowing a prover to generate a cryptographic proof that asserts to the correctness of the computed output.
  4. Massively Parallel Protein Design -- We combined computational protein design, next-generation gene synthesis, and a high-throughput protease susceptibility assay to measure folding and stability for more than 15,000 de novo designed miniproteins, 1,000 natural proteins, 10,000 point mutants, and 30,000 negative control sequences. This analysis identified more than 2,500 stable designed proteins in four basic folds—a number sufficient to enable us to systematically examine how sequence determines folding and stability in uncharted protein space. Clever approach to understanding protein folding. (via Ian Haydon)

Continue reading Four short links: 18 July 2017.

Categories: Technology

Textual entailment with TensorFlow

O'Reilly Radar - Mon, 2017/07/17 - 04:00

Using neural networks to explore natural language.

Textual entailment is a simple exercise in logic that attempts to discern whether one sentence can be inferred from another. A computer program that takes on the task of textual entailment attempts to categorize an ordered pair of sentences into one of three categories. The first category, called “positive entailment,” occurs when you can use the first sentence to prove that a second sentence is true. The second category, “negative entailment,” is the inverse of positive entailment. This occurs when the first sentence can be used to disprove the second sentence. Finally, if the two sentences have no correlation, they are considered to have a “neutral entailment.”

Textual entailment is useful as a component in much larger applications. For example, question-answering systems may use textual entailment to verify an answer from stored information. Textual entailment may also enhance document summarization by filtering out sentences that don’t include new information. Other natural language processing (NLP) systems find similar uses for entailment.

This article will guide you through how to build a simple and fast-to-train neural network to perform textual entailment using TensorFlow.

Before we get started

In addition to installing TensorFlow version 1.0, make sure you’ve installed each of the following:

To get a better sense of progress during network training, you're also welcome to install TQDM, but it's not required. Please access the code and Jupyter Notebook for this article on GitHub. We’ll be using Stanford’s SNLI data set for our training, but we’ll download and extract the data we need using code from the Jupyter Notebook, so you don’t need to download it manually. If this is your first time working with TensorFlow, I’d encourage you to check out Aaron Schumacher’s article, “Hello, Tensorflow.”

We’ll start by doing all necessary imports, and we’ll let our Jupyter Notebook know it should display graphs and images in the notebook itself.

    %matplotlib inline

    import tensorflow as tf
    import numpy as np
    import matplotlib.pyplot as plt
    import matplotlib.ticker as ticker
    import urllib
    import sys
    import os
    import zipfile

The files we're about to use may take five minutes or more to download, so if you're following along by running the program in the corresponding notebook, feel free to start running the next few cells. In the meantime, let’s explore textual entailment in further detail.

Examples of textual entailment

In this section, we’ll walk through a few examples of textual entailment to illustrate what we mean by positive, negative, and neutral entailment. To begin, we’ll look at positive entailment—when you read, for example, that “Maurita and Jade both were at the scene of the car crash,” you can infer that “Multiple people saw the accident.” In this example sentence pair, we can prove the second sentence (also known as a “hypothesis”) from the first sentence (also called the “text”), meaning that this represents a positive entailment. Given that Maurita and Jade were both there to view the crash, multiple people must have seen it. Note: “car crash” and “accident” have similar meanings, but they aren’t the same word. In fact, entailment doesn’t always mean that the sentences share words, as can be seen in this sentence pair, which only shares the word “the.”

Let’s consider another sentence pair. How, if at all, does the sentence “Two dogs played in the park with the old man” entail “There was only one canine in the park that day”? If there are two dogs, there must be at least two canines. Since the second sentence contradicts that idea, this is negative entailment.

Finally, to illustrate neutral entailment, we consider, how, if at all, the sentence “I played baseball with the kids” entails “The kids love ice cream.” Playing baseball and loving ice cream have absolutely nothing to do with each other. I could play baseball with ice cream lovers, and I could play baseball with ice cream haters (both are equally possible). Thus, the first sentence says nothing about the truth or falsehood of the second—implying neutral entailment.
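As a minimal sketch of how the three example pairs above could be held as labeled data, the snippet below stores each pair with its category and checks which words a pair shares. The `SentencePair` type, the label strings, and the `shared_words` helper are illustrative assumptions, not part of the article's notebook or of the SNLI format.

```python
from collections import namedtuple

# Hypothetical container for one ordered sentence pair and its category.
SentencePair = namedtuple("SentencePair", ["text", "hypothesis", "label"])

examples = [
    SentencePair("Maurita and Jade both were at the scene of the car crash.",
                 "Multiple people saw the accident.",
                 "positive"),
    SentencePair("Two dogs played in the park with the old man.",
                 "There was only one canine in the park that day.",
                 "negative"),
    SentencePair("I played baseball with the kids.",
                 "The kids love ice cream.",
                 "neutral"),
]


def shared_words(pair):
    """Return the set of lowercase words that appear in both sentences."""
    text_words = set(pair.text.lower().strip(".").split())
    hypothesis_words = set(pair.hypothesis.lower().strip(".").split())
    return text_words & hypothesis_words


for pair in examples:
    print(pair.label, sorted(shared_words(pair)))
```

Word overlap alone does not decide the category: the positive pair shares almost no vocabulary, which is why the network needs a representation of meaning instead of raw strings.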

Figure 1. Types of entailment. Credit: Steven Hewitt.

Representing words as numbers using word vectorization

Unfortunately for neural networks, they primarily work with numeric values. To get around this, we need to represent our words as numbers in some way. Ideally, these numbers mean something; for example, we could use the character codes of the letters in a word, but that doesn’t tell us anything about the meaning of it (which would mean that TensorFlow would have to do a lot of work to tell that “dog” and “canine” are close to the same concept). Turning similar meanings into something a neural network can understand happens by a process called word vectorization.

One common way to create word vectorizations is to have each word represent a single point in a very high-dimensional space. Words with similar representations should be relatively close together in this space. For example, each color has a representation that is usually very similar to other colors; demonstrations of this can be found in the TensorFlow tutorial on word vectorization.
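To make "close together in this space" concrete, here is a small sketch that compares word vectors by cosine similarity. The three-dimensional vectors are made-up toy values chosen for illustration (real GloVe vectors have 50 or more dimensions), and cosine similarity is one common closeness measure, not necessarily the one any particular tutorial uses.

```python
import numpy as np

# Toy vectors with invented values, for illustration only.
toy_vectors = {
    "dog": np.array([0.9, 0.8, 0.1]),
    "canine": np.array([0.85, 0.75, 0.2]),
    "baseball": np.array([0.1, 0.2, 0.95]),
}


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; nearer 1.0 means more similar."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


dog_canine = cosine_similarity(toy_vectors["dog"], toy_vectors["canine"])
dog_baseball = cosine_similarity(toy_vectors["dog"], toy_vectors["baseball"])
print("dog vs. canine:  ", round(dog_canine, 3))
print("dog vs. baseball:", round(dog_baseball, 3))
```

With these toy values, "dog" and "canine" score far higher than "dog" and "baseball", which is the behavior a good vectorization should show for related words.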

Working with Stanford’s GloVe word vectorization + SNLI data set

For our purposes, we won’t need to create a new representation of words as numbers. There already exist quite a few fantastic general-purpose vector representations of words as well as ways to train even more specialized material if the general-purpose data isn’t enough.

The associated notebook for this article is designed to work with the pre-trained data for Stanford’s GloVe word vectorization. We’ll be using the six-billion-token Wikipedia 2014 + Gigaword 5 vectors, since it’s the smallest and easiest to download. We’ll download the file programmatically, but keep in mind that it may take a while to run (it’s a fairly large file).

At the same time, we'll also be picking up our data set for textual entailment: Stanford's SNLI data set. We'll be using the development set in the interest of speed (it has only 10,000 sentence pairs), but if you're interested in getting better results and have time to spare for training, you can try using the full data set instead.

    glove_zip_file = ""  # set to the local file name of the GloVe zip archive
    glove_vectors_file = "glove.6B.50d.txt"

    snli_zip_file = ""  # set to the local file name of the SNLI zip archive
    snli_dev_file = "snli_1.0_dev.txt"
    snli_full_dataset_file = "snli_1.0_train.txt"

    from six.moves.urllib.request import urlretrieve

    # large file - 862 MB
    if (not os.path.isfile(glove_zip_file) and
            not os.path.isfile(glove_vectors_file)):
        urlretrieve("", glove_zip_file)  # first argument: the GloVe download URL

    # medium-sized file - 94.6 MB
    if (not os.path.isfile(snli_zip_file) and
            not os.path.isfile(snli_dev_file)):
        urlretrieve("", snli_zip_file)  # first argument: the SNLI download URL

    def unzip_single_file(zip_file_name, output_file_name):
        """
        If the outFile is already created, don't recreate
        If the outFile does not exist, create it from the zipFile
        """
        if not os.path.isfile(output_file_name):
            with open(output_file_name, 'wb') as out_file:
                with zipfile.ZipFile(zip_file_name) as zipped:
                    for info in zipped.infolist():
                        if output_file_name in info.filename:
                            # Copy the first matching archive member, then stop.
                            with zipped.open(info) as requested_file:
                                out_file.write(requested_file.read())
                                return

    unzip_single_file(glove_zip_file, glove_vectors_file)
    unzip_single_file(snli_zip_file, snli_dev_file)
    # unzip_single_file(snli_zip_file, snli_full_dataset_file)

Now that we have our GloVe vectors downloaded, we can load them into memory, deserializing the space-separated format into a Python dictionary:

    glove_wordmap = {}
    with open(glove_vectors_file, "r") as glove:
        for line in glove:
            name, vector = tuple(line.split(" ", 1))
            glove_wordmap[name] = np.fromstring(vector, sep=" ")
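The sketch below shows the same kind of deserialization on a tiny in-memory sample, so the line layout (a word followed by its space-separated components) is visible without the large download. The sample values and the `load_vectors` name are hypothetical. It parses with `np.array(..., dtype=float)`, a swapped-in alternative to the `np.fromstring` call in the article's code.

```python
import io

import numpy as np

# Two invented lines in the "word v1 v2 ... vd" layout, with d = 3 here.
sample_file = io.StringIO("the 0.1 0.2 0.3\ndog 0.4 0.5 0.6\n")


def load_vectors(handle):
    """Build a {word: vector} dictionary from space-separated text lines."""
    wordmap = {}
    for line in handle:
        # Split off the word; everything after the first space is the vector.
        name, vector = line.rstrip("\n").split(" ", 1)
        wordmap[name] = np.array(vector.split(" "), dtype=float)
    return wordmap


sample_wordmap = load_vectors(sample_file)
print(sample_wordmap["dog"])
```

Each dictionary value is a one-dimensional array with one entry per dimension of the word vector, which is the shape the sentence-to-sequence step expects for each row.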

Once we have our words, we need our input to contain entire sentences and process it through a neural network. Let's start with making the sequence:

    def sentence2sequence(sentence):
        """
        - Turns an input sentence into an (n,d) matrix,
            where n is the number of tokens in the sentence
            and d is the number of dimensions each word vector has.

          Tensorflow doesn't need to be used here, as simply
          turning the sentence into a sequence based off our
          mapping does not need the computational power that
          Tensorflow provides. Normal Python suffices for this task.
        """
        tokens = sentence.lower().split(" ")
        rows = []
        words = []
        #Greedy search for tokens
        for token in tokens:
            i = len(token)
            while len(token) > 0 and i > 0:
                word = token[:i]
                if word in glove_wordmap:
                    rows.append(glove_wordmap[word])
                    words.append(word)
                    token = token[i:]
                    i = len(token)
                else:
                    i = i-1
        return rows, words

What the computer sees when it looks at a sentence

To better visualize the word vectorization process, and to see what the computer sees when it looks at a sentence, we can represent the vectors as images. Feel free to use the notebook to play around with visualizing your own sentences. Each row represents a single word, and the columns represent individual dimensions of the vectorized word. The vectorizations are trained in terms of relationships to other words, so what the representations actually mean is ambiguous. The computer can understand this vector language, and that’s the most important part to us. Generally speaking, two vectors that contain similar colors in the same positions represent words that are similar in meaning.

    def visualize(sentence):
        rows, words = sentence2sequence(sentence)
        mat = np.vstack(rows)

        fig = plt.figure()
        ax = fig.add_subplot(111)
        shown = ax.matshow(mat, aspect="auto")
        ax.yaxis.set_major_locator(ticker.MultipleLocator(1))
        fig.colorbar(shown)

        ax.set_yticklabels([""]+words)

    visualize("The quick brown fox jumped over the lazy dog.")
    visualize("The pretty flowers shone in the sunlight.")

Figure 2. Sentences in vectorized form, visualized. Credit: Steven Hewitt.

Unlike images, sentences are inherently sequential and can’t be constrained by size, so instead of fully connected feed-forward networks that take in one input and simply run until they produce a single output, we need a new type of network. We need...recurrence.

Vanilla recurrent networks

Recurrent neural networks (RNNs) are a sequence-learning tool for neural networks. This type of neural network has only one layer’s worth of hidden units, which is re-used for each input from the sequence, along with a “memory” that’s passed ahead to the next input’s calculations. These are calculated using matrix multiplication, where the matrix entries are trained weights, just like they are in a fully connected layer.
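To make the re-use of one layer concrete, here is a minimal sketch of that recurrence in plain Python. It assumes the common formulation h = tanh(W_x x + W_h h_prev + b); the function names and the toy weights are hypothetical and are not taken from TensorFlow's implementation.

```python
import math

def rnn_step(x, h_prev, w_x, w_h, b):
    """One recurrent step: h = tanh(W_x x + W_h h_prev + b), using plain lists."""
    h = []
    for j in range(len(b)):
        total = b[j]
        total += sum(w_x[j][k] * x[k] for k in range(len(x)))
        total += sum(w_h[j][k] * h_prev[k] for k in range(len(h_prev)))
        h.append(math.tanh(total))
    return h

def run_rnn(sequence, w_x, w_h, b):
    """Apply the same weights to every input, carrying the memory forward."""
    h = [0.0] * len(b)              # starting "memory"
    outputs = []
    for x in sequence:
        h = rnn_step(x, h, w_x, w_h, b)
        outputs.append(h)
    return outputs

# Hypothetical toy weights: 2-dimensional inputs, 3 hidden units.
w_x = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.3]]
w_h = [[0.1, 0.0, 0.2], [0.0, 0.1, 0.0], [0.2, 0.0, 0.1]]
b = [0.0, 0.0, 0.0]

short = run_rnn([[1.0, 0.0], [0.0, 1.0]], w_x, w_h, b)
longer = run_rnn([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]], w_x, w_h, b)
```

The same three weight arrays handle a sequence of any length, which is what lets one recurrent layer stand in for many unrolled ones.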

The same calculations are repeated for each input in the sequence, meaning that a single “layer” of a recurrent neural network can be unrolled into many layers. In fact, there will be as many layers as there are inputs in the sequence. This allows the network to process a very complex sentence. TensorFlow includes its own implementation of a vanilla RNN cell, BasicRNNCell, which can be added to your TensorFlow graph as follows:

    rnn_size = 64
    rnn = tf.contrib.rnn.BasicRNNCell(rnn_size)

The vanishing gradient problem

In theory, the network would be able to remember things from one of the first layers, much earlier in the sentence—even at the end of the sentence. The main problem with this form of recurrence is that, in practice, earlier data is completely drowned out by newer inputs and information that doesn’t end up being nearly as important. Recurrent neural networks, or at least a neural network with standard hidden units, often fail to hold on to information for long periods of time. This failure is known as the vanishing gradient problem.

The simplest way to visualize this is by example. In the simplest case, input and “memory” are roughly equally weighted. The first input into the network will affect approximately half of the first output (the other half being the starting “memory”), a quarter of the second output, then an eighth of the third output, and so on.
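The halving described above can be worked through numerically. This is a sketch under the stated simplification only: it assumes each new memory is an even blend of the old memory and the current input, and the helper name `first_input_share` is made up for illustration.

```python
def first_input_share(num_steps, input_weight=0.5):
    """Share of the first input in each output, assuming
    memory_t = (1 - input_weight) * memory_(t-1) + input_weight * input_t."""
    shares = []
    share = input_weight                # first output: half input, half starting memory
    for _ in range(num_steps):
        shares.append(share)
        share *= (1.0 - input_weight)   # every later step dilutes older memory again
    return shares

shares = first_input_share(10)
print(shares[:3])    # [0.5, 0.25, 0.125]
print(shares[-1])    # 0.0009765625
```

After only ten words, the first word accounts for less than a tenth of a percent of the output in this simplified model.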

This means we can’t use vanilla recurrent networks, at least not if we want to keep track of both sentences in this pair. The solution is to use a different type of recurrent network layer. Perhaps the simplest of these is the long short-term memory layer, also known as an LSTM.

Utilizing LSTM

In an LSTM, instead of the input (x_t) always being used the same way every time in the calculation of current memory, the network makes a decision on how much the current values can affect the memory by an “input gate” (i_t), and makes another decision on what memory (c_t) is forgotten by an appropriately named “forget gate” (f_t), and finally makes a third decision on what parts of memory are sent to the next timestep (h_t) by an “output gate” (o_t).

Figure 3. A diagram showing the hidden units within an LSTM layer. Credit: Steven Hewitt (adapted from this similar image, distributed under CC BY-SA 4.0).

The combination of these three gates creates a choice: a single LSTM node can either keep information in long-term memory or keep it in short-term memory, but it can’t do both at the same time. Short-term memory LSTMs usually train to have relatively open input gates that let a lot of information in and forget many things often, while long-term memory LSTMs have tight input gates that only allow very small, very specific pieces of information in. This tightness means that it doesn’t lose its information easily, allowing for longer retention time.
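The gates and the long-term versus short-term behavior can be sketched with a single scalar LSTM unit. This follows the standard LSTM update rather than TensorFlow's exact code, and the two hand-picked parameter sets are hypothetical: a trained network would learn its gate weights instead of having them fixed like this.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One step of a single-unit LSTM with scalar input, output, and memory.

    params maps each of "i", "f", "o", "g" to a (w_x, w_h, bias) triple.
    """
    def pre(name):
        w_x, w_h, bias = params[name]
        return w_x * x_t + w_h * h_prev + bias

    i_t = sigmoid(pre("i"))       # input gate: how much new information gets in
    f_t = sigmoid(pre("f"))       # forget gate: how much old memory is kept
    o_t = sigmoid(pre("o"))       # output gate: how much memory is passed on
    g_t = math.tanh(pre("g"))     # candidate value to write into memory
    c_t = f_t * c_prev + i_t * g_t
    h_t = o_t * math.tanh(c_t)
    return h_t, c_t

# Hypothetical "long-term" unit: input gate nearly shut, forget gate keeps memory.
long_term = {"i": (0.0, 0.0, -10.0), "f": (0.0, 0.0, 10.0),
             "o": (0.0, 0.0, 10.0), "g": (1.0, 0.0, 0.0)}
# Hypothetical "short-term" unit: input gate open, forget gate discards memory.
short_term = {"i": (0.0, 0.0, 10.0), "f": (0.0, 0.0, -10.0),
              "o": (0.0, 0.0, 10.0), "g": (1.0, 0.0, 0.0)}

def final_memory(params, inputs, c0):
    h, c = 0.0, c0
    for x in inputs:
        h, c = lstm_step(x, h, c, params)
    return c

inputs = [1.0, -1.0, 1.0, -1.0, 1.0]
c_long = final_memory(long_term, inputs, c0=0.8)    # stays close to the initial 0.8
c_short = final_memory(short_term, inputs, c0=0.8)  # close to tanh of the last input
```

With these settings the long-term unit ignores the inputs and holds its starting memory, while the short-term unit overwrites its memory with whatever it saw last.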

In general, LSTMs are very cryptic. Different LSTM nodes in the same network may have vastly different gates that rely upon one another, such as perhaps having a short-term gate remember the word “not” in the sentence “John did not go to the store,” so that when the word “go” appears, a long-term gate could remember “not go” instead of “go.” Of course, this is a contrived example, and, in practice, these relationships are very complex to the point of being indecipherable.

Defining the constants for our network

Since we aren’t going to use a vanilla RNN layer in our network, let's clear out the graph and add an LSTM layer, which TensorFlow also includes by default. Since this is going to be the first part of our actual network, let's also define all the constants we'll need for the network, which we'll talk about as they come up:

    # Constants setup
    max_hypothesis_length, max_evidence_length = 30, 30
    batch_size, vector_size, hidden_size = 128, 50, 64

    lstm_size = hidden_size

    weight_decay = 0.0001

    learning_rate = 1

    input_p, output_p = 0.5, 0.5

    training_iterations_count = 100000

    display_step = 10

    def score_setup(row):
        convert_dict = {
            'entailment': 0,
            'neutral': 1,
            'contradiction': 2
        }
        score = np.zeros((3,))
        for x in range(1, 6):
            tag = row["label" + str(x)]
            if tag in convert_dict:
                score[convert_dict[tag]] += 1
        return score / (1.0 * np.sum(score))

    def fit_to_size(matrix, shape):
        res = np.zeros(shape)
        # Index with a tuple of slices; newer NumPy does not accept a list here.
        slices = tuple(slice(0, min(dim, shape[e]))
                       for e, dim in enumerate(matrix.shape))
        res[slices] = matrix[slices]
        return res

    def split_data_into_scores():
        import csv
        with open("snli_1.0_dev.txt", "r") as data:
            train = csv.DictReader(data, delimiter='\t')
            evi_sentences = []
            hyp_sentences = []
            labels = []
            scores = []
            for row in train:
                hyp_sentences.append(np.vstack(
                    sentence2sequence(row["sentence1"].lower())[0]))
                evi_sentences.append(np.vstack(
                    sentence2sequence(row["sentence2"].lower())[0]))
                labels.append(row["gold_label"])
                scores.append(score_setup(row))

            hyp_sentences = np.stack(
                [fit_to_size(x, (max_hypothesis_length, vector_size))
                 for x in hyp_sentences])
            evi_sentences = np.stack(
                [fit_to_size(x, (max_evidence_length, vector_size))
                 for x in evi_sentences])

            return (hyp_sentences, evi_sentences), labels, np.array(scores)

    data_feature_list, correct_values, correct_scores = split_data_into_scores()

    l_h, l_e = max_hypothesis_length, max_evidence_length
    N, D, H = batch_size, vector_size, hidden_size
    l_seq = l_h + l_e

We'll also reset the graph to not include the RNN cell we added earlier, since we won't be using that for this network:

    tf.reset_default_graph()


With both those out of the way, we can define our LSTM using TensorFlow as follows:

    lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size)

Implementing dropout, for regularization

If we simply used LSTM layers and nothing more, the network might read a lot of meaning into common, but inconsequential, words like “a,” “the,” and “and.” The network might incorrectly believe that it has found negative entailment if one sentence uses the phrase “an animal” and the other uses “the animal,” even if those phrases refer to the same object.

To solve this, we need to regularize the network so that it learns whether individual words actually matter to the meaning as a whole, and we do that through a process called “dropout.” Dropout is a regularization pattern in neural network design that revolves around dropping randomly selected hidden and visible units. As the size of a neural network increases, so does the number of parameters used to calculate the final result, each of which contributes to overfitting if trained all at once. In order to regularize for this, a portion of the units contained within the network are selected randomly and zeroed out temporarily during training, and their outputs are scaled appropriately during actual use.

Dropout on “standard” (i.e., fully connected) layers is also useful because it effectively trains multiple smaller networks, and then combines them during testing time. One of the constants in machine learning is that combining multiple models nearly always makes for a better method than a single model on its own, and dropout serves to turn a single neural network into multiple smaller neural networks that share some nodes, and thus some parameters, with the others.
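The masking and rescaling described above can be sketched in a few lines of plain Python. This is an illustration of the idea only, not TensorFlow's implementation; the function names and the `keep_prob` parameter (the probability that a unit is kept) are assumptions made for this example.

```python
import random

def dropout_train(units, keep_prob, rng):
    """Training pass: keep each unit with probability keep_prob, zero the rest."""
    return [u if rng.random() < keep_prob else 0.0 for u in units]

def dropout_use(units, keep_prob):
    """Actual use: keep every unit, scaled so the average matches training."""
    return [u * keep_prob for u in units]

rng = random.Random(0)            # fixed seed so the sketch is repeatable
units = [1.0] * 10000
trained = dropout_train(units, keep_prob=0.5, rng=rng)
used = dropout_use(units, keep_prob=0.5)

kept_fraction = sum(1 for u in trained if u != 0.0) / len(trained)
```

Roughly half of the units survive each training pass, and scaling by `keep_prob` at use time keeps the average activation in line with what the next layer saw during training. Some libraries, TensorFlow's `tf.nn.dropout` among them, instead scale the kept units up by `1 / keep_prob` during training and leave them untouched afterward.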

A dropout layer has one hyperparameter known as p, which is simply the probability that each unit is kept in the network for that iteration of training. The units that are kept provide their outputs to the next layer, and the units that are not kept provide nothing. What follows is an example showing the difference between a fully connected network without dropout and a fully connected network with dropout during one iteration of training:

Figure 4. On the left: A normal fully connected network. On the right: The same network during training, with p = 0.5. Credit: Steven Hewitt.

Tensorflow’s DropoutWrapper for recurrent layers

Unfortunately, dropout does not play particularly nicely with LSTM layers' internal gates. The loss of certain pieces of crucial memory means that complicated relationships required for first-order logic have a harder time forming with dropout, so for our LSTM layer, we’ll skip using dropout on internal gates, instead using it on everything else. Thankfully, this is the default implementation of Tensorflow’s DropoutWrapper for recurrent layers:

    lstm_drop = tf.contrib.rnn.DropoutWrapper(lstm, input_p, output_p)

Completing our model

With all the explanations out of the way, we can finish up our model. The first step is tokenizing and using our GloVe dictionary to turn the two input sentences into a single sequence of vectors. Since we can’t effectively use dropout on information that gets passed within an LSTM, we’ll use dropout on features from words and on final output instead—effectively using dropout on the first and last layers from the unrolled LSTM network portions.

You may notice that we use a bi-directional RNN, with two different LSTM units. This form of recurrent network runs both forward and backward through the input data, which allows the network to review both the hypothesis and the evidence both independently and in relation to each other.
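As a rough picture of what "both forward and backward" means, the sketch below runs a stand-in recurrent layer over a sequence and over its reverse, then pairs the two results per position. The stand-in layer (a decayed running sum) and both function names are hypothetical; a real bidirectional RNN uses two separately trained cells, as the two LSTM units in this model do.

```python
def running_sum_rnn(sequence):
    """Stand-in recurrent layer: the "memory" is a decayed running sum of inputs."""
    memory, outputs = 0.0, []
    for x in sequence:
        memory = 0.5 * memory + x
        outputs.append(memory)
    return outputs

def bidirectional(sequence):
    """Run one pass forward and one over the reversed sequence, then pair them up."""
    forward = running_sum_rnn(sequence)
    backward = running_sum_rnn(sequence[::-1])[::-1]   # re-align to input order
    return list(zip(forward, backward))

outputs = bidirectional([1.0, 2.0, 4.0])
print(outputs)   # [(1.0, 3.0), (2.5, 4.0), (5.25, 4.0)]
```

Each position ends up with one value summarizing everything before it and one summarizing everything after it.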

The final output from the LSTMs will be passed into a fully connected layer, and from that, we’ll get a real-valued score for each kind of entailment that indicates how strongly the network favors it. We use those scores to select our final result and determine our confidence in that result.

    # N: The number of elements in each of our batches,
    #   which we use to train subsets of data for efficiency's sake.
    # l_h: The maximum length of a hypothesis, or the second sentence. This is
    #   used because training an RNN is extraordinarily difficult without
    #   rolling it out to a fixed length.
    # l_e: The maximum length of evidence, the first sentence. This is used
    #   because training an RNN is extraordinarily difficult without
    #   rolling it out to a fixed length.
    # D: The size of our used GloVe or other vectors.
    hyp = tf.placeholder(tf.float32, [N, l_h, D], 'hypothesis')
    evi = tf.placeholder(tf.float32, [N, l_e, D], 'evidence')
    y = tf.placeholder(tf.float32, [N, 3], 'label')
    # hyp: Where the hypotheses will be stored during training.
    # evi: Where the evidences will be stored during training.
    # y: Where correct scores will be stored during training.

    # lstm_size: the size of the gates in the LSTM,
    #   as in the first LSTM layer's initialization.
    lstm_back = tf.contrib.rnn.BasicLSTMCell(lstm_size)
    # lstm_back: The LSTM used for looking backwards
    #   through the sentences, similar to lstm.

    # input_p: the probability that inputs to the LSTM will be retained at each
    #   iteration of dropout.
    # output_p: the probability that outputs from the LSTM will be retained at
    #   each iteration of dropout.
    lstm_drop_back = tf.contrib.rnn.DropoutWrapper(lstm_back, input_p, output_p)
    # lstm_drop_back: A dropout wrapper for lstm_back, like lstm_drop.

    fc_initializer = tf.random_normal_initializer(stddev=0.1)
    # fc_initializer: initial values for the fully connected layer's weights.

    # hidden_size: the size of the outputs from each lstm layer.
    #   Multiplied by 2 to account for the two LSTMs.
    fc_weight = tf.get_variable('fc_weight', [2*hidden_size, 3],
                                initializer = fc_initializer)
    # fc_weight: Storage for the fully connected layer's weights.

    fc_bias = tf.get_variable('bias', [3])
    # fc_bias: Storage for the fully connected layer's bias.

    # tf.GraphKeys.REGULARIZATION_LOSSES: A key to a collection in the graph
    #   designated for losses due to regularization.
    #   In this case, this portion of loss is regularization on the weights
    #   for the fully connected layer.
    tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES,
                         tf.nn.l2_loss(fc_weight))

    x = tf.concat([hyp, evi], 1) # N, (Lh+Le), d
    # Permuting batch_size and n_steps
    x = tf.transpose(x, [1, 0, 2]) # (Le+Lh), N, d
    # Reshaping to (n_steps*batch_size, n_input)
    x = tf.reshape(x, [-1, vector_size]) # (Le+Lh)*N, d
    # Split to get a list of 'n_steps' tensors of shape (batch_size, n_input)
    x = tf.split(x, l_seq,)

    # x: the inputs to the bidirectional_rnn

    # tf.contrib.rnn.static_bidirectional_rnn: Runs the input through
    #   two recurrent networks, one that runs the inputs forward and one
    #   that runs the inputs in reversed order, combining the outputs.
    rnn_outputs, _, _ = tf.contrib.rnn.static_bidirectional_rnn(
        lstm, lstm_back, x, dtype=tf.float32)
    # rnn_outputs: the list of LSTM outputs, as a list.
    #   What we want is the latest output, rnn_outputs[-1]

    classification_scores = tf.matmul(rnn_outputs[-1], fc_weight) + fc_bias
    # The scores are relative certainties for how likely the output matches
    #   a certain entailment:
    #     0: Positive entailment
    #     1: Neutral entailment
    #     2: Negative entailment

Showing TensorFlow how to calculate accuracy

In order to test the accuracy and begin to add in optimization constraints, we need to show TensorFlow how to calculate the accuracy—or the percentage of correctly predicted labels.

We also need to determine a loss, to show how poorly the network is doing. Since we have both classification scores and optimal scores, the choice here is using a variation on softmax loss from TensorFlow: tf.nn.softmax_cross_entropy_with_logits. We add in regularization losses to help with overfitting and then prepare an optimizer to learn how to reduce the loss.
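Before looking at the TensorFlow version, it may help to see the same quantities computed by hand. The sketch below is a plain-Python approximation of softmax cross-entropy, accuracy, and an L2 penalty; the scores, targets, and weights are made-up numbers, and the helper functions are illustrative rather than TensorFlow's own.

```python
import math

def softmax(scores):
    shifted = [s - max(scores) for s in scores]    # subtract max for stability
    exps = [math.exp(s) for s in shifted]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target):
    """Softmax cross-entropy between raw scores and a target distribution."""
    probs = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0)

def accuracy(batch_scores, batch_targets):
    """Fraction of rows where the top score matches the top target entry."""
    hits = sum(
        1 for s, t in zip(batch_scores, batch_targets)
        if s.index(max(s)) == t.index(max(t))
    )
    return hits / len(batch_scores)

# Hypothetical scores for (entailment, neutral, contradiction) on two pairs.
batch_scores = [[2.0, 0.5, -1.0], [0.1, 0.2, 1.5]]
batch_targets = [[0.8, 0.2, 0.0], [0.0, 1.0, 0.0]]   # shares of annotator votes

losses = [cross_entropy(s, t) for s, t in zip(batch_scores, batch_targets)]
mean_loss = sum(losses) / len(losses)

weights = [0.3, -0.4]                                # hypothetical layer weights
weight_decay = 0.0001
# Half the sum of squares, mirroring what tf.nn.l2_loss computes.
total_loss = mean_loss + weight_decay * 0.5 * sum(w * w for w in weights)

print(accuracy(batch_scores, batch_targets))         # 0.5
```

The first pair is scored correctly and the second is not, so the accuracy is 0.5, while the loss also reflects how confident each prediction was.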

    with tf.variable_scope('Accuracy'):
        predicts = tf.cast(tf.argmax(classification_scores, 1), 'int32')
        y_label = tf.cast(tf.argmax(y, 1), 'int32')
        corrects = tf.equal(predicts, y_label)
        num_corrects = tf.reduce_sum(tf.cast(corrects, tf.float32))
        accuracy = tf.reduce_mean(tf.cast(corrects, tf.float32))

    with tf.variable_scope("loss"):
        cross_entropy = tf.nn.softmax_cross_entropy_with_logits(
            logits = classification_scores, labels = y)
        loss = tf.reduce_mean(cross_entropy)
        total_loss = loss + weight_decay * tf.add_n(
            tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES))

    optimizer = tf.train.GradientDescentOptimizer(learning_rate)

    opt_op = optimizer.minimize(total_loss)

Let’s train the network

Finally, we can train the network! If you installed TQDM, you can use it to keep track of progress as the network trains.

    # Initialize variables
    init = tf.global_variables_initializer()

    # Use TQDM if installed
    tqdm_installed = False
    try:
        from tqdm import tqdm
        tqdm_installed = True
    except ImportError:
        pass

    # Launch the Tensorflow session
    sess = tf.Session()
    sess.run(init)

    # training_iterations_count: The number of data pieces to train on in total
    # batch_size: The number of data pieces per batch
    training_iterations = range(0, training_iterations_count, batch_size)
    if tqdm_installed:
        # Add a progress bar if TQDM is installed
        training_iterations = tqdm(training_iterations)

    for i in training_iterations:

        # Select indices for a random data subset
        batch = np.random.randint(data_feature_list[0].shape[0], size=batch_size)

        # Use the selected subset indices to initialize the graph's
        #   placeholder values
        hyps, evis, ys = (data_feature_list[0][batch,:],
                          data_feature_list[1][batch,:],
                          correct_scores[batch])

        # Run the optimization with these initialized values
        sess.run([opt_op], feed_dict={hyp: hyps, evi: evis, y: ys})
        # display_step: how often the accuracy and loss should
        #   be tested and displayed.
        if (i // batch_size) % display_step == 0:
            # Calculate batch accuracy
            acc = sess.run(accuracy, feed_dict={hyp: hyps, evi: evis, y: ys})
            # Calculate batch loss
            tmp_loss = sess.run(loss, feed_dict={hyp: hyps, evi: evis, y: ys})
            # Display results
            print("Iter " + str(i // batch_size) + ", Minibatch Loss= " + \
                  "{:.6f}".format(tmp_loss) + ", Training Accuracy= " + \
                  "{:.5f}".format(acc))

Your network is now trained! You should see accuracies around 50-55%, which can be improved by careful modification of hyperparameters and increasing the data set size to include the entire training set. Usually, this will correspond with an increase in training time.

Feel free to modify the following code in the notebook by inserting your own sentences:

    evidences = ["Maurita and Jade both were at the scene of the car crash."]

    hypotheses = ["Multiple people saw the accident."]

    sentence1 = [fit_to_size(np.vstack(sentence2sequence(evidence)[0]),
                             (30, 50)) for evidence in evidences]

    sentence2 = [fit_to_size(np.vstack(sentence2sequence(hypothesis)[0]),
                             (30, 50)) for hypothesis in hypotheses]

    prediction = sess.run(classification_scores,
                          feed_dict={hyp: (sentence1 * N),
                                     evi: (sentence2 * N),
                                     y: [[0,0,0]]*N})
    print(["Positive", "Neutral", "Negative"][np.argmax(prediction[0])] +
          " entailment")

Finally, once we're done playing with our model, we'll close the session to free up system resources.

    sess.close()

Interested in developing more results?

The design focus of this network was creating a simple system that was easy and quick to train. In order to get more accurate results, you may want to consider:

  • Adding more layers of LSTMs.
  • Using alternative types of RNN layers, such as Gated Recurrent Units (GRUs). TensorFlow also includes an implementation of GRUs.
  • Adding more hidden units. If you do this, increase regularization and dropout strengths to account for the fact that there are more parameters in the network.
  • Experimenting with other kinds of networks entirely!

This post is a collaboration between O'Reilly and TensorFlow. See our statement of editorial independence.

Continue reading Textual entailment with TensorFlow.

Categories: Technology

Four short links: 17 July 2017

O'Reilly Radar - Mon, 2017/07/17 - 02:00

Discarded GPUs, Go REPL, Learning Point Clouds, and 3D-Printing Nanopatterns

  1. Used GPUs Flood the Market as Ethereum Price Drops Below 150 -- On second-hand sales websites like eBay and Gumtree, we have seen a lot of new GPU listings appear in recent days, with plenty of used AMD RX series GPUs appearing over the weekend. More hardware is expected to hit these sites over the coming days as some miners wind down their operations, though many will simply move to a more profitable currency or to invest their computing power into an emerging cryptocurrency that has the prospect of high values in the future. That said, one HN commenter points out that in many areas with cheap power, it's still profitable to mine.
  2. go-pry -- An interactive REPL for Go that allows you to drop into your code at any point.
  3. Representation Learning and Adversarial Generation of 3D Point Clouds -- The expressive power of our learned embedding, obtained without human supervision, enables basic shape editing applications via simple algebraic manipulations, such as semantic part editing and shape interpolation. Figure 4 is the wow shot: interpolating between different tables, lounges, and chairs. (via Gene Kogan)
  4. Programming 2D/3D Shape-shifting with Hobbyist 3D Printers -- Here we present initially flat constructs that, upon triggering by high temperatures, change their shape to a pre-programmed 3D shape, thereby enabling the combination of surface-related functionalities with complex 3D shapes. Origami-like magic lets you print precisely controlled bio-nanopatterns, printed electronic components, and sensors/actuators.

Continue reading Four short links: 17 July 2017.

Categories: Technology

Four short links: 14 July 2017

O'Reilly Radar - Fri, 2017/07/14 - 02:55

Molecular Sensing, Faking Speech, Radical Technologies, and Bullshit Detection

  1. Scio -- handheld molecular sensing for $300.
  2. AI Can Fake Speech (IEEE) -- The research team had a neural net analyze millions of frames of video to determine how elements of Obama's face moved as he talked, such as his lips and teeth and wrinkles around his mouth and chin. [...] In the new study, the neural net learned what mouth shapes were linked to various sounds. The researchers took audio clips and dubbed them over the original sound files of a video. They next took mouth shapes that matched the new audio clips and grafted and blended them onto the video. Essentially, the researchers synthesized videos where Obama lip-synched words he said up to decades beforehand.
  3. Radical Technologies: The Design of Everyday Life (Adam Greenfield) -- none of our instincts will guide us in our approach to the next normal. If we want to understand the radical technologies all around us, and see just how they interact to produce the condition we recognize as everyday life, we'll need a manual. That is the project of this book.
  4. Introductory Bullshit Detection for Non-Technical Managers -- “I’m creating a framework to...” It means: I’m not interested in solving the actual problem, so I’m going to create something else so that the person who actually will solve the problem has to also fix the problems in my stuff on top of that.

Continue reading Four short links: 14 July 2017.

Categories: Technology

Neuroevolution: A different kind of deep learning

O'Reilly Radar - Thu, 2017/07/13 - 08:30

The quest to evolve neural networks through evolutionary algorithms.

Neuroevolution is making a comeback. Prominent artificial intelligence labs and researchers are experimenting with it, a string of new successes has bolstered enthusiasm, and new opportunities for impact in deep learning are emerging. Maybe you haven’t heard of neuroevolution in the midst of all the excitement over deep learning, but it’s been lurking just below the surface, the subject of study for a small, enthusiastic research community for decades. And it’s starting to gain more attention as people recognize its potential.

Put simply, neuroevolution is a subfield within artificial intelligence (AI) and machine learning (ML) that consists of trying to trigger an evolutionary process similar to the one that produced our brains, except inside a computer. In other words, neuroevolution seeks to develop the means of evolving neural networks through evolutionary algorithms.

When I first waded into AI research in the late 1990s, the idea that brains could be evolved inside computers resonated with my sense of adventure. At that time, it was an unusual, even obscure field, but I felt a deep curiosity and affinity. The result has been 20 years of my life thinking about this subject, and a slew of algorithms developed with outstanding colleagues over the years, such as NEAT, HyperNEAT, and novelty search. In this article, I hope to convey some of the excitement of neuroevolution as well as provide insight into its issues, but without the opaque technical jargon of scientific articles. I have also taken, in part, an autobiographical perspective, reflecting my own deep involvement within the field. I hope my story provides a window for a wider audience into the quest to evolve brains within computers.

The success of deep learning

If you've been following AI or ML recently, you've probably heard about deep learning. Thanks to deep learning, computers can accomplish tasks like recognizing images and controlling autonomous vehicles (or even video game characters) at close to or sometimes surpassing human performance. These achievements have helped deep learning and AI in general to emerge from the obscurity of academic journals into the popular press and news media, inspiring the public imagination. So, what is actually behind deep learning that has enabled its success?

In fact, underneath the hood in deep learning is the latest form of a decades-old technology called artificial neural networks (ANNs). Like many ideas in AI, ANNs are roughly inspired by biology; in this case, by the structure of the brain. We choose the brain as an inspiration for AI because the brain is the unequivocal seat of intelligence; while we're pursuing AI, it makes sense that, at some level, it should resemble the brain. And one of the key building blocks of brains is the neuron, a tiny cell that sends signals to other neurons over connections. When many neurons are connected to each other in a network (as happens in brains), we call that a neural network. So, an ANN is an attempt to simulate a collection of neuron-like components that send signals to each other. That's the underlying mechanism behind the "deep networks" in deep learning.

Researchers in ANNs write a program that simulates these neurons and the signals that travel between them, yielding a process vaguely reminiscent of what happens in brains. Of course, there are also many differences. The challenge is that simply connecting a bunch of neuron-like elements to each other and letting them share signals does not yield intelligence. Intelligence, instead, arises from precisely how the neurons are connected.

For example, a neuron that strongly influences another neuron is said to connect to its partner with a strong weight. In this way, the weights of connections determine how neurons influence each other, yielding a pattern of neural activation across a neural network in response to inputs to the network (which could be, for example, from the eyes). To get an intelligent network, the consequent challenge is to decide what these connection weights should be.

In general, almost no one tries to decide connection weights by hand. (With millions of connections in modern ANNs, you can imagine why that is not a viable approach.) Instead, the problem of finding the right weights to perform a task is viewed as the problem of learning. In other words, researchers have invested a lot of effort in devising methods to allow ANNs to learn the best weights for a task on their own. The most common approach to learning weights is to compare the output of an ANN (e.g., "That looks like a dog") to the ground truth (it's actually a cat) and then shift the weights through a principled mathematical formula to make the correct output more likely in the future.

After numerous such training examples (maybe millions), the network starts to assign the right weights to answer all kinds of questions correctly. Often, it can even generalize to questions it has never seen, as long as they are not too different from those it saw in training. At this point, the ANN has basically learned to perform the desired task. A common approach to such weight shifting is called stochastic gradient descent, which is the aforementioned formula popular throughout deep learning. The realization in deep learning in recent years is that it's possible to train massive ANNs with many layers of neurons (which is why they're called "deep") through this approach thanks to the availability of powerful modern computer hardware.
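The weight-shifting loop described above can be sketched for the smallest possible "network," a single neuron with one weight and one bias. This is an illustrative toy under assumptions of my own (a squared-error loss, a linear neuron, and made-up data), not a description of how any particular deep learning system is trained.

```python
def train_neuron(examples, learning_rate=0.05, epochs=200):
    """Fit output = w * x + b by nudging w and b after every example."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, truth in examples:
            output = w * x + b
            error = output - truth          # compare output to the ground truth
            # Gradient of 0.5 * error**2 with respect to w and b.
            w -= learning_rate * error * x
            b -= learning_rate * error
    return w, b

# Hypothetical training data generated from truth = 2 * x + 1.
examples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = train_neuron(examples)
```

After enough passes over the examples, `w` and `b` settle near 2 and 1, the values that produced the data. Deep learning applies the same idea to millions of weights at once.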

But there's an issue we haven't yet addressed: how do you determine what connects to what in the first place? In other words, the behavior of your brain is not only determined by the weights of its connections, but by the overall architecture of the brain itself. Stochastic gradient descent does not even attempt to address this question. Rather, it simply does its best with the connections it is provided.

So, where did these connections come from? In deep learning, the answer is that they generally come from a human researcher who decides, based on some level of experience, what the architecture should be. In contrast, the answer for natural brains is evolution. The 100-trillion-connection architecture of our human brain evolved through a Darwinian process over many millions of years.

The architecture of our brains is exceedingly powerful. After all, human intelligence is literally seated there, which means, in effect, that the evolution of brains in nature is the only example of any known process in the universe actually producing something strongly intelligent. The goal of neuroevolution is to trigger a similar evolutionary process inside a computer. In this way, neuroevolution is the only branch of AI with an actual proof of concept: brains did evolve, so we know that's one way to produce intelligence.

To be clear, deep learning traditionally focuses on programming an ANN to learn, while the concern in neuroevolution focuses on the origin of the architecture of the brain itself, which may encompass what is connected to what, the weights of those connections, and (sometimes) how those connections are allowed to change. There is, of course, some overlap between the two fields—an ANN still needs connection weights suited to its task, whether evolved or not, and it's possible that evolved ANNs might leverage the methods used in deep learning (for instance, stochastic gradient descent) to obtain those weights. In fact, deep learning might even be viewed as a sibling of neuroevolution that studies how weights are learned within either an evolved or preconceived architecture.

However, it's also conceivable that the mechanism of learning itself could be evolved, potentially transcending or elaborating the conventional techniques of deep learning as well. In short, the brain—including its architecture and how it learns—is a product of natural evolution, and neuroevolution can probe all the factors that contribute to its emergence, or borrow some from deep learning and let evolution determine the rest.

How it works

Now that we have a sense of what neuroevolution is about, we can talk about how it works. The first neuroevolution algorithms appeared in the 1980s. At the time, its small group of practitioners thought it might be an alternative to the more conventional ANN training algorithm called backpropagation (a form of stochastic gradient descent). In these early systems, neuroevolution researchers would (as in deep learning today) decide on the neural architecture themselves—which neurons connect to which—and simply allow evolution to decide the weights instead of using stochastic gradient descent. Because the architecture could not be changed by evolution, this approach came to be known as fixed-topology neuroevolution.

These systems are a bit different from nature in that the genes of the evolving ANNs in fixed-topology neuroevolution literally encode their weights, which are frozen from birth. In this way, the ANNs are "born" knowing everything they will ever know and cannot learn anything further during their "lifetime." This scenario may be a little confusing because we generally consider learning something we do during our lifetime, but if you think about it, the breeding happening in these systems in effect is the learning. That is, when parents produce children better adapted to a task, a kind of “learning” over generations is happening.

So, how would a system like that really be set up? How do you evolve an artificial brain to solve a problem? In fact, it's a lot like animal breeding. Suppose you want to evolve a neural network to control a robot to walk. In this kind of task, we would typically have on hand a simulator because neuroevolution takes a lot of trials, which are much faster and less risky to run in simulation. So, we'll start with a robot body in a physics simulator.

Now we need some ANNs to get everything started. At the beginning, we don't know how to solve the task, so we just generate a population (say 100) of random ANNs. In the case of fixed-topology ANNs, the weights of the predetermined architecture would be randomized in each of the 100 individuals in the population. Now, we just need to perform selection, which means breeding the better candidates to produce offspring.

To evaluate our current population, we first take an ANN from the population and, in effect, hand over to it control of the simulated robot's body. We let the ANN tell the body how to move, which is called the output of the network. The ANN might also receive input from the body, such as a sense of when each foot hits the ground. Then the computer just watches and sees what the ANN does when it's in control. Every ANN in the population is tested in this way and given a score called its “fitness,” based on the quality of its performance.

It's pretty clear that the randomly generated networks in the initial population are unlikely to do well. They're more likely to flail around than anything else. (After all, their brains are random.) But that's okay, because the key is not that one ANN is particularly good, but rather that some are better than others, even if just by a little bit. Perhaps one network manages to get the robot to limp a little farther than another. The engine that drives improvement in neuroevolution is that those who are slightly better will be selected as parents of the next generation. The algorithm will construct their offspring by slightly altering their ANNs (such as by slightly changing the weights of their connections). While some offspring are worse than their parents, some will function slightly better (say by flailing a bit less), and those will then become parents of the next generation, and so on. In this way, the overall approach is to keep selecting increasingly fit individuals as parents. In effect, the process of neuroevolution is a kind of automated breeding farm for ANNs, where the computer selects the parents to breed based on their fitness.
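The breeding loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the robot simulator is replaced by a toy XOR task, and the 2-2-1 network layout, the mutation size, and the choice to keep the top 20 individuals as parents are all hypothetical choices, not details from any particular neuroevolution system.

```python
import math
import random

# Fixed topology (an illustrative choice): 2 inputs -> 2 hidden -> 1 output.
# Genome layout: 6 hidden weights (including biases), then 3 output weights.
GENOME_LENGTH = 9
XOR_CASES = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def activate(genome, inputs):
    """Run the fixed-topology network encoded by `genome` on one input pair."""
    x1, x2 = inputs
    h1 = math.tanh(genome[0] * x1 + genome[1] * x2 + genome[2])
    h2 = math.tanh(genome[3] * x1 + genome[4] * x2 + genome[5])
    z = genome[6] * h1 + genome[7] * h2 + genome[8]
    z = max(-60.0, min(60.0, z))  # keep exp() well inside float range
    return 1.0 / (1.0 + math.exp(-z))


def fitness(genome):
    """Higher is better: 4 minus the summed squared error over the XOR cases."""
    return 4.0 - sum((activate(genome, x) - y) ** 2 for x, y in XOR_CASES)


def mutate(genome, rng, sigma=0.3):
    """Offspring are slightly altered copies of a parent's weights."""
    return [w + rng.gauss(0.0, sigma) for w in genome]


def evolve(generations=50, pop_size=100, num_parents=20, seed=0):
    rng = random.Random(seed)
    # Start from a population of random weight vectors.
    population = [[rng.uniform(-1, 1) for _ in range(GENOME_LENGTH)]
                  for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        best_per_generation.append(fitness(ranked[0]))
        parents = ranked[:num_parents]
        # Keep the parents unchanged and fill the rest with mutated children.
        children = [mutate(rng.choice(parents), rng)
                    for _ in range(pop_size - num_parents)]
        population = parents + children
    return best_per_generation


history = evolve()
print(f"best fitness: first generation {history[0]:.3f}, last {history[-1]:.3f}")
```

Because the parents survive unchanged into the next generation, the best fitness in this sketch can never get worse from one generation to the next.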

The core idea of neuroevolution, then, is simple: it's essentially just breeding. But beyond that, things get a lot more interesting. Over the decades since the first fixed-topology neuroevolution algorithms began to appear, researchers have continually run into the frustrating reality that even as the algorithms create new possibilities, the brains they can evolve remain far from what evolved in nature. There are many reasons for this gap, but a fascinating aspect of the field is that every so often a surprising new insight into the workings of natural evolution emerges, resulting in a leap in the capability of neuroevolution algorithms. Often, these insights are counter-intuitive, overturning previous assumptions and highlighting the mysteriousness of nature. As we gradually chip away at these mysteries, we discover how to fashion increasingly powerful algorithms for evolving brains.

Increasing complexity

What does it mean to make progress in neuroevolution? In general, it involves recognizing a limitation on the complexity of the ANNs that can evolve and then introducing an approach to overcoming that limitation. For example, the fixed-topology algorithms of the '80s and '90s exhibit one glaring limitation that clearly diverges from nature: the ANNs they evolve can never become larger. In contrast, brains in nature have increased in size and complexity in many lineages, and our distant ancestors had orders of magnitude fewer neurons in their cranium than the 100 billion neurons (and 100 trillion connections) in ours. Clearly, if the topology of the ANN is fixed from the start, no such complexity can ever evolve.

Consequently, researchers began experimenting with topology and weight evolving ANNs (TWEANNs). In this more flexible variant, the architecture (topology) of a parent ANN can be slightly changed in its offspring, such as by adding a new connection or a new neuron. While the idea is relatively simple, the implications are significant because it means that the brain can evolve to become larger. A variety of TWEANNs began to appear in the '90s, though the problems they tackled remained relatively simple, such as learning to give the right answer in simple mathematical and control problems. But the excitement of this era was not necessarily for the problems themselves but rather the vast untapped potential of artificially evolving the architecture and weights of a brain-like structure. The limitations of such a system were then unknown, and anything seemed possible.
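To make the two structural mutations concrete, here is a minimal sketch. The `Genome` class, the function names, and the choice to place a new neuron by splitting an existing connection are assumptions made for illustration; individual TWEANN algorithms differ in these details.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Genome:
    num_nodes: int                                   # nodes are numbered 0..num_nodes-1
    connections: dict = field(default_factory=dict)  # (src, dst) -> weight


def add_connection(genome, rng):
    """Structural mutation: connect two nodes that are not yet connected."""
    candidates = [(s, d)
                  for s in range(genome.num_nodes)
                  for d in range(genome.num_nodes)
                  if s != d and (s, d) not in genome.connections]
    if candidates:
        genome.connections[rng.choice(candidates)] = rng.uniform(-1, 1)


def add_neuron(genome, rng):
    """Structural mutation: split an existing connection with a new neuron."""
    if not genome.connections:
        return
    (src, dst), weight = rng.choice(sorted(genome.connections.items()))
    new_node = genome.num_nodes
    genome.num_nodes += 1
    del genome.connections[(src, dst)]
    genome.connections[(src, new_node)] = 1.0      # into the new neuron
    genome.connections[(new_node, dst)] = weight   # out of the new neuron


rng = random.Random(1)
child = Genome(num_nodes=3, connections={(0, 2): 0.5, (1, 2): -0.7})
add_neuron(child, rng)
add_connection(child, rng)
print(child.num_nodes, len(child.connections))
```

Note that this sketch places no restriction on connection direction, so it can create recurrent loops; a feed-forward-only variant would need an extra check.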

To give you a sense of the problems these ANNs (both fixed-topology and TWEANNs) were being evolved to solve during this era, one popular benchmark problem was called pole balancing. The problem is to control a simulated cart that can only move in two directions with a pole affixed to its top by a hinge. The longer the cart (controlled by an ANN) can keep the pole in the air, the higher its fitness. This problem is a lot like trying to balance a pencil on the palm of your hand—it requires tight coordination and quick reactions. As learning methods improved, ANNs could solve increasingly difficult versions of pole balancing, such as balancing two poles at once. The succession of such tasks served to mark progress in the field, and researchers could promote their algorithms as the first to solve one variant or another.
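As a rough illustration of how fitness is measured in pole balancing, the sketch below simulates a single pole and counts the time steps before it falls or the cart leaves the track. The physical constants, the failure limits (12 degrees, 2.4 units of track), and the simple Euler integration are conventional values assumed here, and the two hand-written controllers merely stand in for evolved ANNs.

```python
import math


def pole_balancing_fitness(controller, max_steps=1000):
    """Return how many time steps `controller` keeps the pole in the air."""
    gravity, cart_mass, pole_mass = 9.8, 1.0, 0.1
    half_length, force_mag, dt = 0.5, 10.0, 0.02
    total_mass = cart_mass + pole_mass
    x, x_dot, theta, theta_dot = 0.0, 0.0, 0.05, 0.0  # start slightly tilted
    for step in range(max_steps):
        # The controller can only push the cart left or right.
        push_right = controller(x, x_dot, theta, theta_dot)
        force = force_mag if push_right else -force_mag
        sin_t, cos_t = math.sin(theta), math.cos(theta)
        temp = (force + pole_mass * half_length * theta_dot ** 2 * sin_t) / total_mass
        theta_acc = (gravity * sin_t - cos_t * temp) / (
            half_length * (4.0 / 3.0 - pole_mass * cos_t ** 2 / total_mass))
        x_acc = temp - pole_mass * half_length * theta_acc * cos_t / total_mass
        # Euler integration of the cart and pole state.
        x, x_dot = x + dt * x_dot, x_dot + dt * x_acc
        theta, theta_dot = theta + dt * theta_dot, theta_dot + dt * theta_acc
        if abs(x) > 2.4 or abs(theta) > math.radians(12):
            return step + 1
    return max_steps


def always_right(x, x_dot, theta, theta_dot):
    return True


def follow_the_lean(x, x_dot, theta, theta_dot):
    return theta > 0  # push toward the side the pole is leaning


print(pole_balancing_fitness(always_right), pole_balancing_fitness(follow_the_lean))
```

In a neuroevolution run, `controller` would be an evolved network that maps the four state variables to a push direction, and the returned step count would be its fitness.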

If you're interested in the technical details of the algorithms of this era, Xin Yao published an excellent 1999 survey of many of its neuroevolution algorithms. Much of the pioneering work of this period came from Stefano Nolfi and Dario Floreano, who captured many of its best ideas in their 2000 classic book, Evolutionary Robotics. Another book that helped to popularize the idea of evolving neural networks around the turn of the century is David Fogel’s classic, Blondie24: Playing at the Edge of AI (from Morgan Kaufmann Publishers), which told the story of a neuroevolution algorithm that learns to play master-level checkers through games with real humans.

In 1999, I started to think seriously about neuroevolution as an early Ph.D. student at the University of Texas at Austin in the research group of Professor Risto Miikkulainen, where several important fixed-topology algorithms had already been invented (such as the SANE and ESP algorithms from David Moriarty and Faustino Gomez, both in collaboration with Risto Miikkulainen). These algorithms were already quite clever, investigating the idea that the individual neurons of a network might be evolved in their own subpopulations to cooperate with other neurons, and then grouped with neurons from other subpopulations to form full working networks, an idea generally known as “cooperative coevolution.”

Arriving on the scene in the midst of these developments, I was enchanted with the idea that the evolution of brains could happen inside a computer. I appreciated the importance of evolving ANNs to solve problems, but my real passion lay with the idea of increasing complexity over evolution. I wanted to build an algorithm that would enable ANNs to explode in complexity inside the computer, echoing the evolution of brains in nature. For some reason, even though existing TWEANN algorithms could change the architecture of evolving ANNs over time, they seemed to lack the strong propensity toward increasing complexity that I hoped to capture.

I spent a lot of time studying existing TWEANN algorithms and thinking about why they might not be working as well as they could. Some problems were well known, such as the competing conventions problem, which refers to the fact that it is hard to combine two parent ANNs to create an offspring (an operation called “crossover”). More specifically, different networks might express the same functionality with different combinations of connection weights, making it hard to know how to combine them as parents.

Other problems I soon confronted were not yet recognized, such as the tendency for new architectures to go extinct from an evolving population before they could realize their potential. The problem in this case is that the initial impact of changing architecture from one generation to the next is usually negative, even if future evolution might eventually take advantage of the new weights (by optimizing them further) given sufficient generations.

The result of my time seeking a better TWEANN algorithm (in collaboration with my Ph.D. advisor Risto Miikkulainen) was an algorithm called NeuroEvolution of Augmenting Topologies, or NEAT, which quickly became the most popular and widely used algorithm in neuroevolution. NEAT's appeal is in its ability to evolve increasingly complex ANNs through a series of innovations that sidestep the typical TWEANN challenges. While earlier algorithms (like Inman Harvey’s SAGA algorithm) provided hints that complexity can evolve, the appeal of NEAT is in its explicit solutions to the various problems of neuroevolution of its day. For example, NEAT marks genes with something called a “historical marking” to ensure a coherent result of crossover. It also implements a particular kind of speciation calibrated for TWEANNs that allows innovative new structures more breathing room to optimize before being prematurely eliminated from the population.
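The role of historical markings in crossover can be sketched as follows. This is a simplified illustration, not the full NEAT operator: the `ConnectionGene` class and its field names are hypothetical, and details such as disabled genes and parents of equal fitness are left out. Genes that share a marking are lined up and inherited from either parent at random, while genes that appear in only one parent are taken from the fitter parent.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ConnectionGene:
    innovation: int   # historical marking assigned when the gene first appeared
    src: int
    dst: int
    weight: float


def crossover(fitter, weaker, rng):
    """Align genes by historical marking and build a child genome."""
    weaker_by_marking = {gene.innovation: gene for gene in weaker}
    child = []
    for gene in fitter:
        match = weaker_by_marking.get(gene.innovation)
        if match is not None and rng.random() < 0.5:
            child.append(match)   # matching gene, taken from the weaker parent
        else:
            child.append(gene)    # matching or unmatched gene from the fitter parent
    return child


rng = random.Random(0)
fitter = [ConnectionGene(1, 0, 2, 0.5),
          ConnectionGene(2, 1, 2, -0.3),
          ConnectionGene(4, 0, 3, 0.9)]
weaker = [ConnectionGene(1, 0, 2, -0.8),
          ConnectionGene(3, 1, 3, 0.1),
          ConnectionGene(4, 0, 3, 0.2)]
child = crossover(fitter, weaker, rng)
print([gene.innovation for gene in child])
```

Because genes are matched by marking rather than by position, two networks that grew in different orders can still be combined without scrambling their structure.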

NEAT obtained record pole-balancing benchmark results, but it was the potential to start simply and then increase complexity over generations, the trademark behavior of NEAT, that inspired my imagination more than benchmark results. In fact, although it was invented before the advent of the term “deep learning,” NEAT has the intriguing capability to evolve increasingly deep networks.

NEAT has been applied by me and others to innumerable applications, from controlling robots to controlling video game agents. It has also found a niche within computational creativity, where (using examples from my own lab at UCF) it evolves networks that output art (such as Picbreeder) and music (e.g., MaestroGenesis). Its creative applications also extend to video games such as Galactic Arms Race, where it provides a force for generating new content.

Among its most significant impacts, NEAT was the optimization method used at the Tevatron particle accelerator to find the most accurate mass estimate of the top quark to date (see here for the main result and here for more details on the application of NEAT). A video by YouTube star SethBling recently helped to educate over four million viewers on the inner workings of NEAT through evolving a controller for Mario in Super Mario Bros.

Indirect encoding

While it might sound like NEAT is the final word on neuroevolution, a captivating aspect of research in this field is that major advances often reveal limitations and flaws that were not previously clear. These revelations, in turn, produce new questions and new generations of algorithms, and the cycle continues. In effect, over time we are uncovering something approximating nature's secrets—the secrets to how brains evolved—which are endlessly surprising.

For example, one limitation of NEAT that increasingly occupied my thinking is that it uses a kind of "artificial DNA" called direct encoding, which means that every connection in the ANN is described in the genome by a single corresponding gene. While that works out fine when you have several hundred connections or so, it starts to get pretty unwieldy if you're aiming for larger brains. As an example, the 100 trillion connections in the human brain would require 100 trillion genes to encode in NEAT. In other words, while NEAT can indeed expand the size of the ANNs it evolves by adding connections and neurons through mutations, adding 100 trillion connections would take on the order of 100 trillion mutations. Even if somehow it managed to evolve a new connection every second (which means a new generation would need to be born every second), it would take more than three million years for the size of brains to reach 100 trillion connections. So, NEAT must still be missing some property present in nature that makes it possible to evolve such massive structures.

In response to this challenge, neuroevolution researchers have explored a different class of genetic encodings called indirect encodings (as opposed to direct), where the number of genes can be much smaller than the number of connections and neurons in the brain. In other words, the "DNA" is a compressed representation of the brain. Pioneering work in indirect encodings (both in encoding ANNs and physical forms) by researchers such as Josh Bongard, Greg Hornby, and Jordan Pollack helped to highlight the power of indirect encodings to evolve much more impressive and natural-looking structures. These include evolved artificial creatures that exhibit regularities reminiscent of natural organisms (some of Bongard’s are shown here) as well as a remarkable set of evolved tables (the kind with elevated, flat surfaces) with interesting structural patterns from Hornby and Pollack at Brandeis (see them here). Among the most eye-catching demonstrations of indirect encoding (from before they caught on in neuroevolution) are the virtual creatures of Karl Sims, whose video from over 20 years ago remains captivating and unforgettable.

Indirect encodings within neuroevolution (focusing now on encoding ANNs in particular) have made it possible to evolve much larger ANNs than with direct encodings like NEAT. One of the most popular such indirect encodings is called compositional pattern-producing networks (CPPNs), which was invented in my lab, the Evolutionary Complexity Research Group at the University of Central Florida, in response to the limitations of NEAT as a direct encoding. A CPPN is basically a way of compressing a pattern with regularities and symmetries into a relatively small set of genes. This idea makes sense because natural brains exhibit numerous regular patterns (i.e., repeated motifs) such as in the receptive fields in the visual cortex. CPPNs can encode similar kinds of connectivity patterns.

When CPPNs are used to generate the connectivity patterns of evolving ANNs, the resulting algorithm, also from my lab, is called HyperNEAT (Hypercube-based NEAT, co-invented with David D’Ambrosio and Jason Gauci) because under one mathematical interpretation, the CPPN can be conceived as painting the inside of a hypercube that represents the connectivity of an ANN. Through this technique, we began to evolve ANNs with hundreds of thousands to millions of connections. Indirectly encoded ANNs have proven useful, in particular, for evolving robot gaits because their regular connectivity patterns tend to support the regularity of motions involved in walking or running. Researchers like Jeff Clune have helped to highlight the advantages of CPPNs and HyperNEAT through rigorous studies of their various properties. Other labs also explored different indirect encodings in neuroevolution, such as the compressed networks of Jan Koutník, Giuseppe Cuccu, Jürgen Schmidhuber, and Faustino Gomez.
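A small sketch may help show why such an encoding is compact. Here a hand-written function stands in for an evolved CPPN (its four parameters and its particular mix of a Gaussian and a sine are assumptions chosen only for illustration). The function is queried with the coordinates of every pair of neurons on a 10-by-10 grid, so four numbers end up describing up to 10,000 connection weights.

```python
import math


def cppn(x1, y1, x2, y2, params):
    """Toy stand-in for an evolved CPPN: maps two neuron positions to a weight."""
    a, b, c, d = params
    dx, dy = x2 - x1, y2 - y1
    symmetric = math.exp(-a * (dx * dx + dy * dy))  # Gaussian: locality and symmetry
    periodic = math.sin(b * x1 + c * x2)            # sine: repeating motifs
    return math.tanh(d * symmetric * periodic)


def build_weights(params, grid=10, threshold=0.2):
    """Query the CPPN for every pair of neurons laid out on a grid."""
    coords = [(-1 + 2 * i / (grid - 1), -1 + 2 * j / (grid - 1))
              for i in range(grid) for j in range(grid)]
    weights = {}
    for src in coords:
        for dst in coords:
            w = cppn(*src, *dst, params)
            if abs(w) > threshold:   # weak connections are simply left out
                weights[(src, dst)] = w
    return weights


genome = (2.0, 3.0, 3.0, 2.0)   # the entire "DNA" in this sketch
weights = build_weights(genome)
print(f"{len(genome)} genes produced {len(weights)} connections")
```

Changing a single parameter alters the whole connectivity pattern at once, which is the sense in which regularities come cheaply with an indirect encoding.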

Novelty search

Now you can begin to see a trajectory of ideas. The field started with fixed-topology networks, moved to complexifying networks, and then began to focus on indirectly encoded networks. These shifts in perspective and capabilities will continue as we gain more insight into the evolution of complexity. One other recent influential idea in the field, also from our lab (and echoed in seminal studies by Jean-Baptiste Mouret and Stéphane Doncieux), is the idea that parents should not always be selected based on their objective performance, but rather based on their novelty. In other words, perhaps surprisingly, sometimes it is actually detrimental to the long-term success of evolution to select which parents will reproduce based on fitness. For example, consider trying to evolve an ANN for a walking robot. You might think that to get the best walker, the current best walkers should be the parents of the next generation. However, that turns out not to be the case.

In fact, early in evolving the ability to walk, the best ANNs for controlling walkers may rely merely upon lurching forward unreliably. Perhaps other candidates do something interesting like oscillating the robot's legs in a regular pattern, but suppose these interesting robots simply fall down—if fitness is our guide, we would prefer the ones who manage to travel farther and thereby drop the oscillators from further consideration. But here's the problem: it could be that the more interesting concept (oscillating) is more likely to lead to robust walking than the currently best strategy (lurching). Yet fitness selects against oscillation simply because at the current stage it isn't so great. In other words, the stepping stones to the best walker may be deceptive—in which case, simply choosing from among the best is likely to move to a dead end.

This insight (among other considerations) led me to the idea for the novelty search algorithm (co-invented with my then-Ph.D. student Joel Lehman). Novelty search is a selection scheme based on the idea that you choose the most novel as parents instead of choosing the most fit.

This idea may sound strange at first, so let's back up for a moment. In evolution, there are two primary factors that impact its performance—one is the representation or encoding, which in nature is DNA. The work on increasing complexity and indirect encoding falls into this category. The other big factor is how the system determines who will reproduce, which is called selection. Almost all the effort on selection over the history of evolutionary computation has focused on selecting parents based on their fitness. That doesn't mean we necessarily always or only choose the most fit, because that would quickly drain diversity from the population, but it does mean that the more fit tend to have a higher chance of reproducing than the less fit.

Novelty search turns this fitness-driven paradigm on its head, dropping the usual notion of fitness (i.e., quality of behavior) entirely from its selection scheme. Instead, now, the computer looks at the behavior of an individual candidate (say a walking robot) and compares it to behaviors seen in prior generations. In novelty search, if the new candidate's behavior is significantly different from what has come before, it has a high chance of being selected to reproduce, regardless of how fit it is.
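One common way to score novelty is the average distance from a candidate's behavior to its nearest neighbors among the behaviors seen so far. The sketch below assumes that each behavior is summarized as the robot's final (x, y) position and that k is 3; both are illustrative choices.

```python
import math


def novelty(behavior, archive, k=3):
    """Mean distance from `behavior` to its k nearest neighbors in `archive`."""
    if not archive:
        return float("inf")   # nothing seen yet, so anything is novel
    distances = sorted(math.dist(behavior, other) for other in archive)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


# Final positions reached by earlier candidates (made-up values).
archive = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (1.0, 1.0)]

familiar = novelty((0.05, 0.05), archive)   # ends up where many others did
unusual = novelty((5.0, 5.0), archive)      # ends up somewhere new
print(f"familiar: {familiar:.3f}, unusual: {unusual:.3f}")
```

Nothing in this score asks whether the behavior is good. A robot that falls over in a new way can outrank one that limps slightly farther in a familiar way.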

This idea stirred up some controversy because people like the idea of selecting based on their criteria for success, which is captured by fitness, but it was hard to ignore that novelty sometimes works. That is, sometimes you could actually select for novelty and get a better solution in faster time than you would have if you selected for fitness. Consider the example of the walking robot. Unlike fitness-based evolution, novelty search is likely to select the oscillator that falls down as a parent simply because it is doing something new, while fitness-based search would almost certainly ignore it for falling over so soon. But as we discussed, it turns out that oscillation is an essential stepping stone to robust walking. So, counterintuitively (because of deception), it can actually be better to ignore your objective and select simply for novelty rather than for improving objective performance.

This insight shifted how we think about neuroevolution once again, and has led to a whole new research area called “quality diversity” (or sometimes “illumination algorithms”). This new class of algorithms, generally derived from novelty search, aims not to find a single optimal solution but rather to illuminate a broad cross-section of all the high-quality variations of what is possible for a task, like all the gaits that can be effective for a quadruped robot. One such algorithm, called MAP-Elites (invented by Jean-Baptiste Mouret and Jeff Clune), landed on the cover of Nature recently (in an article by Antoine Cully, Jeff Clune, Danesh Tarapore, and Jean-Baptiste Mouret) for the discovery of just such a large collection of robot gaits, which can be selectively called into action in the event the robot experiences damage.

If you think about it, novelty is not so much about solving a particular problem as it is about finding all the interesting possibilities that exist within the search space. The fact that it still sometimes gives you a state-of-the-art solution is just an intriguing side effect of this broader process. Quality diversity algorithms (the next step beyond novelty search) add the notion of fitness back in, but in a careful way so as not to undermine the delicate push for novelty. The result is a class of algorithms that diverge through the search space, while still pulling out the best of what is possible.
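The way quality diversity brings fitness back in can be sketched with a bare-bones, MAP-Elites-style loop. Everything specific here is an assumption for illustration: the toy `evaluate` function stands in for a simulator, the behavior descriptor is simply the genome itself, and the archive is a 5-by-5 grid. Each grid cell keeps only the best individual found so far with that kind of behavior.

```python
import math
import random


def evaluate(genome):
    """Toy stand-in for a simulator: returns (fitness, behavior descriptor)."""
    x, y = genome
    return math.sin(3 * x) * math.cos(3 * y), (x, y)


def cell_of(descriptor, bins=5):
    """Discretize a descriptor in [0, 1] x [0, 1] into a grid cell."""
    return tuple(min(int(v * bins), bins - 1) for v in descriptor)


def map_elites(iterations=2000, initial_random=100, seed=0):
    rng = random.Random(seed)
    archive = {}   # cell -> (fitness, genome)
    for i in range(iterations):
        if i < initial_random or not archive:
            child = (rng.random(), rng.random())
        else:
            # Pick any elite as the parent, regardless of how fit it is.
            _, parent = rng.choice(list(archive.values()))
            child = tuple(min(1.0, max(0.0, g + rng.gauss(0, 0.1)))
                          for g in parent)
        fit, descriptor = evaluate(child)
        cell = cell_of(descriptor)
        # Fitness only matters locally: it decides who keeps each cell.
        if cell not in archive or fit > archive[cell][0]:
            archive[cell] = (fit, child)
    return archive


archive = map_elites()
print(f"{len(archive)} of 25 cells filled")
```

The result is not one winner but a map of the best known solution for each kind of behavior.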

This new direction is important for neuroevolution, but it's interesting to note that it goes beyond neuroevolution, machine learning, or even computer science. Consider: the principle that sometimes the best way to achieve something noteworthy is to avoid explicitly trying to achieve it is actually so general that it even can apply to life and society. For example, the best way to raise student achievement may not be through delivering a continual stream of tests. The fact that this insight from neuroevolution has such potentially broad implications is no accident—it is natural (and exciting) that a concerted attempt to improve the ability of algorithms to explore the space of the possible would lead to insights about innovation and achievement in general.

In other words, as we crack the puzzle of neuroevolution, we are learning not just about computer algorithms, but about how the world works in deep and fundamental ways. That's one reason that my co-author Joel Lehman and I wrote the book Why Greatness Cannot Be Planned: The Myth of the Objective. We wanted to share some of the broader implications of developments in our field with the larger public. We believe the implications are indeed important well outside computer science, for institutions focused on innovation and creativity.

The comeback

I began this article with the claim that neuroevolution is back, but what does that actually mean? In reality, the field never really went away, and many of its top researchers have remained in the trenches all along. However, the ascent of deep learning has certainly refocused a lot of the attention that might have gone to neuroevolution in the past. Now that’s starting to change.

One of the main reasons for deep learning’s success is the recent increase in available hardware processing power. ANNs turn out to benefit from parallel processing (e.g., in GPUs), whose availability has exploded over the last few years. While many have viewed this marriage between ANNs and powerful new hardware in deep learning as a unique and serendipitous union, it is not lost on some that neuroevolution may be on the cusp of a similar story. That is, neuroevolution is just as eligible to benefit from massive hardware investment as conventional deep learning, if not more. The advantage for neuroevolution, as with all evolutionary algorithms, is that a population of ANNs is intrinsically and easily processed in parallel—if you have 100 ANNs in the population and 100 processors, you can evaluate all of those networks at the same time, in the time it takes to evaluate a single network. That kind of speed-up can radically expand the potential applications of the method.
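Since each network is scored independently, evaluating a population in parallel takes very little code. The sketch below uses Python's standard `concurrent.futures` module with a hypothetical placeholder fitness function. A thread pool is used to keep the example simple; for CPU-bound simulations in CPython, `ProcessPoolExecutor` (which offers the same `map` interface) or a compute cluster would be the more realistic choice.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(genome):
    """Hypothetical fitness function; a real one would run a simulation."""
    return -sum(w * w for w in genome)


def evaluate_population(population, workers=8):
    """Score every genome; evaluations do not depend on one another."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(evaluate, population))


population = [[i * 0.1, -i * 0.2] for i in range(100)]
scores = evaluate_population(population)
print(max(scores))
```

`pool.map` returns results in the same order as the input, so `scores[i]` always belongs to `population[i]`.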

One consequence is that labs with access to large-scale computing clusters can see that they might be sitting on a neuroevolution goldmine, prompting a new generation of researchers and next-generation neuroevolution experiments to grow out of labs largely otherwise invested in conventional deep learning. There’s also an underlying worry about potentially missing the next big thing.

For example, Google Brain (an AI lab within Google) has published large-scale experiments encompassing hundreds of GPUs on attempts to evolve the architecture of deep networks. The idea is that neuroevolution might be able to evolve the best structure for a network intended for training with stochastic gradient descent. In fact, the idea of architecture search through neuroevolution is attracting a number of major players in 2016 and 2017, including (in addition to Google) Sentient Technologies, MIT Media Lab, Johns Hopkins, Carnegie Mellon, and the list keeps growing. (See here and here for examples of initial work from this area.)

Another area where neuroevolution is making a comeback is in reinforcement learning, which focuses on control and decision-making problems (like playing Atari games or controlling a biped robot). A team at OpenAI recently reported an experiment where a variant of neuroevolution matches the performance of more conventional deep-learning-based reinforcement learning on a slew of benchmark tasks, including Atari games originally conquered in widely publicized work by Google’s DeepMind AI think tank in deep learning.

The field of neuroevolution also continues to move along its own unique paths. For example, much research over the years and continuing to this day has focused on how to evolve plastic ANNs, which means networks with connection weights that change over the lifetime of the network. That is, evolution in this case is not just deciding the architecture and weights, but also the rules that guide how and when particular weights change. That way, the resultant networks are, in principle, more like biological brains, which change over their lifetimes in reaction to their experience. Much of my own thinking on the topic of plastic neural networks is influenced by the early works of Dario Floreano and later ideas on neuromodulation, which allows some neurons to modulate the plasticity of others, from Andrea Soltoggio.
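As a rough illustration of what an evolved learning rule might look like, the sketch below uses one commonly seen parameterization of Hebbian plasticity, assumed here purely for the example: five numbers (a learning rate and four coefficients) determine how a weight changes given the activity of the neurons on either side of it. In a plastic-network experiment, evolution would tune those five numbers rather than the weight itself.

```python
def hebbian_update(weight, pre, post, rule):
    """Change a weight during the network's lifetime according to `rule`."""
    eta, a, b, c, d = rule
    return weight + eta * (a * pre * post + b * pre + c * post + d)


# A pure Hebbian rule: strengthen the weight when both neurons are active.
pure_hebbian = (0.1, 1.0, 0.0, 0.0, 0.0)

weight = 0.0
for _ in range(10):
    weight = hebbian_update(weight, pre=1.0, post=1.0, rule=pure_hebbian)
print(round(weight, 3))
```

Neuromodulation can be pictured as letting the output of another neuron scale `eta` up or down, although that extension is not shown here.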

Another interesting topic (and a favorite of mine) well suited to neuroevolution is open-endedness, or the idea of evolving increasingly complex and interesting behaviors without end. Many regard evolution on Earth as open-ended, and the prospect of a similar phenomenon occurring on a computer offers its own unique inspiration. One of the great challenges for neuroevolution is to provoke a succession of increasingly complex brains to evolve through a genuinely open-ended process. A vigorous and growing research community is pushing the boundaries of open-ended algorithms, as described here. My feeling is that open-endedness should be regarded as one of the great challenges of computer science, right alongside AI.

Video games also remain a popular application, not just for controlling characters in games, but for evolving new content. If you consider all the prolific variety of life on Earth, you can see how evolutionary algorithms could be a natural conduit for generating diverse new content. Georgios Yannakakis at the University of Malta and Julian Togelius at New York University, both pioneers in the field themselves, present a broad overview of many applications of neuroevolution (and other AI algorithms) to gaming in their new book, Artificial Intelligence and Games.

Getting involved

If you’re interested in evolving neural networks yourself, the good news is that it’s relatively easy to get started with neuroevolution. Plenty of software is available (see here), and for many people, the basic concept of breeding is intuitive enough to grasp the main ideas without advanced expertise. In fact, neuroevolution has the distinction of many hobbyists running successful experiments from their home computers, as you can see if you search for “neuroevolution” or “NEAT neural” on YouTube. As another example, one of the most popular and elegant software packages for NEAT, called SharpNEAT, was written by Colin Green, an independent software engineer with no official academic affiliation or training in the field.

Considering that evolution is indeed the only genuine means we know by which intelligence at the human level has ever been produced, the potential in the field of neuroevolution is great for further advances and new insights with the increasing computational power available today.

Continue reading Neuroevolution: A different kind of deep learning.

Categories: Technology

Building a simple GraphQL server with Neo4j

O'Reilly Radar - Thu, 2017/07/13 - 08:30

How to implement a GraphQL API that queries Neo4j for a simple movie app.

GraphQL is a powerful new tool for building APIs that allows clients to ask for only the data they need. Originally designed at Facebook to minimize data sent over the wire and reduce round-trip API requests for rendering views in native mobile apps, GraphQL has since been open sourced to a healthy community that is building developer tools. There are also a number of large companies and startups such as GitHub, Yelp, Coursera, Shopify, and Mattermark building public and internal GraphQL APIs.

Despite what the name seems to imply, GraphQL is not a query language for graph databases; it is instead an API query language and runtime for building APIs. The “Graph” component of the name comes from the graph data model that GraphQL uses in the frontend. GraphQL itself is simply a specification, and there are many great tools available for building GraphQL APIs in almost every language. In this post we'll make use of graphql-tools by Apollo to build a simple GraphQL API in JavaScript that queries a Neo4j graph database for movies and movie recommendations. We will follow a recipe approach: first exploring the problem in more detail, then developing our solution, and finally discussing our approach. Good resources for learning more about GraphQL are and the Apollo Dev Blog.

Continue reading Building a simple GraphQL server with Neo4j.

Categories: Technology

Cheryl Platz on designing the Amazon Echo Look

O'Reilly Radar - Thu, 2017/07/13 - 07:00

The O'Reilly Design Podcast: Designing in secret, designing for voice, and why improv is an essential design skill.

In this week’s Design Podcast, I sit down with Cheryl Platz, senior designer at Microsoft for the Azure Portal and Marketplaces. We talk about the challenges of working on a top-secret design project, the research behind Amazon's Echo Look, the skills you need to start designing for voice, and how studying improv can make you a better designer.

Continue reading Cheryl Platz on designing the Amazon Echo Look.

Categories: Technology

Aaron Maxwell on the power of Python

O'Reilly Radar - Thu, 2017/07/13 - 04:20

The O’Reilly Programming Podcast: Using Python decorators, generators, and functions.

In this episode of the O’Reilly Programming Podcast, I talk all things Python with Aaron Maxwell, presenter of the live online training courses Python: Beyond The Basics, and Python: The Next Level. He is also the author of the book Powerful Python: The Most Impactful Patterns, Features and Development Strategies Modern Python Provides.

Continue reading Aaron Maxwell on the power of Python.

Categories: Technology

Introduction to reinforcement learning and OpenAI Gym

O'Reilly Radar - Thu, 2017/07/13 - 04:00

A demonstration of basic reinforcement learning problems.

Those interested in the world of machine learning are aware of the capabilities of reinforcement-learning-based AI. The past few years have seen many breakthroughs using reinforcement learning (RL). The company DeepMind combined deep learning with reinforcement learning to achieve above-human results on a multitude of Atari games and, in March 2016, defeated Go champion Lee Sedol four games to one. Though RL is currently excelling in many game environments, it is a novel way to solve problems that require optimal decisions and efficiency, and will surely play a part in machine intelligence to come.

OpenAI was founded in late 2015 as a non-profit with a mission to “build safe artificial general intelligence (AGI) and ensure AGI's benefits are as widely and evenly distributed as possible.” In addition to exploring many issues regarding AGI, one major contribution that OpenAI made to the machine learning world was developing both the Gym and Universe software platforms.

Gym is a collection of environments/problems designed for testing and developing reinforcement learning algorithms—it saves the user from having to create complicated environments. Gym is written in Python, and there are multiple environments such as robot simulations or Atari games. There is also an online leaderboard for people to compare results and code.

Reinforcement learning, explained simply, is a computational approach where an agent interacts with an environment by taking actions in which it tries to maximize an accumulated reward. Here is a simple graph, which I will be referring to often:

Figure 1. Reinforcement Learning: An Introduction 2nd Edition, Richard S. Sutton and Andrew G. Barto, used with permission.

An agent in a current state (St) takes an action (At) to which the environment reacts and responds, returning a new state (St+1) and reward (Rt+1) to the agent. Given the updated state and reward, the agent chooses the next action, and the loop repeats until an environment is solved or terminated.
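The loop described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: `CoinGuessEnv` and `run_episode` are hypothetical names invented here and are not part of Gym. The toy environment simply mimics the reset/step pattern of returning a state, a reward, and a done flag.

```python
import random


class CoinGuessEnv:
    """Hypothetical toy environment (not part of Gym): guess a coin flip each step."""

    def __init__(self, max_steps=10, seed=0):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        self.steps = 0
        return 0  # initial state S_0

    def step(self, action):
        coin = self.rng.randint(0, 1)
        reward = 1 if action == coin else -1  # R_{t+1}
        self.steps += 1
        done = self.steps >= self.max_steps   # episode ends after max_steps
        return coin, reward, done             # S_{t+1} is the coin just shown


def run_episode(env, policy):
    """Run the agent-environment loop once and return the accumulated reward."""
    state = env.reset()
    total_reward, done = 0, False
    while not done:
        action = policy(state)                   # agent picks A_t from S_t
        state, reward, done = env.step(action)   # environment responds
        total_reward += reward
    return total_reward


env = CoinGuessEnv()
total = run_episode(env, policy=lambda state: random.randint(0, 1))
print(total)
```

The policy here is a random guess; any function from state to action could be passed in instead.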

OpenAI’s Gym is based upon these fundamentals, so let’s install Gym and see how it relates to this loop. We’ll get started by installing Gym using Python and the Ubuntu terminal. (You can also use Mac following the instructions on Gym’s GitHub.)

sudo apt-get install -y python3-numpy python3-dev python3-pip cmake zlib1g-dev libjpeg-dev xvfb libav-tools xorg-dev python-opengl libboost-all-dev libsdl2-dev swig
cd ~
git clone
cd gym
sudo pip3 install -e '.[all]'

Next, we can open Python3 in our terminal and import Gym.

python3
import gym

First, we need an environment. For our first example, we will load the very basic taxi environment.

env = gym.make("Taxi-v2")

To initialize the environment, we must reset it.

env.reset()


You will notice that resetting the environment will return an integer. This number will be our initial state. All possible states in this environment are represented by an integer ranging from 0 to 499. We can determine the total number of possible states using the following command:

env.observation_space.n


If you would like to visualize the current state, type the following:

env.render()


In this environment the yellow square represents the taxi, the (“|”) represents a wall, the blue letter represents the pick-up location, and the purple letter is the drop-off location. The taxi will turn green when it has a passenger aboard. While we see colors and shapes that represent the environment, the algorithm does not think like us and only understands a flattened state, in this case an integer.
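To make the idea of a flattened state concrete, here is a minimal sketch of how 500 integer states can arise. It assumes the state packs a taxi row (5 values), a taxi column (5), a passenger location index (5), and a destination index (4) in that order. The function names `encode` and `decode` are illustrative, and the layout is an assumption rather than something the text above spells out.

```python
def encode(taxi_row, taxi_col, passenger_loc, destination):
    """Pack four small indices into one integer (assumed layout: 5 x 5 x 5 x 4 = 500)."""
    state = taxi_row
    state = state * 5 + taxi_col
    state = state * 5 + passenger_loc
    state = state * 4 + destination
    return state


def decode(state):
    """Invert encode(): recover (taxi_row, taxi_col, passenger_loc, destination)."""
    state, destination = divmod(state, 4)
    state, passenger_loc = divmod(state, 5)
    taxi_row, taxi_col = divmod(state, 5)
    return taxi_row, taxi_col, passenger_loc, destination


row, col, passenger, dest = decode(114)
print(row, col, passenger, dest)
print(encode(row - 1, col, passenger, dest))  # same taxi, one row up
```

Under this assumed layout, state 114 decodes to row 1, column 0, and moving the taxi up one row yields state 14.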

So, now that we have our environment loaded and we know our current state, let's explore the actions available to the agent.

env.action_space.n


This shows us there are a total of six actions available. Gym will not always tell you what these actions mean, but in this case, the six possible actions are: down (0), up (1), right (2), left (3), pick-up (4), and drop-off (5).

For learning’s sake, let’s override the current state to 114.

env.env.s = 114
env.render()

And, let’s move up.

env.step(1)
env.render()

You will notice that env.step(1) will return four variables.

(14, -1, False, {'prob': 1.0})

From here on, we will unpack these four values like so:

state, reward, done, info = env.step(1)

These four variables are: the new state (St+1 = 14), reward (Rt+1 = -1), a boolean stating whether the environment is terminated or done, and extra info for debugging. Every Gym environment will return these same four variables after an action is taken, as they are the core variables of a reinforcement learning problem.

Take a look at the rendered environment. What do you expect the environment would return if you were to move left? It would, of course, give the exact same return as before. The environment always gives a -1 reward for each step in order for the agent to try and find the quickest solution possible. If you were measuring your total accumulated reward, constantly running into a wall would heavily penalize your final reward. The environment will also give a -10 reward every time you incorrectly pick up or drop off a passenger.

So, how can we solve the environment?

One surprising way you could solve this environment is to choose randomly among the six possible actions. The environment is considered solved when you successfully pick up a passenger and drop them off at their desired location. Upon doing this, you will receive a reward of 20 and done will equal True. The odds are small, but it’s still possible, and given enough random actions you will eventually luck out. A core part of evaluating any agent’s performance is to compare it to a completely random agent. In a Gym environment, you can choose a random action using env.action_space.sample(). You can create a loop that will do random actions until the environment is solved. We will put a counter in there to see how many steps it takes to solve the environment.

state = env.reset()
counter = 0
reward = None
while reward != 20:
    state, reward, done, info = env.step(env.action_space.sample())
    counter += 1
print(counter)

You may luck out and solve the environment fairly quickly, but on average, a completely random policy takes 2,000 or more steps to solve this environment. To maximize our reward, the algorithm will have to remember its actions and their associated rewards. In this case, the algorithm’s memory is going to be a Q action value table.

To manage this Q table, we will use a NumPy array. The size of this table will be the number of states (500) by the number of possible actions (6).

import numpy as np
Q = np.zeros([env.observation_space.n, env.action_space.n])

Over multiple episodes of trying to solve the problem, we will be updating our Q values, slowly improving our algorithm’s efficiency and performance. We will also want to track our total accumulated reward for each episode, which we will define as G.

G = 0

Similar to most machine learning problems, we will need a learning rate as well. I will use my personal favorite of 0.618, which is the reciprocal of the golden ratio phi (phi itself is about 1.618).

alpha = 0.618

Next, we can implement a very basic Q learning algorithm.

for episode in range(1, 1001):
    done = False
    G, reward = 0, 0
    state = env.reset()
    while done != True:
        action = np.argmax(Q[state])  #1
        state2, reward, done, info = env.step(action)  #2
        Q[state, action] += alpha * (reward + np.max(Q[state2]) - Q[state, action])  #3
        G += reward
        state = state2
    if episode % 50 == 0:
        print('Episode {} Total Reward: {}'.format(episode, G))

This code alone will solve the environment. There is a lot going on in this code, so I will try and break it down.

First (#1): The agent starts by choosing an action with the highest Q value for the current state using argmax. Argmax will return the index/action with the highest value for that state. Initially, our Q table will be all zeros. But, after every step, the Q values for state-action pairs will be updated.

Second (#2): The agent then takes action and we store the future state as state2 (St+1). This will allow the agent to compare the previous state to the new state.

Third (#3): We update the state-action pair (St , At) for Q using the reward, and the max Q value for state2 (St+1). This update is done using the action value formula (based upon the Bellman equation) and allows state-action pairs to be updated in a recursive fashion (based on future values). See Figure 2 for the value iteration update.

Figure 2. Value iteration update. Source: By Gregz448, CC0, on Wikimedia Commons.
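For readers without the figure at hand, the standard Q-learning update that step #3 follows can be written out as below. The discount factor γ is part of the standard formula and not of the code above, which adds the max future value without scaling it and therefore corresponds to γ = 1.

```latex
Q(S_t, A_t) \leftarrow Q(S_t, A_t)
  + \alpha \left[ R_{t+1} + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, A_t) \right]
```

Here α is the learning rate (0.618 in this post), R_{t+1} is the reward returned by the step, and the max runs over the actions available in the new state S_{t+1}.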

Following this update, we update our total reward G and update state (St) to be the previous state2 (St+1) so the loop can begin again and the next action can be decided.

After enough episodes, the algorithm will converge and determine the optimal action for every state using the Q table, ensuring the highest possible reward. We now consider the environment problem solved.
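To watch this convergence without installing Gym, the same update can be run on a much smaller problem. The sketch below assumes a hypothetical five-cell corridor (not a Gym environment) with the same reward scheme as the taxi problem: -1 per step and +20 on reaching the goal. The corridor, the `step` function, and the step cap are all assumptions made for illustration.

```python
N_STATES, GOAL = 5, 4   # hypothetical corridor: cells 0..4, goal in cell 4
LEFT, RIGHT = 0, 1
alpha = 0.618


def step(state, action):
    """Move one cell; reward is -1 per step and +20 on reaching the goal."""
    nxt = min(state + 1, GOAL) if action == RIGHT else max(state - 1, 0)
    done = nxt == GOAL
    return nxt, (20 if done else -1), done


# Q table as a list of [left, right] values per state, initialized to zero
Q = [[0.0, 0.0] for _ in range(N_STATES)]

for episode in range(200):
    state, done, steps = 0, False, 0
    while not done and steps < 10_000:            # step cap is only a safety net
        action = Q[state].index(max(Q[state]))    # greedy choice, first index on ties
        state2, reward, done = step(state, action)
        Q[state][action] += alpha * (reward + max(Q[state2]) - Q[state][action])
        state, steps = state2, steps + 1

# Greedy action for every non-goal cell after training
policy = [row.index(max(row)) for row in Q[:GOAL]]
print(policy)
print([round(max(row), 3) for row in Q[:GOAL]])
```

Because every step costs -1, actions that lead nowhere keep losing value until the untried ones look better, which is what pushes the purely greedy agent to explore in this sketch.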

Now that we have solved a very simple environment, let’s move on to a more complicated Atari environment: Ms. Pacman.

env = gym.make("MsPacman-v0")
state = env.reset()

You will notice that env.reset() returns a large array of numbers. To be specific, you can enter state.shape to show that our current state is represented by a 210x160x3 tensor. This represents the height, width, and three RGB color channels of the Atari game screen or, simply put, the pixels. As before, to visualize the environment you can enter:

env.render()


Also, as before, we can determine our possible actions by:

env.action_space.n


This will show that we have nine possible actions: integers 0-8. It’s important to remember that an agent should have no idea what these actions mean; its job is to learn which actions will optimize reward. But, for our sake, we can:

env.env.get_action_meanings()


This will show the nine possible actions the agent can choose from: taking no action, plus the eight possible positions of the joystick.

Using our previous strategy, let’s see how well a random agent can perform.

state = env.reset()
reward, info, done = None, None, None
while done != True:
    state, reward, done, info = env.step(env.action_space.sample())
    env.render()

This completely random policy will get a few hundred points, at best, and will never solve the first level.

Continuing on, we cannot use our basic Q table algorithm because there are a total of 33,600 pixels, each with three RGB values that can range from 0 to 255. It’s easy to see that things are getting extremely complicated; this is where deep learning comes to the rescue. Using techniques such as convolutional neural networks or a DQN, a machine learning library is able to take the complex high-dimensional array of pixels, make an abstract representation, and translate that representation into an optimal action.
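The arithmetic behind this claim is easy to check. The short sketch below only restates the numbers given in the text (a 210x160x3 screen with values from 0 to 255, and the 500-by-6 taxi table); the variable names are illustrative.

```python
import math

height, width, channels = 210, 160, 3
pixels = height * width          # pixels on screen
values = pixels * channels       # individual RGB values per frame

# Each value can take 256 levels, so there are 256 ** values possible frames.
# Count the decimal digits of that number without building it.
digits = math.floor(values * math.log10(256)) + 1

taxi_entries = 500 * 6           # size of the taxi Q table, for comparison

print(pixels, values, digits, taxi_entries)
```

A table with one row per possible frame is therefore out of the question, which is why the pixel array has to be compressed into a learned representation instead.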

In summary, you now have the basic knowledge to take Gym and start experimenting with other people's algorithms or maybe even create your own. If you would like a copy of the code used in this post to follow along with or edit, you can find the code on my GitHub.

The field of reinforcement learning is rapidly expanding with new and better methods for solving environments—at this time, the A3C method is one of the most popular. Reinforcement learning will more than likely play an important role in the future of AI and continues to produce very interesting results.

Continue reading Introduction to reinforcement learning and OpenAI Gym.

Categories: Technology

Smart home products need to fit into the experiences and rituals of our everyday lives

O'Reilly Radar - Thu, 2017/07/13 - 04:00

Learn how to "domesticate" smart products and understand why it’s essential to design relationships rather than just connectivity.

The house from which I wrote this report was built around 100 years ago. It was built by the French in the 1920s to be lent by the Shanghainese municipality to an official of a political party, and then it was given as a sign of respect to a famous opera singer, who decided to consign it to his mistress. It was sold—or more accurately, passed on—in the 1990s and subdivided into smaller apartments to accommodate up to eight families. Now, as a result of Shanghai’s housing boom, it’s rented out to just three families, at 20 times the price for which it was originally lent. It was built when electricity was a luxury. It was later wired for telephone, and then eventually TV cables were installed. I guess there should also be a satellite cable somewhere, but I cannot recognize which plugs are which anymore. When I moved in, fiber-optic was quickly set up by a cable contractor, thrown out in the courtyard to be bundled up in the mess of wires. My home is connected, and it always was in some way. It’s not really owned by anyone, and it’s a very complex mix of old, new, East, West, rich, poor, and so on.

When I look around my apartment, I see only a few things I would consider “smart”: my laptop, maybe a couple of other things I managed to coax to work together via Bluetooth, and my dog. The rooms have a complex mix of new things, old things, things I brought with me from previous homes, and things I found here in the neighborhood. There are Italian lamps, Chinese unbranded appliances, and various devices that were manufactured for the American market (but produced in China). There are a few handmade objects I acquired for the love of craft as well as a lot of cheaply mass-produced items I bought due to their low prices. There are things I use, things I forgot I had, things that were given to me, and things I bought by mistake (a category that has boomed since I recently discovered Taobao Marketplace, Alibaba’s on-steroids answer to Amazon’s ecommerce platform).

A home is not a “house”; a home is not only a set of problems that can be solved or tasks that can be automated. A home, as said by Joseph Grima, founder of the architecture and research studio Space Caviar, “is so much more than the sum of the functions it performs,” and it’s a very complex mix of people, architecture, history, memories, technology, and life. My home (or, more precisely, my apartment) answers as much to my functional needs as it functions as a representation of my own aspirations, or, most likely, my laziness. The more I look around, the more I realize that this is not how I imagined 2017 would look.

As a designer working in and with technology daily, I guess my home is the least “smart” that it can be, and it made me wonder, “Why?”

Why am I so excited to design for the near future in which smartness will leak into our daily lives, while at the same time not allowing it into my own space? Am I just living the symptoms of my own version of a recurring analog dream? Or, maybe I just don’t see the right kind of “smartness” that I want or need?

“Smart” assumptions

Smartness has been pushed as a term to represent the ongoing aspiration toward a more controlled and more “ecologically viable solution” of today’s environments and devices. Smart cities, smart homes, and smart devices are being pushed in our lives to help us deal with our own limits and point us in the direction of our personal and common aspiration of financial, ecological, and mental balance. However, when used in relation to home products, “smart” mostly represents the idea of an automated, quantified, efficient, optimized, and potentially also anthropomorphized use of technology.

Smart products today are not the highly complicated robots predicted by sci-fi and future visions of the past, nor are they the experimental computers imagined in the ubicomp visions of the ‘90s; rather, they are mainly mundane objects equipped with sensors, a little bit of processing power, and some sort of connectivity. Much like the boom of “electrified everything” in the late 1800s, we now have a plethora of smart + “thing”: from the more advanced smartphones, smart fridges, smart thermostats, smart plugs, and smart toilets all the way to smart bottles and socks.

The ability to embed computing power and connectivity inside almost any product at a viable price opens up completely new services, products, and use cases. As BERG CEO Matt Webb said, “The web getting inside physical things is the 21st century equivalent of electrification, which swept the world in the late 1800s.” Beyond connectivity, the growing number of tools that allow products to access AI-like functionalities “in the cloud” is making a next level of smartness more and more accessible, and is creating a need for new paradigms of interaction and relationships with things that listen, adapt, evolve, learn, and “dwell” with us. Although this shift will surely affect the experiences of users, it also will require designers, engineers, technologists, and companies to find ways to envision not only new values and use cases, but also to consider the implications of what they are bringing into people’s lives. As well put by Scott Smith, founder of the future consulting firm Changeist, “The rush to create new commercial prototypes, products, services, systems, and stacks often means culture, custom, needs, and desires are overstepped in the reach for profitable new use cases.”

Most new smart and intelligent products are promising to turn our homes into simple and better places to live, but maybe the biggest issue lies exactly in this assumption. There is a long and growing list of people who are laughing about the usefulness of smart products or who are concerned about the hidden and dark aspects of privacy and security; however, rather than attacking or defending the “smart that we do not want,” what I’m mostly interested in and what I will try to articulate in this report is how we can rethink what smart can mean next.

A fair number of smart devices are now being designed and developed under the thunder and buzz of smart/AI, but to be successful beyond their goals or investment rounds, they will need to be welcomed and accepted into our homes, not necessarily in the same ways as friends or pets that require a lot of attention, but as something closer to new guests that we share space with, that might become useful at times, and hopefully don’t stink after three days.

Figure 1. Uninvited Guest by Superflux shows a future of “annoying” smartness.

To design such products, we will need to break apart some of the assumptions that lie behind the word “smart” and embrace the complex reality of real homes. We also will need to get our hands dirty and understand the basics of processes like learning because they will soon become the main subject and material with which we will design.

In “Learning from the Future,” we look at some of the technological dreams of the past and how they influenced the present state of the smart home and its products. We also look at some of the present products and experiments that are trying to break the status quo.

In “From Smart Products to Home Guests,” we break down smartness and some of its main assumptions into a set of new steps and materials that need to be “designed.” We explore new challenges we will face as designers who have to imagine more intelligent and connected products alongside which people will actually want to live.

If you share my views on the conceptual issues I outlined earlier and find yourself having to design something smart for a client or for your own business, I hope this report can help you find new inspiration and ways to think about what smartness can be. If you instead disagree and you love the smartness of today, I hope this report can lead you to see another side of what the future can be. Or, even better, if you’ve never read anything about smart products, I hope to push you in a new direction from the start.

Continue reading Smart home products need to fit into the experiences and rituals of our everyday lives.

Categories: Technology

Four short links: 13 July 2017

O'Reilly Radar - Thu, 2017/07/13 - 03:30

Conversational Data Science, L3 Autonomy, Human Computation, and Embedded Learning

  1. Iris: A Conversational Agent for Data Science -- a cross between R Notebook and Facebook Messenger. See also this description of the project and what they hope to achieve.
  2. Audi A8: First to Reach Level 3 Autonomy -- for those of you not up with your autonomous driving levels, the A8 features the “AI traffic jam pilot,” meaning the car can take control of the driving in slow-moving traffic at up to 60 kilometers per hour. The system is activated by a button on the center console, and it can take over acceleration, braking, steering, and starting from a dead-stop, all without the driver paying attention.
  3. The Complexity of Human Computation: A Concrete Model with Application to Passwords -- The intent of this paper is to apply the ideas and methods of theoretical computer science to better understand what humans can compute in their heads. For example, can a person compute a function in their head so that an eavesdropper with a powerful computer—who sees the responses to random inputs—still cannot infer responses to new inputs?
  4. ELL -- Microsoft's Embedded Learning Library, which allows you to build and deploy machine-learned pipelines onto embedded platforms, like Raspberry Pis, Arduinos, micro:bits, and other microcontrollers. The deployed machine learning model runs on the device, disconnected from the cloud. Our APIs can be used either from C++ or Python.

Continue reading Four short links: 13 July 2017.

Categories: Technology

Transforming text data in Java

O'Reilly Radar - Thu, 2017/07/13 - 03:00

Assign text snippets to a corresponding collection of vectors.

Data Operations

Now that we know how to input data into a useful data structure, we can operate on that data by using what we know about statistics and linear algebra. There are many operations we perform on data before we subject it to a learning algorithm. Often called preprocessing, this step comprises data cleaning, regularizing or scaling the data, reducing the data to a smaller size, encoding text values to numerical values, and splitting the data into parts for model training and testing. Often our data is already in one form or another (e.g., List or double[][]), and the learning routines we will use may take either or both of those formats. Additionally, a learning algorithm may need to know whether the labels are binary or multiclass or even encoded in some other way such as text. We need to account for this and prepare the data before it goes in the learning algorithm. The steps in this chapter can be part of an automated pipeline that takes raw data from the source and prepares it for either learning or prediction algorithms.

Many learning and prediction algorithms require numerical input. One of the simplest ways to achieve this is by creating a vector space model in which we define a vector of known length and then assign a collection of text snippets (or even words) to a corresponding collection of vectors. The general process of converting text to vectors has many options and variations. Here we will assume that there exists a large body of text (corpus) that can be divided into sentences or lines (documents) that can in turn be divided into words (tokens). Note that the definitions of corpus, document, and token are user-definable.
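As a rough illustration of the vector space model described above, the sketch below turns a tiny corpus into count vectors. It is written in Python for brevity rather than Java, and the corpus, the whitespace tokenizer, and the names `vocabulary`, `index`, and `vectorize` are all assumptions made for the example; the same steps carry over to Java collections.

```python
# Hypothetical two-document corpus; each line is treated as one document
corpus = [
    "the taxi picks up the passenger",
    "the passenger pays the taxi",
]

# Split each document into tokens (here: whitespace-separated words)
documents = [line.split() for line in corpus]

# The vocabulary fixes the vector length and the position of each token
vocabulary = sorted({token for doc in documents for token in doc})
index = {token: i for i, token in enumerate(vocabulary)}


def vectorize(tokens):
    """Count how often each vocabulary token occurs in one document."""
    vector = [0] * len(vocabulary)
    for token in tokens:
        vector[index[token]] += 1
    return vector


vectors = [vectorize(doc) for doc in documents]
print(vocabulary)
for vector in vectors:
    print(vector)
```

Every document maps to a vector of the same known length, which is what a numerical learning algorithm needs; other choices (binary presence, weighting schemes) would change only the body of `vectorize`.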

Continue reading Transforming text data in Java.

Categories: Technology

