Security Topic for Mar 15th

PLUG - Fri, 2018/02/16 - 00:00
Join the Phoenix Linux Users Group at their Security Session to learn about I2P and why it is not great. We will discuss how I2P can increase your privacy, enhance your security, and make it harder for major corporations to track you or profit from your browsing habits. We will also discuss what I2P is not, what vulnerabilities exist, and how it can be used to harm you. Hope to see you there!

* Any discussion of paid services or software will be done for educational purposes only. Neither the instructor, PLUG, nor anyone involved in this course endorses any paid services or products discussed at this meeting.

Security Topic for Feb 15th

PLUG - Sun, 2018/02/11 - 21:29
Join the Phoenix Linux Users Group at their Security Session to learn about Freenet and why it is pretty cool. We will discuss how Freenet can increase your privacy, enhance your security, and make it harder for major corporations to track you or profit from your browsing habits. We will also discuss what Freenet is not, what vulnerabilities exist, and how it can be used to harm you. Hope to see you there!

* Any discussion of paid services or software will be done for educational purposes only. Neither the instructor, PLUG, nor anyone involved in this course endorses any paid services or products discussed at this meeting.

Four short links: 9 February 2018

O'Reilly Radar - Fri, 2018/02/09 - 05:50

Small GUI, Dangerous URLs, Face-Recognition Glasses, and The Future is Hard

  1. Nuklear -- a single-header ANSI C GUI library, with a lot of bindings (Python, Golang, C#, etc.). (via Hacker News)
  2. unfurl -- a tool that analyzes large collections of URLs and estimates their entropies to sift out URLs that might be vulnerable to attack. (via this blog)
  3. Chinese Police Using Face Recognition Glasses -- In China, people must use identity documents for train travel. This rule works to prevent people with excessive debt from using high-speed trains, and limit the movement of religious minorities who have had identity documents confiscated and can wait years to get a valid passport. We asked for glasses that would help us remember people's names, we got Robocop 0.5a/BETA2FINAL. {Obligatory "Black Mirror" reference goes here} (via BoingBoing)
  4. Why I Barely Read SF These Days (Charlie Stross) -- SF should—in my view—be draining the ocean and trying to see at a glance which of the gasping, flopping creatures on the sea bed might be lungfish. But too much SF shrugs at the state of our seas and settles for draining the local aquarium, or even just the bathtub, instead. In pathological cases, it settles for gazing into the depths of a brightly coloured computer-generated fishtank screensaver. Earlier in the essay he talks about how the first to a field defines the tropes and borders that others play in, and it's remarkably hard to find authors who can and will break out of them. (via Matt Jones)

Continue reading Four short links: 9 February 2018.

Categories: Technology

Richard Warburton and Raoul-Gabriel Urma on Java 8 and Reactive Programming

O'Reilly Radar - Thu, 2018/02/08 - 05:05

The O’Reilly Programming Podcast: Building reactive applications.

In this episode of the O’Reilly Programming Podcast, I talk with Richard Warburton and Raoul-Gabriel Urma of Iteratr Learning. They are the presenters of a series of O’Reilly Learning Paths, including Getting Started with Reactive Programming and Build Reactive Applications in Java 8. Warburton is the author of Java 8 Lambdas, and Urma is the author of Java 8 in Action.

Continue reading Richard Warburton and Raoul-Gabriel Urma on Java 8 and Reactive Programming.

Categories: Technology

5 best practices when requesting visuals for your content

O'Reilly Radar - Thu, 2018/02/08 - 04:30

What a design request should look like when you're talking to an external entity.

Continue reading 5 best practices when requesting visuals for your content.

Categories: Technology

Four short links: 8 February 2018

O'Reilly Radar - Thu, 2018/02/08 - 04:20

Data for Problems, Quantum Algorithms, Network Transparency, and AI + Humans

  1. Solving Public Problems With Data -- an introduction to data science and data analytical thinking in the public interest. Online lecture series. Beth Noveck gives one of them. (via The Gov Lab)
  2. Quantum Algorithms: An Overview -- Here we briefly survey some known quantum algorithms, with an emphasis on a broad overview of their applications rather than their technical details. We include a discussion of recent developments and near-term applications of quantum algorithms. (via A Paper A Day)
  3. X11's Network Transparency is Largely a Failure -- Basic X clients that use X properties for everything may be genuinely network transparent, but there are very few of those left these days.
  4. How to Become a Centaur -- When you create a Human+AI team, the hard part isn’t the "AI". It isn’t even the “Human”. It’s the “+”. Interesting history and current state of human and AI systems. (via Tom Stafford)

Continue reading Four short links: 8 February 2018.

Categories: Technology

February's PLUG meeting 2/18

PLUG - Wed, 2018/02/07 - 09:29
der.hans: Regular Expressions - A guided tour

An example-driven introduction to regular expressions. Many of our favorite system administration tools use regular expressions for text matching. Using plain English to describe regular expression concepts, syntax, and language, the talk will demonstrate them through common tools such as grep, sed, and awk.
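
The kinds of patterns such a talk typically demonstrates with grep, sed, and awk can be sketched in Python's `re` module; the log lines below are made up for illustration:

```python
import re

# Made-up log lines, just for illustration.
log_lines = [
    "2018-02-07 09:29:01 ERROR disk full on /dev/sda1",
    "2018-02-07 09:29:05 INFO backup complete",
    "2018-02-07 09:30:12 ERROR timeout talking to 10.0.0.7",
]

# grep-style filtering: keep only the lines matching a pattern.
errors = [line for line in log_lines if re.search(r"\bERROR\b", line)]

# sed-style substitution: mask the last octet of IPv4 addresses.
masked = [re.sub(r"(\d{1,3}\.\d{1,3}\.\d{1,3})\.\d{1,3}", r"\1.x", line)
          for line in log_lines]

# awk-style field extraction: split each line into timestamp, level, message.
pattern = re.compile(r"^(\S+ \S+) (\w+) (.*)$")
fields = [pattern.match(line).groups() for line in log_lines]

print(len(errors))      # 2
print(fields[1][1])     # INFO
```

The same three regex building blocks (matching, substitution, and grouping) carry over directly to the command-line tools the talk covers.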

About der.hans:
der.hans is a Free Software community veteran, presenter and author. He is the founder of the Free Software Stammtisch, BoF organizer for the Southern California Linux Expo (SCaLE) and chairman of the Phoenix Linux User Group (PLUG).

A technology and entrepreneurship veteran, he has held roles including director of engineering, engineering manager, IS manager, system administrator, community college instructor, developer, and DBA.

He presents regularly at large community-led conferences (SCaLE, SeaGL, LibrePlanet, LFNW) and many local groups.

HVMN’s better-body biohacking

O'Reilly Radar - Wed, 2018/02/07 - 04:30

Learn how biohacking is unlocking human potential.

Technology is unique in that its improvement suggests an intuitive next step: products are continually refined, updated, and upgraded. Optimization is never viewed as a bonus in the tech industry; it's the name of the game.

But what happens when we attempt to expand optimization goals to include the very facilitators of progress: our minds? Crossing the boundary between hard science and pseudoscience, biohacking companies are exploring the principle of “upgrading” the human body in the hopes that our inherited genetics are more malleable than we think. One such company is HVMN (pronounced “human”). The company’s main product is NOOTROBOX, a line of nootropics or colloquially-dubbed “smart drugs” meant to enhance neural performance in areas such as memory, learning, and focus.

Continue reading HVMN’s better-body biohacking.

Categories: Technology

Delivering effective communication in software teams

O'Reilly Radar - Wed, 2018/02/07 - 04:00

Optimize for business value with clear feedback loops and quality standards.

We’ve had the privilege to work with many clients from different business sectors. Each client has granted us the opportunity to see how their teams perceive the value of software within their organizations. We’ve also witnessed how the same types of systems (e.g. ERPs) in competing organizations raise completely different problems and challenges. As a result of these experiences, we’ve come to understand that the key to building high-quality software architecture is effective communication between every team member involved in the project who expects to gain value from a software system.

So, if you’re a software architect or developer and you want to improve your architectures or codebases, you’ll have to address the organizational parts as well. Research conducted by Graziotin et al.1 states that software development is dominated by these often-neglected organizational elements, and that the key to high-quality software and productive developers is the happiness and satisfaction of those developers. In turn, the key to happy and productive developers is empowerment, at both an organizational and a technical level.

Continue reading Delivering effective communication in software teams.

Categories: Technology

Four short links: 7 February 2018

O'Reilly Radar - Wed, 2018/02/07 - 04:00

Identity Advice, Customer Feedback, Fun Toy, and Reproducibility Resources

  1. 12 Best Practices for User Account, Authorization, and Password Management (Google) -- Your users are not an email address. They're not a phone number. They're not the unique ID provided by an OAUTH response. Your users are the culmination of their unique, personalized data and experience within your service. A well-designed user management system has low coupling and high cohesion between different parts of a user's profile.
  2. Customer Satisfaction at the Push of a Button (New Yorker) -- simply getting binary good/bad feedback is better than no feedback, even if it's not as good as using NPS with something like Thematic. Also an interesting story about the value of physical interactions over purely digital.
  3. XXY Oscilloscope -- try this or this to get started. (via Hacker News)
  4. Reproducibility Workshop -- slides and handouts from a workshop to highlight some of the resources available to help share code, data, reagents, and methods. (via Lenny Teytelman)

Continue reading Four short links: 7 February 2018.

Categories: Technology

Re-thinking marketing: Generating attention you can turn into profitable demand

O'Reilly Radar - Wed, 2018/02/07 - 03:50

The media and ad tech sessions at the Strata Data Conference in San Jose will dig deep into how media businesses are changing.

First-year business students are taught that marketing consists of four Ps: product, place (or channel), price, and promotion. But this thinking is dated. In an era of information saturation, simply creating another piece of information in the form of a spec sheet, white paper, or press release compounds the problem.

I’ve been using a newer definition of marketing in recent years: generating attention you can turn into profitable demand. This underscores the “long funnel” of conversion from initial consumer awareness and engagement, to desirable outcomes like sales, word-of-mouth referral, and the retention of loyal customers.

At the start of the long funnel is media. Traditionally, media was a one-to-many model, in which a few organizations—armed with printing presses and broadcast studios—sent a single message out to the masses. They made money through purchases, subscriptions, and in many cases, advertising.

Much has changed. Today’s communication is bidirectional, flowing from the audience back to the publisher. It’s individualized, with each of us experiencing a tailored feed of information. The cost of publishing is vanishingly small, with anyone able to share a video with the world for practically nothing. And most importantly, we expect media to be free.

This expectation stems from two simple facts: there’s too much content out there, and users create most of it.

The abundance of content is a consequence of how easy it is to publish. Anyone can become an expert; we consume tailored news. I might read 10 publications’ technology sections, but ignore all sports news. Gone are the days of reading a single publication cover to cover. I choose podcasts to suit my interests, seldom exploring.

And the world of user-generated content has birthed a second kind of media. Facebook, Medium, Twitter, Reddit, and their ilk don’t employ writers, but we consume most of our words there. Traditional media outlets with paid reporting and editorial calendars are being squeezed out.

Jeff Jarvis has said that advertising is failure. It means you haven’t sold an issue, or a subscription. It’s a bad outcome. And yet, it’s the basis for most of what we consume today. Craigslist decimated newspapers partly because the classified ads were the only thing keeping them alive.

The nature of media has shifted, too. It’s gaming, and betting, and theme parks, and blogs, and YouTube channels, and streaming subscriptions. Omnichannel analytics means tracking a customer’s engagement with a brand or some content across many platforms and devices.

With a sprawl of media, and an increased reliance on advertising despite razor-thin margins, media creators of all stripes take analytics very seriously. Data is the difference between dominance and obsolescence, whether you’re keeping a player engaged, trying to get a subscriber to stick around, recommending the next best song, serving a tailored ad, or satisfying a die-hard sports fan.

So, we’re going to dig deep into the media and advertising technology industry at the Strata Data Conference in San Jose this March. With the help of David Boyle—one of the world’s great media analysts, whose career spans record labels, print publishers, broadcasters, and online learning—we’re assembling a lineup of experts and practitioners from every facet of media. We’ll hear case studies, never-before-shared insights, and projections. We’re even running an Oxford-style debate, where we’ll challenge the statement: “Machines have better taste than humans.” (Check out our lineup of talks.)

Modern business starts with attention. The risk is seldom “can you build it?” but rather, “will anyone care?” To understand how media businesses are changing—and how the journey from audience to customer begins—we hope you’ll join us next month.

Continue reading Re-thinking marketing: Generating attention you can turn into profitable demand.

Categories: Technology

Introducing capsule networks

O'Reilly Radar - Tue, 2018/02/06 - 05:00

How CapsNets can overcome some shortcomings of CNNs, including requiring less training data, preserving image details, and handling ambiguity.

Capsule networks (CapsNets) are a hot new neural net architecture that may well have a profound impact on deep learning, in particular for computer vision. Wait, isn't computer vision pretty much solved already? Haven't we all seen fabulous examples of convolutional neural networks (CNNs) reaching super-human level in various computer vision tasks, such as classification, localization, object detection, semantic segmentation or instance segmentation (see Figure 1)?

Figure 1. Some of the main computer vision tasks. Today, each of these tasks requires a very different CNN architecture, for example ResNet for classification, YOLO for object detection, Mask R-CNN for instance segmentation, and so on. Image by Aurélien Géron.

Well, yes, we’ve seen fabulous CNNs, but:

  • They were trained on huge numbers of images (or they reused parts of neural nets that had). CapsNets can generalize well using much less training data.
  • CNNs don’t handle ambiguity very well. CapsNets do, so they can perform well even on crowded scenes (although they still struggle with backgrounds right now).
  • CNNs lose plenty of information in the pooling layers. These layers reduce the spatial resolution (see Figure 2), so their outputs are invariant to small changes in the inputs. This is a problem when detailed information must be preserved throughout the network, such as in semantic segmentation. Today, this issue is addressed by building complex architectures around CNNs to recover some of the lost information. With CapsNets, detailed pose information (such as precise object position, rotation, thickness, skew, size, and so on) is preserved throughout the network, rather than lost and later recovered. Small changes to the inputs result in small changes to the outputs—information is preserved. This is called "equivariance." As a result, CapsNets can use the same simple and consistent architecture across different vision tasks.
  • Finally, CNNs require extra components to automatically identify which object a part belongs to (e.g., this leg belongs to this sheep). CapsNets give you the hierarchy of parts for free.
Figure 2. The DeepLab2 pipeline for image segmentation, by Liang-Chieh Chen, et al.: notice that the output of the CNN (top right) is very coarse, making it necessary to add extra steps to recover some of the lost details. From the paper DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, figure reproduced with the kind permission of the authors. See this great post by S. Chilamkurthy to see how diverse and complex the architectures for semantic segmentation can get.

CapsNets were first introduced in 2011 by Geoffrey Hinton, et al., in a paper called Transforming Autoencoders, but it was only a few months ago, in November 2017, that Sara Sabour, Nicholas Frosst, and Geoffrey Hinton published a paper called Dynamic Routing between Capsules, where they introduced a CapsNet architecture that reached state-of-the-art performance on MNIST (the famous data set of handwritten digit images), and got considerably better results than CNNs on MultiMNIST (a variant with overlapping pairs of different digits). See Figure 3.

Figure 3. MultiMNIST images (white) and their reconstructions by a CapsNet (red+green). “R” = reconstructions; “L” = labels. For example, the predictions for the first example (top left) are correct, and so are the reconstructions. But in the fifth example, the prediction is wrong: (5,7) instead of (5,0). Therefore, the 5 is correctly reconstructed, but not the 0. From the paper: Dynamic routing between capsules, figure reproduced with the kind permission of the authors.

Despite all their qualities, CapsNets are still far from perfect. Firstly, for now they don't perform as well as CNNs on larger images such as CIFAR10 or ImageNet. Moreover, they are computationally intensive, and they cannot detect two objects of the same type when they are too close to each other (this is called the "crowding problem," and it has been shown that humans have it, too). But the key ideas are extremely promising, and it seems likely that they just need a few tweaks to reach their full potential. After all, modern CNNs were invented in 1998, yet they only beat the state of the art on ImageNet in 2012, after a few tweaks.

So, what are CapsNets exactly?

In short, a CapsNet is composed of capsules rather than neurons. A capsule is a small group of neurons that learns to detect a particular object (e.g., a rectangle) within a given region of the image, and it outputs a vector (e.g., an 8-dimensional vector) whose length represents the estimated probability that the object is present[1], and whose orientation (e.g., in 8D space) encodes the object's pose parameters (e.g., precise position, rotation, etc.). If the object is changed slightly (e.g., shifted, rotated, resized, etc.) then the capsule will output a vector of the same length, but oriented slightly differently. Thus, capsules are equivariant.
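
These two properties (length as presence estimate, orientation as pose) can be pictured with a small NumPy sketch; the 8-D vector below is made up for illustration:

```python
import numpy as np

# A hypothetical 8-D capsule output vector (values made up for illustration).
u = np.array([0.1, -0.2, 0.3, 0.05, -0.1, 0.2, 0.0, 0.15])

# The vector's length encodes the estimated probability that the
# object is present.
presence = np.linalg.norm(u)

# Rotating the vector changes its orientation (the encoded pose
# parameters) but not its length (the presence estimate): a small,
# concrete picture of equivariance. Here we rotate the first two
# pose dimensions by 30 degrees.
theta = np.pi / 6
R = np.eye(8)
R[:2, :2] = [[np.cos(theta), -np.sin(theta)],
             [np.sin(theta),  np.cos(theta)]]
u_rot = R @ u

print(np.isclose(np.linalg.norm(u_rot), presence))  # True
```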

Much like a regular neural network, a CapsNet is organized in multiple layers (see Figure 4). The capsules in the lowest layer are called primary capsules: each of them receives a small region of the image as input (called its receptive field), and it tries to detect the presence and pose of a particular pattern, for example a rectangle. Capsules in higher layers, called routing capsules, detect larger and more complex objects, such as boats.

Figure 4. A two-layer CapsNet. In this example, the primary capsule layer has two maps of 5x5 capsules, while the second capsule layer has two maps of 3x3 capsules. Each capsule outputs a vector. Each arrow represents the output of a different capsule. Blue arrows represent the output of a capsule that tries to detect triangles, black arrows represent the output of a capsule that tries to detect rectangles, and so on. Image by Aurélien Géron.

The primary capsule layer is implemented using a few regular convolutional layers. For example, in the paper, they use two convolutional layers that output 256 6x6 feature maps containing scalars. They reshape this output to get 32 6x6 maps containing 8-dimensional vectors. Finally, they apply a novel squashing function to ensure these vectors have a length between 0 and 1 (to represent a probability). And that's it: this gives the output of the primary capsules.
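
Here is a minimal NumPy sketch of that reshape-and-squash step, with random values standing in for the real convolutional output; the squashing function follows the formula from the Dynamic Routing paper:

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    """Shrink vectors so their length lies in [0, 1): short vectors go
    toward length 0, long vectors toward length 1, orientation unchanged."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)

# Random stand-in for the convolutional output described above:
# 256 feature maps of 6x6 scalars...
conv_out = np.random.randn(6, 6, 256)

# ...reshaped into a 6x6 grid of 32 capsules, each with an 8-D output vector.
primary_caps = conv_out.reshape(6, 6, 32, 8)
v = squash(primary_caps)

# Every capsule's output length is now a valid probability.
lengths = np.linalg.norm(v, axis=-1)
print(v.shape, float(lengths.max()) < 1.0)
```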

The capsules in the next layers also try to detect objects and their pose, but they work very differently, using an algorithm called routing by agreement. This is where most of the magic of CapsNets lies. Let's look at an example.

Suppose there are just two primary capsules: one rectangle capsule and one triangle capsule, and suppose they both detected what they were looking for. Both the rectangle and the triangle could be part of either a house or a boat (see Figure 5). Given the pose of the rectangle, which is slightly rotated to the right, the house and the boat would have to be slightly rotated to the right as well. Given the pose of the triangle, the house would have to be almost upside down, whereas the boat would be slightly rotated to the right. Note that both the shapes and the whole/part relationships are learned during training. Now notice that the rectangle and the triangle agree on the pose of the boat, while they strongly disagree on the pose of the house. So, it is very likely that the rectangle and triangle are part of the same boat, and there is no house.

Figure 5. Routing by agreement, step 1—predict the presence and pose of objects based on the presence and pose of object parts, then look for agreement between the predictions. Image by Aurélien Géron.

Since we are now confident that the rectangle and triangle are part of the boat, it makes sense to send the outputs of the rectangle and triangle capsules more to the boat capsule, and less to the house capsule: this way, the boat capsule will receive more useful input signal, and the house capsule will receive less noise. For each connection, the routing-by-agreement algorithm maintains a routing weight (see Figure 6): it increases routing weight when there is agreement, and decreases it in case of disagreement.

Figure 6. Routing by agreement, step 2—update the routing weights. Image by Aurélien Géron.

The routing-by-agreement algorithm involves a few iterations of agreement-detection + routing-update (note that this happens for each prediction, not just once, and not just at training time). This is especially useful in crowded scenes: for example, in Figure 7, the scene is ambiguous because you could see an upside-down house in the middle, but this would leave the bottom rectangle and top triangle unexplained, so the routing-by-agreement algorithm will most likely converge to a better explanation: a boat at the bottom, and a house at the top. The ambiguity is said to be "explained away": the lower rectangle is best explained by the presence of a boat, which also explains the lower triangle, and once these two parts are explained away, the remaining parts are easily explained as a house.
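
The iteration loop described above can be sketched in NumPy. The predictions here are random stand-ins, rigged so that the two "parts" agree on one parent (the boat) and disagree on the other (the house):

```python
import numpy as np

def squash(s, axis=-1, eps=1e-8):
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return (sq_norm / (1.0 + sq_norm)) * s / np.sqrt(sq_norm + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# u_hat[i, j] is lower capsule i's prediction for the pose of parent
# capsule j (think i in {rectangle, triangle}, j in {house, boat}).
n_lower, n_upper, dim = 2, 2, 8
u_hat = rng.standard_normal((n_lower, n_upper, dim))
u_hat[1, 1] = u_hat[0, 1]      # the parts agree on the boat's pose...
u_hat[1, 0] = -u_hat[0, 0]     # ...and strongly disagree on the house's.

b = np.zeros((n_lower, n_upper))            # routing logits, start neutral
for _ in range(3):                          # a few agreement/update rounds
    c = softmax(b, axis=1)                  # routing weights per lower capsule
    s = (c[..., None] * u_hat).sum(axis=0)  # weighted input to each parent
    v = squash(s)                           # parent capsule outputs
    b += (u_hat * v).sum(axis=-1)           # agreement strengthens the route

c = softmax(b, axis=1)
print(c[:, 1] > c[:, 0])  # both parts now route mostly to the boat
```

This is only a sketch of the routing mechanics, not the full architecture: the real network learns the transformation matrices that produce `u_hat` from the lower capsules' outputs.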

Figure 7. Routing by agreement can parse crowded scenes, such as this ambiguous image, which could be misinterpreted as an upside-down house plus some unexplained parts. Instead, the lower rectangle will be routed to the boat, and this will also pull the lower triangle into the boat as well. Once that boat is “explained away,” it’s easy to interpret the top part as a house. Image by Aurélien Géron.

And that’s it—you know the key ideas behind CapsNets! If you want more details, check out my two videos on CapsNets (one on the architecture and another on the implementation) and my commented TensorFlow implementation (Jupyter Notebook). Please don’t hesitate to comment on the videos, file issues on GitHub if you see any, or contact me on Twitter @aureliengeron. I hope you found this post useful!

[1] This is the original architecture proposed in the paper Dynamic Routing Between Capsules, by S. Sabour, N. Frosst, and G. Hinton, but since then, they proposed a more general architecture where the object’s presence probability and pose parameters are encoded differently in the output vector. The ideas remain the same, however.

Continue reading Introducing capsule networks.

Categories: Technology

Four short links: 6 February 2018

O'Reilly Radar - Tue, 2018/02/06 - 04:50

Mine Research, Fight for Attention, AI Metaphors, and Research Browser Extensions

  1. metaDigitise -- Digitising functions in R for extracting data and summary statistics from figures in primary research papers.
  2. Center for Humane Technology -- Silicon Valley tech insiders fighting against attention-vacuuming tech design. (via New York Times)
  3. Tools, Substitutes, or Companions -- three metaphors for how we think about digital and robotic technologies. (via Tom Stafford)
  4. Unpaywall -- browser extension. Click the green tab and skip the paywall on millions of peer-reviewed journal articles. It's fast, free, and legal. Pair with the open access button. (via Swarthmore Libraries)

Continue reading Four short links: 6 February 2018.

Categories: Technology

Integrating continuous testing for improved open source security

O'Reilly Radar - Tue, 2018/02/06 - 04:00

Testing to prevent vulnerable open source libraries.

Integrating Testing to Prevent Vulnerable Libraries

Once you’ve found and fixed (or at least acknowledged) the security flaws in the libraries you use, it’s time to look into tackling this problem continuously.
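
One hedged sketch of what such a continuous check could look like is below; the package names, versions, and advisory list are hypothetical stand-ins (a real pipeline would pull a lockfile and an advisory feed from live sources):

```python
# Hypothetical advisory data: package name -> set of affected versions.
KNOWN_VULNERABLE = {
    "examplelib": {"1.0.0", "1.0.1"},
    "otherlib": {"2.3.0"},
}

def parse_lockfile(text):
    """Parse 'name==version' lines, as in a pip requirements lockfile."""
    deps = {}
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            name, _, version = line.partition("==")
            deps[name] = version
    return deps

def audit(deps):
    """Return the (name, version) pairs that match a known advisory."""
    return [(n, v) for n, v in deps.items()
            if v in KNOWN_VULNERABLE.get(n, ())]

lockfile = """\
examplelib==1.0.1
otherlib==2.4.0
safelib==0.9
"""

findings = audit(parse_lockfile(lockfile))
print(findings)  # [('examplelib', '1.0.1')]
```

Wired into CI, a non-empty `findings` list would fail the build, turning the one-time audit into a continuous gate.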

There are two ways for additional vulnerabilities to show up in your dependencies:

Continue reading Integrating continuous testing for improved open source security.

Categories: Technology

Why I won't whitelist your site

O'Reilly Radar - Mon, 2018/02/05 - 05:00

Publishers need to take responsibility for code they run on my systems.

Many internet users—perhaps most—use an ad blocker. I’m one of them. All of us are familiar with the sites that won’t let us in without whitelisting them, or (only somewhat better) that repeatedly nag us to whitelist.

I’m not whitelisting anyone. I don’t have any fundamental problem with advertising; I wish ads weren’t as intrusive, and I believe advertisers would be better served by advertisements that had more respect for their viewers. But that’s not really why I use an ad blocker.

The real problem with ads is that they’re a vector for malware. It’s relatively easy to fold malware into otherwise-innocent advertisements, and that malware executes even if you don’t click on the ads. I’ve received malware from sites as otherwise legitimate as the BBC, and there are reports of malware from virtually every major online publisher—including sites like Forbes that won’t let you in if you don’t whitelist them. The New York Times, Reuters, MSN, and many others have all spread malware.

And no one takes responsibility for the advertisements or the damage they cause. The publishers just say “hey, we don’t control the ads; that’s the ad placement company.” The advertisers similarly say “hey, our ads come from a marketing firm, and they use some kind of web contractor to do the coding.” And the ad placement companies and marketing firms? All you get from them is the sound of silence.

Here’s the deal. I’m willing to whitelist any online publisher that will agree to a license in which they take responsibility for any code they run on my systems. Call it a EULA for using my browser on my computer. If you deliver malware, you will pay for the damages: my lost time, my lost data. If the idea catches on, managing all the contracts sounds like a problem, but I think it’s a business opportunity. Something would be needed to track all the licenses in an authoritative ledger. This sounds like an application for a blockchain. Maybe even a blockchain startup.

If I really need to read something on your site, and you won’t let me in because I am running an ad blocker, I might read your site anyway. That’s trivial—I have four or five browsers on all of my machines, and not all of them have ad blockers installed. But I won’t link to you, quote you, or tweet you. You’re dead to me.

I’ve been asked whether I have any proposals for a business model other than advertising. Not really. Though my employer, O’Reilly Media, does a bit of online publishing, and we don’t take advertising. But advising publishers on their business model isn’t my job—and they’ve yet to ask me for advice, anyway. My job is keeping my systems safe, and that requires keeping malware out.

Again, I have nothing against advertising as a business model. However, that model (and the businesses relying on it) deserve to fail if publishers won’t take responsibility for the ads they deliver. While I understand that publishers don’t control the ads, and don’t have the technical expertise to inspect the ads they deliver, they are the ones that deliver the ads. They bear the responsibility for damages.

Could this be a movement? Can we imagine a future with ad blockers that would let ads through if, and only if, the publisher has agreed to a license that allows users to recover damages from advertising-spread malware?

I’m in.

Continue reading Why I won't whitelist your site.

Categories: Technology

Four short links: 5 February 2018

O'Reilly Radar - Mon, 2018/02/05 - 04:35

Company Principles, DeepFake, AGI, and Missing Devices

  1. Principles of Technology Leadership (Bryan Cantrill) -- (slides) what cultural values and principles do you want to guide *your* company? (via Bryan Cantrill)
  2. Fun With DeepFakes; or How I Got My Wife on The Tonight Show -- this is going to further erode trust. How can you know what happened if all evidence can be convincingly faked? (via Simon Willison)
  3. MIT 6.S099: Artificial General Intelligence -- The lectures will introduce our current understanding of computational intelligence and ways in which strong AI could possibly be achieved, with insights from deep learning, reinforcement learning, computational neuroscience, robotics, cognitive modeling, psychology, and more. Additional topics will include AI safety and ethics. Worth noting that we can't build an artificial general intelligence right now, and may never be able to. Don't freak out because of the course headline.
  4. Catalog of Missing Devices (EFF) -- Things we’d pay money for—things you could earn money with—don’t exist thanks to the chilling effects of an obscure copyright law: Section 1201 of the Digital Millennium Copyright Act (DMCA 1201). From "third-party consumables for 3D printers" to an "ads-free YouTube for Kids," they're good ideas.

Continue reading Four short links: 5 February 2018.

Categories: Technology

Why product managers should master the art of user story writing

O'Reilly Radar - Fri, 2018/02/02 - 04:30

A well-written user story allows product managers to clearly communicate to their Agile development teams.

Continue reading Why product managers should master the art of user story writing.

Categories: Technology

Four short links: 2 February 2018

O'Reilly Radar - Fri, 2018/02/02 - 04:20

Digitize and Automate, Video Editor, AI + Humans, and Modest JavaScript

  1. Port Automation (Fortune) -- By digitizing and automating activities once handled by human crane operators and cargo haulers, seaports can reduce the amount of time ships sit in port and otherwise boost port productivity by up to 30%. "Digitize and automate" will be the mantra of the next decade.
  2. Shotcut -- a free, open source, cross-platform video editor.
  3. The Working Relationship Between Humans and AI (Mike Loukides) -- Whether we're talking about doctors, lawyers, engineers, Go players, or taxi drivers, we shouldn't expect AI systems to give us unchallengeable answers ex silico. We shouldn't be told that we need to "trust AI." What's important is the conversation.
  4. Stimulus -- modest JavaScript framework for the HTML you already have.

Continue reading Four short links: 2 February 2018.

Categories: Technology

Logo detection using Apache MXNet

O'Reilly Radar - Thu, 2018/02/01 - 13:25

Image recognition and machine learning for mar tech and ad tech.

Digital marketing is the marketing of products, services, and offerings on digital platforms. Advertising technology, commonly known as "ad tech," is the use of digital technologies by vendors, brands, and their agencies to target potential clients, deliver personalized messages and offerings, and analyze the impact of online spending: sponsored stories on Facebook newsfeeds; Instagram stories; ads that play on YouTube before the video content begins; the recommended links at the end of a CNN article, powered by Outbrain—these are all examples of ad tech at work.

In the past year, there has been a significant use of deep learning for digital marketing and ad tech.

In this article, we will delve into one part of a popular use case: mining the Web for celebrity endorsements. Along the way, we’ll see the relative value of deep learning architectures, run actual experiments, learn the effects of data sizes, and see how to augment the data when we don’t have enough.

Use case overview

In this article, we will see how to build a deep learning classifier that predicts the company, given an image containing its logo. This section provides an overview of where this model could be used.

Celebrities endorse a number of products. Quite often, they post pictures on social media showing off a brand they endorse. A typical post of that type contains an image, with the celebrity and some text they have written. The brand, in turn, is eager to learn about the appearance of such postings, and to show them to potential customers who might be influenced by them.

The ad tech application, therefore, works as follows: large numbers of postings are fed to a processor that figures out the celebrity, the brand, and the message. Then, for each potential customer, the machine learning model generates a very specific advertisement based on the time, location, message, brand, customers' preferred brands, and other things. Another model identifies the target customer base. And the targeted ad is now sent.

Figure 1 shows the workflow:

Figure 1. Celebrity brand-endorsement bot workflow. Image by Tuhin Sharma.

As you can see, the system is composed of a number of machine learning models.

Consider the image. The picture could have been taken in any setting. The first goal is to identify the objects and the celebrity in the picture. This is done by object detection models. Then, the next step is to identify the brand, if one appears. The easiest way to identify the brand is by its logo.

In this article, we will look into building a deep learning model to identify a brand by its logo in an image. Subsequent articles will talk about building some of the other pieces of the bot (object detection, text generation, etc.).

Problem definition

The problem addressed in this article is: given an image, predict the company (brand) in the image by identifying the logo.


To build machine learning models, access to high-quality data sets is imperative. In real life, data scientists work with brand managers and agencies to get all possible logos.

For the purpose of this article, we will leverage the FlickrLogo data set. This data set has real-world images from Flickr, a popular photo sharing website. The FlickrLogo page has instructions on how to download the data. Please download the data if you want to use the code in this article to build your own models.


Identifying the brand from its logo is a classic computer vision problem. In the past few years, deep learning has become the state of the art for computer vision problems. We will be building deep learning models for this use case.


In our previous article, we talked about the strengths of Apache MXNet. We also talked about Gluon, the simpler interface on top of MXNet. Both are extremely powerful and allow deep learning engineers to experiment rapidly with various model architectures.

Let's now get to the code.


Let's first import the libraries we need for building the models:

import mxnet as mx
import cv2
from pathlib import Path
import os
from time import time
import shutil
import matplotlib.pyplot as plt
%matplotlib inline

Load the data

From the FlickrLogos data sets, let's use the FlickrLogos-32 data set. <flickrlogos-url> is the URL to this data set.

%%capture
!wget -nc <flickrlogos-url> # Replace with the URL to the data set
!unzip -n ./

Data preparation

The next step is to create the following data sets:

  1. Train
  2. Validation
  3. Test

The FlickrLogos data set already has train, validation, and test splits, dividing the images as follows:

  • The train data set has 32 classes, each containing 10 images.
  • The validation data set has 3,960 images, of which 3,000 images have no logos.
  • The test data set has 3,960 images.

While the train images all have logos, the validation and test data sets contain both logo and no-logo images. We want to build a model that generalizes well: a model that predicts correctly on images that weren't used for training (the validation and test images).

To make our learning faster, with better accuracy, for the purpose of this article, we will move 50% of the no-logo class from the validation data set to the training set. So, we will make the training data set of size 1,820 (after adding 1,500 no-logo images from validation set) and reduce the validation data set size to 2,460 (after moving out 1,500 no-logo images). In a real-life setting, we will experiment with different model architectures to choose the one that performs well on the actual validation and test data sets.

Next, define the directory where the data is stored.

data_directory = "./FlickrLogos-v2/"

Now, define the path to the train, test, and validation data sets. For validation, we define two paths: one for the images containing logos and one for the rest of the images without logos.

train_logos_list_filename = data_directory+"trainset.relpaths.txt"
val_logos_list_filename = data_directory+"valset-logosonly.relpaths.txt"
val_nonlogos_list_filename = data_directory+"valset-nologos.relpaths.txt"
test_list_filename = data_directory+"testset.relpaths.txt"

Let's now read the filenames for train, test, and validation (logo and non-logo) from the list just defined.

These lists are provided with the FlickrLogos data set, which has already categorized the images as train, test, validation with logo, and validation without logo.

# List of train images
with open(train_logos_list_filename) as f:
    train_logos_filename = f.read().splitlines()

# List of validation images without logos
with open(val_nonlogos_list_filename) as f:
    val_nonlogos_filename = f.read().splitlines()

# List of validation images with logos
with open(val_logos_list_filename) as f:
    val_logos_filename = f.read().splitlines()

# List of test images
with open(test_list_filename) as f:
    test_filenames = f.read().splitlines()

Now, move some of the validation images without logos to the set of train images. This set will end up with all the train images and 50% of no-logo images from the validation data set. The validation set will end up with all the validation images that have logos and the remaining 50% of no-logo images.

train_filenames = train_logos_filename + val_nonlogos_filename[0:int(len(val_nonlogos_filename)/2)]
val_filenames = val_logos_filename + val_nonlogos_filename[int(len(val_nonlogos_filename)/2):]

To verify what we’ve done, let's print the number of images in the train, test and validation data sets.

print("Number of Training Images : ", len(train_filenames))
print("Number of Validation Images : ", len(val_filenames))
print("Number of Testing Images : ", len(test_filenames))

The next step in the data preparation process is to set the folder paths in a way that makes model training easy.

We need the folder structure to be like Figure 2.

Figure 2. Folder structure for data. Image by Tuhin Sharma.

The following function helps us create this structure.

def prepare_datesets(base_directory, filenames, dest_folder_name):
    for filename in filenames:
        image_src_path = base_directory + filename
        image_dest_path = image_src_path.replace('classes/jpg', dest_folder_name)
        dest_directory_path = Path(os.path.dirname(image_dest_path))
        dest_directory_path.mkdir(parents=True, exist_ok=True)
        shutil.copy2(image_src_path, image_dest_path)

Call this function to create the train, validation, and test folders with the images placed under them within their respective classes.

prepare_datesets(base_directory=data_directory, filenames=train_filenames, dest_folder_name='train_data')
prepare_datesets(base_directory=data_directory, filenames=val_filenames, dest_folder_name='val_data')
prepare_datesets(base_directory=data_directory, filenames=test_filenames, dest_folder_name='test_data')

The next step is to define the hyperparameters for the model.

We have 33 classes (32 logos and 1 non-logo). The data size isn't huge, so we will use only one GPU. We will train for 20 epochs and use 40 as the batch size for training.

batch_size = 40
num_classes = 33
num_epochs = 20
num_gpu = 1
ctx = [mx.gpu(i) for i in range(num_gpu)]

Data pre-processing

Once the images are loaded, we need to ensure the images are of the same size. We will resize all the images to be 224 * 224 pixels.

We have 1,820 training images, which is really not much data. Is there a smart way to get more data? Astoundingly, yes. An image, when flipped, still means the same thing, at least for logos. A random crop of a logo is also still the same logo.

So, we do not need to add images for the purposes of our training, but instead can transform some of the existing images by flipping them and cropping them. This helps us get a more robust model.

Let's flip 50% of the training data set horizontally and crop them to 224 * 224 pixels.

train_augs = [
    mx.image.HorizontalFlipAug(.5),
    mx.image.RandomCropAug((224,224))
]

For the validation and test data sets, let's center crop to get each image to 224 * 224. All the images in the train, test, and validation data sets will now be of 224 * 224 size.

val_test_augs = [
    mx.image.CenterCropAug((224,224))
]

To perform the transforms we want on images, define the function transform. Given the data and the augmentation type, it performs the transformation on the data and returns the updated data set.

def transform(data, label, augs):
    data = data.astype('float32')
    for aug in augs:
        data = aug(data)
    # from (H x W x c) to (c x H x W)
    data = mx.nd.transpose(data, (2,0,1))
    return data, mx.nd.array([label]).asscalar().astype('float32')

Gluon has a utility function, mx.gluon.data.vision.ImageFolderDataset, for loading image files. It requires the data to be available in the folder structure illustrated in Figure 2.

The function takes in the following parameters:

  • Path to the root directory where the images are stored
  • A flag to instruct if images have to be converted to greyscale or color (color is the default option)
  • A function that takes the data (image) and its label and transforms them

The following code shows how to transform the image when loading:

train_imgs = mx.gluon.data.vision.ImageFolderDataset(
    data_directory+'train_data',
    transform=lambda X, y: transform(X, y, train_augs))

Similarly, the transformations are applied to the validation and test data sets and are loaded.

val_imgs = mx.gluon.data.vision.ImageFolderDataset(
    data_directory+'val_data',
    transform=lambda X, y: transform(X, y, val_test_augs))
test_imgs = mx.gluon.data.vision.ImageFolderDataset(
    data_directory+'test_data',
    transform=lambda X, y: transform(X, y, val_test_augs))

DataLoader is the built-in utility function to load data from the data set, and it returns mini-batches of data. In the above steps, we have the train, validation, and test data sets defined ( train_imgs, val_imgs, test_imgs respectively). The num_workers attribute lets us define the number of multi-processing workers to use for data pre-processing.

train_data = mx.gluon.data.DataLoader(train_imgs, batch_size, num_workers=1, shuffle=True)
val_data = mx.gluon.data.DataLoader(val_imgs, batch_size, num_workers=1)
test_data = mx.gluon.data.DataLoader(test_imgs, batch_size, num_workers=1)

Now that the images are loaded, let's take a look at them. Let's write a utility function called show_images that displays the images as a grid:

def show_images(imgs, nrows, ncols, figsize=None):
    """plot a grid of images"""
    figsize = (ncols, nrows)
    _, figs = plt.subplots(nrows, ncols, figsize=figsize)
    for i in range(nrows):
        for j in range(ncols):
            figs[i][j].imshow(imgs[i*ncols+j].asnumpy())
            figs[i][j].axes.get_xaxis().set_visible(False)
            figs[i][j].axes.get_yaxis().set_visible(False)

Now, display the first 32 images in a 4 * 8 grid:

for X, _ in train_data:
    # from (B x c x H x W) to (B x H x W x c)
    X = X.transpose((0,2,3,1)).clip(0,255)/255
    show_images(X, 4, 8)
    break

Figure 3. Grid of images after transformations are performed. Image by Tuhin Sharma.

Results are shown in Figure 3. Some of the images seem to contain logos, often truncated.

Utility functions for training

In this section, we will define utility functions to do the following:

  • Get the data for the batch being currently processed
  • Evaluate the accuracy of the model
  • Train the model
  • Get the image, given a URL
  • Predict the image's label, given the image

The first function, _get_batch, returns the data and label, given the batch.

def _get_batch(batch, ctx):
    """return data and label on ctx"""
    data, label = batch
    return (mx.gluon.utils.split_and_load(data, ctx),
            mx.gluon.utils.split_and_load(label, ctx),
            data.shape[0])

The function evaluate_accuracy returns the classification accuracy of the model. We have chosen a simple accuracy metric for the purpose of this article. In practice, the accuracy metric is chosen based on the application need.

def evaluate_accuracy(data_iterator, net, ctx):
    acc = mx.nd.array([0])
    n = 0.
    for batch in data_iterator:
        data, label, batch_size = _get_batch(batch, ctx)
        for X, y in zip(data, label):
            acc += mx.nd.sum(net(X).argmax(axis=1)==y).copyto(mx.cpu())
            n += y.size
        acc.wait_to_read()
    return acc.asscalar() / n

The next function we will define is the train function. This is by far the biggest function we will create in this article.

Given an existing model, the train, test, and validation data sets, the model is trained for the number of epochs specified. Our previous article contained a more detailed overview of how this function works.

Whenever the best accuracy on the validation data set is found, the model is checkpointed. For each epoch, the train, validation, and test accuracies are printed.

def train(net, ctx, train_data, val_data, test_data, batch_size, num_epochs,
          model_prefix, hybridize=False, learning_rate=0.01, wd=0.001):
    net.collect_params().reset_ctx(ctx)
    if hybridize == True:
        net.hybridize()
    loss = mx.gluon.loss.SoftmaxCrossEntropyLoss()
    trainer = mx.gluon.Trainer(net.collect_params(), 'sgd', {
        'learning_rate': learning_rate, 'wd': wd})
    best_epoch = -1
    best_acc = 0.0
    if isinstance(ctx, mx.Context):
        ctx = [ctx]
    for epoch in range(num_epochs):
        train_loss, train_acc, n = 0.0, 0.0, 0.0
        start = time()
        for i, batch in enumerate(train_data):
            data, label, batch_size = _get_batch(batch, ctx)
            losses = []
            with mx.autograd.record():
                outputs = [net(X) for X in data]
                losses = [loss(yhat, y) for yhat, y in zip(outputs, label)]
            for l in losses:
                l.backward()
            train_loss += sum([l.sum().asscalar() for l in losses])
            trainer.step(batch_size)
            n += batch_size
        train_acc = evaluate_accuracy(train_data, net, ctx)
        val_acc = evaluate_accuracy(val_data, net, ctx)
        test_acc = evaluate_accuracy(test_data, net, ctx)
        print("Epoch %d. Loss: %.3f, Train acc %.2f, Val acc %.2f, Test acc %.2f, Time %.1f sec" % (
            epoch, train_loss/n, train_acc, val_acc, test_acc, time() - start
        ))
        if val_acc > best_acc:
            best_acc = val_acc
            if best_epoch != -1:
                print('Deleting previous checkpoint...')
                os.remove(model_prefix+'-%d.params'%(best_epoch))
            best_epoch = epoch
            print('Best validation accuracy found. Checkpointing...')
            net.collect_params().save(model_prefix+'-%d.params'%(epoch))

The function get_image downloads an image from a given URL and returns the local filename. This is used for testing the model's accuracy.

def get_image(url, show=False):
    # download and show the image
    fname = mx.test_utils.download(url)
    img = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (224, 224))
    plt.imshow(img)
    return fname

The final utility function we will define is classify_logo. Given the image and the model, the function returns the class of the image (in this case, the brand name) and its associated probability.

def classify_logo(net, url):
    fname = get_image(url)
    with open(fname, 'rb') as f:
        img = mx.image.imdecode(f.read())
    data, _ = transform(img, -1, val_test_augs)
    data = data.expand_dims(axis=0)
    out = net(data.as_in_context(ctx[0]))
    out = mx.nd.SoftmaxActivation(out)
    pred = int(mx.nd.argmax(out, axis=1).asscalar())
    prob = out[0][pred].asscalar()
    label = train_imgs.synsets
    return 'With prob=%f, %s'%(prob, label[pred])


Understanding the model architecture is quite important. In our previous article, we built a multi-layer perceptron (MLP). The architecture is shown in Figure 4.

Figure 4. Multi-layer perceptron. Image by Tuhin Sharma.

What would the input layer for an MLP model look like? Our data is 224 * 224 pixels in size.

The most common way to create the input layer from that is to flatten the image and create an input layer with 50,176 (224 * 224) neurons, ending up with a simple stream of pixel values, as shown in Figure 5.

Figure 5. Flattened input. Image by Tuhin Sharma.

But image data has a lot of spatial information that is lost when such flattening is done. The other challenge is the number of weights: if the first hidden layer has 30 hidden neurons, the model will have 50,176 * 30 weights plus 30 bias units, more than 1.5 million parameters. So, this doesn't seem to be the right modeling approach for images.
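To make that weight count concrete, here is the arithmetic as a quick sanity check (plain Python; the numbers match the figures quoted above):

```python
# Back-of-the-envelope parameter count for a flattened-input MLP:
# a 224 x 224 image flattened into the input layer, feeding
# a first hidden layer of 30 neurons.
input_neurons = 224 * 224
hidden_neurons = 30
mlp_params = input_neurons * hidden_neurons + hidden_neurons  # weights + biases

print(input_neurons)  # 50176
print(mlp_params)     # 1505310
```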

Let's now discuss the more appropriate architecture: a convolutional neural network (CNN) for image classification.

Convolutional neural network (CNN)

CNNs are similar to MLPs, in the sense that they are also made up of neurons whose weights we learn. The key difference is that the inputs are images, and the architecture allows us to exploit their properties.

CNNs have convolutional layers. The term "convolution" is taken from image processing, and it is described by Figure 6. This works on a small window, called a "receptive field," instead of all the inputs from the previous layer. This allows the model to learn localized features.

Each layer moves a small matrix, called a kernel, over the part of the image fed to that layer. It adjusts each pixel to reflect the pixels around it, an operation that helps identify edges. Figure 6 shows an image on the left, a 3x3 kernel in the middle, and the results of applying the kernel to the top-left pixel on the right. We can also define multiple kernels, representing different feature maps.

Figure 6. Convolutional layer. Image by Tuhin Sharma.

In the example in Figure 6, the input image was 5x5 and the kernel was 3x3. The computation was an element-wise multiplication between the two matrices. The output was 5x5.

To understand this, we need to understand two parameters at the convolution layer: stride and padding.

Stride controls how the kernel (filter) moves along the image.

Figure 7 illustrates the movement of the kernel from the first pixel to the second.

Figure 7. Kernel movement. Image by Tuhin Sharma.

In Figure 7, the stride is 1.

When a 5x5 image is convolved with a 3x3 kernel, we get a 3x3 output. Now consider the case where we add zero padding around the image: the 5x5 image is surrounded by 0s. This is illustrated in Figure 8.

Figure 8. Zero padding. Image by Tuhin Sharma.

This, when convolved with a 3x3 kernel, will result in a 5x5 output.

So, the computation shown in Figure 6 used a stride of 1 and padding of size 1.
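The output sizes discussed above follow a simple formula: for an n x n input, a k x k kernel, padding p, and stride s, the output is (n + 2p - k)/s + 1 on each side. A small helper function (hypothetical, for illustration only) confirms the two cases:

```python
def conv_output_size(n, k, p=0, s=1):
    """Side length of the output of convolving an n x n input
    with a k x k kernel, padding p, and stride s."""
    return (n + 2 * p - k) // s + 1

print(conv_output_size(5, 3))       # 3: 5x5 image, 3x3 kernel, no padding
print(conv_output_size(5, 3, p=1))  # 5: same convolution with zero padding of 1
```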

A CNN works with drastically fewer weights than the corresponding MLP. Say we use 30 kernels, each with 3x3 elements. Each kernel has 3x3 = 9 weights plus 1 bias, for 10 parameters per kernel and 300 for all 30 kernels. Contrast this against the more than 1.5 million weights for the MLP in the previous section.

The next layer is typically a sub-sampling layer. Once we have identified the features, this sub-sampling layer simplifies the information. A common method is max pooling, which outputs the greatest value from each localized region of the output from the convolutional layer (see Figure 9). It reduces the output size, while preserving the maximum activation in every localized region.

Figure 9. Max pooling. Image by Tuhin Sharma.

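A minimal NumPy sketch of 2x2 max pooling with stride 2 (illustrative only; in an actual network this is handled by MXNet's MaxPool2D layer):

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 over a 2-D array (H and W must be even)."""
    h, w = x.shape
    # Group the array into 2x2 blocks and take the max of each block
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 3, 2, 4],
              [5, 7, 6, 8],
              [9, 2, 1, 0],
              [3, 4, 5, 6]])
print(max_pool_2x2(x))
# [[7 8]
#  [9 6]]
```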

A good resource for more information on CNNs is the online book Neural Networks and Deep Learning. Another good resource is Stanford University's CNN course.

Now that we have learned the basics of what a CNN is, let's implement one for our problem using Gluon.

The first step is to define the architecture:

cnn_net = mx.gluon.nn.Sequential()
with cnn_net.name_scope():
    # First convolutional layer
    cnn_net.add(mx.gluon.nn.Conv2D(channels=96, kernel_size=11, strides=(4,4), activation='relu'))
    cnn_net.add(mx.gluon.nn.MaxPool2D(pool_size=3, strides=2))
    # Second convolutional layer
    cnn_net.add(mx.gluon.nn.Conv2D(channels=192, kernel_size=5, activation='relu'))
    cnn_net.add(mx.gluon.nn.MaxPool2D(pool_size=3, strides=(2,2)))
    # Flatten and apply fully connected layers
    cnn_net.add(mx.gluon.nn.Flatten())
    cnn_net.add(mx.gluon.nn.Dense(4096, activation="relu"))
    cnn_net.add(mx.gluon.nn.Dense(num_classes))

Now that the model architecture is defined, let's initialize the weights of the network. We will use the Xavier initializer.

cnn_net.collect_params().initialize(mx.init.Xavier(magnitude=2.24), ctx=ctx)

Once the weights are initialized, we can train the model. We will call the same train function defined earlier and pass the required parameters for the function.

train(cnn_net, ctx, train_data, val_data, test_data, batch_size, num_epochs, model_prefix='cnn')

Epoch 0. Loss: 53.771, Train acc 0.77, Val acc 0.58, Test acc 0.72, Time 224.9 sec
Best validation accuracy found. Checkpointing...
Epoch 1. Loss: 3.417, Train acc 0.80, Val acc 0.60, Test acc 0.73, Time 222.7 sec
Deleting previous checkpoint...
Best validation accuracy found. Checkpointing...
Epoch 2. Loss: 3.333, Train acc 0.81, Val acc 0.60, Test acc 0.74, Time 222.5 sec
Deleting previous checkpoint...
Best validation accuracy found. Checkpointing...
Epoch 3. Loss: 3.227, Train acc 0.82, Val acc 0.61, Test acc 0.75, Time 222.4 sec
Deleting previous checkpoint...
Best validation accuracy found. Checkpointing...
Epoch 4. Loss: 3.079, Train acc 0.82, Val acc 0.61, Test acc 0.75, Time 222.0 sec
Deleting previous checkpoint...
Best validation accuracy found. Checkpointing...
Epoch 5. Loss: 2.850, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 222.7 sec
Deleting previous checkpoint...
Best validation accuracy found. Checkpointing...
Epoch 6. Loss: 2.488, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 222.1 sec
Epoch 7. Loss: 1.943, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 221.3 sec
Epoch 8. Loss: 1.395, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 223.6 sec
Epoch 9. Loss: 1.146, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 222.5 sec
Epoch 10. Loss: 1.089, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 221.5 sec
Epoch 11. Loss: 1.078, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 220.7 sec
Epoch 12. Loss: 1.078, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 221.1 sec
Epoch 13. Loss: 1.075, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 221.3 sec
Epoch 14. Loss: 1.076, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 221.3 sec
Epoch 15. Loss: 1.076, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 220.4 sec
Epoch 16. Loss: 1.075, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 221.3 sec
Epoch 17. Loss: 1.074, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 221.8 sec
Epoch 18. Loss: 1.074, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 221.8 sec
Epoch 19. Loss: 1.073, Train acc 0.82, Val acc 0.61, Test acc 0.76, Time 220.9 sec

We asked the model to run for 20 epochs. Typically, we train for many epochs and pick the model at the epoch where the validation accuracy is the highest. Here, after 20 epochs, we can see from the log just shown that the model's best validation accuracy was in epoch 5. After that, the model doesn't seem to have learned much. Probably, the network was saturated and learning took place very slowly. We’ll try out a better approach in the next section, but first we’ll see how our current model performs.

Collect the parameters of the epoch that had the best validation accuracy and assign it as our model parameters:

cnn_net.collect_params().load('cnn-%d.params'%(5), ctx)
Let's now check how the model performs on new data. We'll get an easy-to-recognize image from the Web (Figure 10) and see the model's accuracy.

img_url = ""
classify_logo(cnn_net, img_url)

'With prob=0.081522, no-logo'

Figure 10. BMW logo. Image by Tuhin Sharma.

The model's prediction is terrible: it predicts the image to have no logo, with a probability of 8%. The prediction is wrong, and even that probability is quite weak.

Let’s try one more test image (see Figure 11) to see whether accuracy is any better.

img_url = ""
classify_logo(cnn_net, img_url)

'With prob=0.075301, no-logo'

Figure 11. Foster’s logo. Image by Tuhin Sharma.

Yet again, the model’s prediction is wrong and the probability is quite weak.

We don't have much data, and the model training has saturated, as just seen. We can experiment with more model architectures, but we won't overcome the problems of a small data set and a number of trainable parameters much greater than the number of training images. So, how do we get around this problem? Can't deep learning be used if there isn't much data?

The answer to that is transfer learning, discussed next.

Transfer learning

Consider this analogy: you want to pick up a new foreign language. How does the learning happen?

You would take a conversation, for example:

Instructor: How are you doing?
You: I am good. How about you?

And you will try to learn the equivalent of this in the new language.

Because of your proficiency in English, you don't start learning a new language from scratch (even if it seems that you do). You already have the mental map of a language, and you try to find the corresponding words in the new language. Therefore, in the new language, while your vocabulary might still be limited, you will still be able to converse because of your knowledge of the structure of conversations in English.

Transfer learning works the same way. Highly accurate models are built on data sets where a lot of data is available. A common data set that you will come across is the ImageNet data. It has more than a million images. Researchers from around the world have built many different state-of-the-art models using this data. The resulting models, comprising the model architecture and weights, are freely available on the internet.

Starting from such a pre-trained model, we then train it for our problem. In fact, this is quite the norm: almost invariably, the first model one builds for a computer vision problem employs a pre-trained model.

In many cases, like our example, this might be all one can do if data is limited.

The typical practice is to keep many of the early layers fixed, and train only the last layers. If data is quite limited, only the classifier layer is re-trained. If data is moderately abundant, the last few layers are re-trained.

This works because a convolutional neural network learns higher level representation at each successive layer; the learning it has done at many of the early layers is held in common by all image classification problems.

Let's now use a pre-trained model for logo detection.

MXNet has a model zoo with a number of pre-trained models.

We will use a popular pre-trained model called ResNet. The paper provides a lot of details on the model structure. A simpler explanation can be found in this article.

Let's first download the pre-trained model:

from mxnet.gluon.model_zoo import vision as models
pretrained_net = models.resnet18_v2(pretrained=True)

Since our data set is small, we will re-train only the output layer. We randomly initialize the weights for the output layer:

finetune_net = models.resnet18_v2(classes=num_classes)
finetune_net.features = pretrained_net.features
finetune_net.output.initialize(mx.init.Xavier(magnitude=2.24))

We now call the same train function as before:

train(finetune_net, ctx, train_data, val_data, test_data, batch_size, num_epochs, model_prefix='ft', hybridize=True)

Epoch 0. Loss: 1.107, Train acc 0.83, Val acc 0.62, Test acc 0.76, Time 246.1 sec
Best validation accuracy found. Checkpointing...
Epoch 1. Loss: 0.811, Train acc 0.85, Val acc 0.62, Test acc 0.77, Time 243.7 sec
Deleting previous checkpoint...
Best validation accuracy found. Checkpointing...
Epoch 2. Loss: 0.722, Train acc 0.86, Val acc 0.64, Test acc 0.78, Time 245.3 sec
Deleting previous checkpoint...
Best validation accuracy found. Checkpointing...
Epoch 3. Loss: 0.660, Train acc 0.87, Val acc 0.66, Test acc 0.79, Time 243.4 sec
Deleting previous checkpoint...
Best validation accuracy found. Checkpointing...
Epoch 4. Loss: 0.541, Train acc 0.88, Val acc 0.67, Test acc 0.80, Time 244.5 sec
Deleting previous checkpoint...
Best validation accuracy found. Checkpointing...
Epoch 5. Loss: 0.528, Train acc 0.89, Val acc 0.68, Test acc 0.80, Time 243.4 sec
Deleting previous checkpoint...
Best validation accuracy found. Checkpointing...
Epoch 6. Loss: 0.490, Train acc 0.90, Val acc 0.68, Test acc 0.81, Time 243.2 sec
Deleting previous checkpoint...
Best validation accuracy found. Checkpointing...
Epoch 7. Loss: 0.453, Train acc 0.91, Val acc 0.71, Test acc 0.82, Time 243.6 sec
Deleting previous checkpoint...
Best validation accuracy found. Checkpointing...
Epoch 8. Loss: 0.435, Train acc 0.92, Val acc 0.70, Test acc 0.82, Time 245.6 sec
Epoch 9. Loss: 0.413, Train acc 0.92, Val acc 0.72, Test acc 0.82, Time 247.7 sec
Deleting previous checkpoint...
Best validation accuracy found. Checkpointing...
Epoch 10. Loss: 0.392, Train acc 0.92, Val acc 0.72, Test acc 0.83, Time 245.3 sec
Deleting previous checkpoint...
Best validation accuracy found. Checkpointing...
Epoch 11. Loss: 0.377, Train acc 0.92, Val acc 0.72, Test acc 0.83, Time 244.5 sec
Epoch 12. Loss: 0.335, Train acc 0.93, Val acc 0.72, Test acc 0.84, Time 244.2 sec
Epoch 13. Loss: 0.321, Train acc 0.94, Val acc 0.73, Test acc 0.84, Time 245.0 sec
Deleting previous checkpoint...
Best validation accuracy found. Checkpointing...
Epoch 14. Loss: 0.305, Train acc 0.93, Val acc 0.73, Test acc 0.84, Time 243.4 sec
Deleting previous checkpoint...
Best validation accuracy found. Checkpointing...
Epoch 15. Loss: 0.298, Train acc 0.93, Val acc 0.73, Test acc 0.84, Time 243.9 sec
Epoch 16. Loss: 0.296, Train acc 0.94, Val acc 0.75, Test acc 0.84, Time 247.0 sec
Deleting previous checkpoint...
Best validation accuracy found. Checkpointing...
Epoch 17. Loss: 0.274, Train acc 0.94, Val acc 0.74, Test acc 0.84, Time 245.1 sec
Epoch 18. Loss: 0.292, Train acc 0.94, Val acc 0.74, Test acc 0.84, Time 243.9 sec
Epoch 19. Loss: 0.306, Train acc 0.95, Val acc 0.73, Test acc 0.84, Time 244.8 sec

The model starts right away with a higher accuracy. Typically, when data is less, we train only for a few epochs and pick the model at the epoch where the validation accuracy is the highest.

Here, epoch 16 has the best validation accuracy. Since the training data is limited, and the model kept on training, it has started to overfit. We can see that after epoch 16, while training accuracy is increasing, validation accuracy has begun to decrease.

Let's load the parameters from the checkpoint of the 16th epoch and use them as the final model.

# The model's parameters are now set to the values at the 16th epoch
finetune_net.collect_params().load('ft-%d.params' % (16), ctx)

Evaluating the predictions

For the same images that we used earlier to evaluate the predictions, let's see the prediction of the new model.

img_url = ""
classify_logo(finetune_net, img_url)

'With prob=0.983476, bmw'

Figure 12. Image by Tuhin Sharma.

We can see that the model is able to predict BMW with 98% probability.

Let's now try the other image we tested earlier.

img_url = ""
classify_logo(finetune_net, img_url)

'With prob=0.498218, fosters'

While the prediction probability isn't great (a tad below 50%), Foster's still receives the highest probability among all the logos.
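Both outputs above follow the same pattern: take the softmax of the network's class scores and report the top class with its probability. A hypothetical sketch of that formatting step (classify_logo itself is defined earlier in the article; the names below are assumptions):

```python
import numpy as np

def format_prediction(scores, class_names):
    """Turn raw class scores into the 'With prob=..., <class>' string."""
    probs = np.exp(scores - scores.max())  # numerically stable softmax
    probs /= probs.sum()
    top = int(np.argmax(probs))
    return "With prob=%f, %s" % (probs[top], class_names[top])
```

When one class dominates the scores, its softmax probability is close to 1 (the BMW case); when two classes have similar scores, the winner's probability hovers near 0.5 (the Foster's case).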

Improving the model

To improve the model, we need to fix how we constructed the training data set. Each individual logo had only 10 training images, but when we redistributed the no-logo images from validation to training, we moved 1,500 of them into the training set as no logo. This introduces a significant class imbalance into the data set, which is not good practice. The following are some options to fix this:

  • Weight the cross-entropy loss.
  • Exclude the no-logo images from the training data set, and build a model that assigns low probabilities to all logo classes when no logo appears in a test/validation image.
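The first option can be sketched in plain NumPy; in Gluon, the same effect is typically achieved by passing a per-sample weight to the loss function. The class names and counts below are illustrative, not the article's exact figures per class:

```python
import numpy as np

def weighted_cross_entropy(probs, label, class_weights):
    """Cross-entropy loss for one sample, scaled by its class's weight."""
    return -class_weights[label] * np.log(probs[label])

# Weight each class inversely to its frequency, so the 1,500 no-logo images
# do not drown out the 10 images available per logo class.
counts = np.array([1500.0, 10.0, 10.0])          # no-logo, logo A, logo B
weights = counts.sum() / (len(counts) * counts)  # rare classes get large weights
```

With these weights, a misclassified logo image contributes far more to the loss than a misclassified no-logo image, pushing the model to actually learn the logo classes.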

But remember that even with transfer learning and data augmentation, we have only 320 images, which is quite low for building highly accurate deep learning models.


In this article, we learned how to build image recognition models using MXNet. Gluon is ideal for rapid prototyping, and moving from prototyping to production is also quite easy with hybridization and symbol export. With the host of pre-trained models available in MXNet, we were able to build good logo-detection models fairly quickly. A very good resource for learning more about the underlying theory is Stanford's CS231n course.

Continue reading Logo detection using Apache MXNet.

Categories: Technology

Machine learning needs machine teaching

O'Reilly Radar - Thu, 2018/02/01 - 05:10

The O’Reilly Data Show Podcast: Mark Hammond on applications of reinforcement learning to manufacturing and industrial automation.

In this episode of the Data Show, I spoke with Mark Hammond, founder and CEO of Bonsai, a startup at the forefront of developing AI systems in industrial settings. While many articles have been written about developments in computer vision, speech recognition, and autonomous vehicles, I’m particularly excited about near-term applications of AI to manufacturing, robotics, and industrial automation. In a recent post, I outlined practical applications of reinforcement learning (RL)—a type of machine learning now being used in AI systems. In particular, I described how companies like Bonsai are applying RL to manufacturing and industrial automation. As researchers explore new approaches for solving RL problems, I expect many of the first applications to be in industrial automation.

Continue reading Machine learning needs machine teaching.

Categories: Technology

