Data liquidity in the age of inference

O'Reilly Radar - Fri, 2017/09/22 - 03:00

Probabilistic computation holds too much promise for it to be stifled by playing zero-sum games with data.

It's a special time in the evolutionary history of computing. Oft-used terms like big data, machine learning, and artificial intelligence have become popular descriptors of a broader underlying shift in information processing. While traditional rules-based computing isn’t going anywhere, a new computing paradigm is emerging around probabilistic inference, where digital reasoning is learned from sample data rather than hardcoded with boolean logic. This shift is so significant that a new computing stack is forming around it with emphasis on data engineering, algorithm development, and even novel hardware designs optimized for parallel computing workloads, both within data centers and at endpoints.

A funny thing about probabilistic inference is that when models work well, they’re probably right most of the time, but always wrong at least some of the time. From a mathematics perspective, this is because such models take a numerical approach to problem analysis, as opposed to an analytical one. That is, they learn patterns from data (with various levels of human involvement) that have certain levels of statistical significance, but remain somewhat ignorant of any physics-level intuition related to those patterns, whether represented by math theorems, conjectures, or otherwise. However, that’s also precisely why probabilistic inference is so incredibly powerful. Many real-world systems are so multivariate, complex, and even stochastic that analytical math models do not exist and would be tremendously difficult to develop. Meanwhile, their physics-ignorant, FLOPS-happy, and often brutish machine learning counterparts can develop deductive capabilities that don’t neatly follow any known rules, yet still almost always arrive at the correct answers.

Continue reading Data liquidity in the age of inference.

Categories: Technology

Four short links: 22 September 2017

O'Reilly Radar - Fri, 2017/09/22 - 01:00

Molecular Robots, Distributed Deep Nets, SQL Notebook, and Super-Accurate GPS

  1. Scientists Create World’s First ‘Molecular Robot’ Capable Of Building Molecules -- Each individual robot is capable of manipulating a single molecule and is made up of just 150 carbon, hydrogen, oxygen and nitrogen atoms. To put that size into context, a billion billion of these robots piled on top of each other would still only be the same size as a single grain of salt. The robots operate by carrying out chemical reactions in special solutions which can then be controlled and programmed by scientists to perform the basic tasks. (via Slashdot)
  2. Distributed Deep Neural Networks -- in Adrian Colyer's words: DDNNs partition networks between mobile/embedded devices, cloud (and edge), although the partitioning is static. What’s new and very interesting here though is the ability to aggregate inputs from multiple devices (e.g., with local sensors) in a single model, and the ability to short-circuit classification at lower levels in the model (closer to the end devices) if confidence in the classification has already passed a certain threshold. It looks like both teams worked independently and in parallel on their solutions. Overall, DDNNs are shown to give lower latency decisions with higher accuracy than either cloud or devices working in isolation, as well as fault tolerance in the sense that classification accuracy remains high even if individual devices fail. (via Morning Paper)
  3. Franchise -- an open-source notebook for SQL.
  4. Super-Accurate GPS Chips Coming to Smartphones in 2018 (IEEE Spectrum) -- 30cm accuracy (today: 5m), will help with the reflections you get in cities, and with 50% energy savings.

Continue reading Four short links: 22 September 2017.

Categories: Technology

How to start application tracing

O'Reilly Radar - Thu, 2017/09/21 - 03:00

A hands-on demonstration for implementing tracing in modern applications that introduces tracing through the CNCF’s OpenTracing project.

Continue reading How to start application tracing.

Categories: Technology

Jim Blandy and Jason Orendorff on Rust

O'Reilly Radar - Thu, 2017/09/21 - 03:00

The O’Reilly Programming Podcast: A look at a new systems programming language.

In this episode of the O’Reilly Programming Podcast, I talk with Jim Blandy and Jason Orendorff, both of Mozilla, where Blandy works on Firefox’s web developer tools and Orendorff is the module owner of Firefox’s JavaScript engine. They are the authors of the new O’Reilly book Programming Rust.

Continue reading Jim Blandy and Jason Orendorff on Rust.

Categories: Technology

Four short links: 21 September 2017

O'Reilly Radar - Thu, 2017/09/21 - 01:00

Synthetic Muscles, Smarter SSH, Kickstarter Post-Mortem, and Computational Drawing

  1. Additive Synthetic Muscles -- electrically-actuated high stress, high strain, low density, 3D-printable muscles.
  2. teleport -- modern SSH that groks bastion hosts, certificates, and more.
  3. Anatomy of a Kickstarter -- It is possible to outsource much of the Kickstarter process, including copywriting, fulfilment, customer support and marketing. I treated the whole process as a learning experience and set aside 50% of my time for three months to appreciate its nuances from start to finish, with a hard-stop due to other commitments. Post-Kickstarter I committed another three months over the following year to deliver experiences such as the expedition to Afghanistan and stretch goals. BackerKit was the obvious candidate to outsource operations to, but was rejected for violating the no-asshole rule: they were tone-deaf, evasive on responding to cost estimates, and nagging in a way that only organisations that live and die by CRM systems can be.
  4. rune.js -- a JavaScript library for programming graphic design systems with SVG in both the browser and node.js.

Continue reading Four short links: 21 September 2017.

Categories: Technology

Why democratizing AI matters: Computing, data, algorithms, and talent

O'Reilly Radar - Wed, 2017/09/20 - 13:00

Jia Li explains why a democratized approach to AI ensures that the components behind these technologies reach the widest possible audience.

Continue reading Why democratizing AI matters: Computing, data, algorithms, and talent.

Categories: Technology

Fireside chat with Naveen Rao and Steve Jurvetson

O'Reilly Radar - Wed, 2017/09/20 - 13:00

A discussion on the impact and opportunities of artificial intelligence.

Continue reading Fireside chat with Naveen Rao and Steve Jurvetson.

Categories: Technology

Our Skynet moment

O'Reilly Radar - Wed, 2017/09/20 - 13:00

Tim O'Reilly says the algorithms that shape our economy must be rewritten if we want to create a more human-centered future.

Continue reading Our Skynet moment.

Categories: Technology

How to escape saddlepoints efficiently

O'Reilly Radar - Wed, 2017/09/20 - 13:00

Michael Jordan discusses recent results in gradient-based optimization for large-scale data analysis.

Continue reading How to escape saddlepoints efficiently.

Categories: Technology

Accelerating AI

O'Reilly Radar - Wed, 2017/09/20 - 13:00

Steve Jurvetson examines the state of artificial intelligence.

Continue reading Accelerating AI.

Categories: Technology

Build smart applications with your new super power: Cloud AI

O'Reilly Radar - Wed, 2017/09/20 - 13:00

Philippe Poutonnet discusses how you can harness the power of machine learning, whether you have a machine learning team of your own or you just want to use machine learning as a service.

Continue reading Build smart applications with your new super power: Cloud AI.

Categories: Technology

AI mimicking nature: Flying and talking

O'Reilly Radar - Wed, 2017/09/20 - 13:00

Lili Cheng shares two examples of AI that were inspired by nature.

Continue reading AI mimicking nature: Flying and talking.

Categories: Technology

Handling checked exceptions in Java streams

O'Reilly Radar - Wed, 2017/09/20 - 11:00

Know your options for managing checked exceptions in Java 8’s functional approach.

Several decisions were made during the creation of the Java language that still impact how we write code today. One of them was the addition of checked exceptions to the language, which the compiler requires you to prepare for with either a try/catch block or a throws clause at compile time.

If you've moved to Java 8, you know that the shift to functional programming concepts like lambda expressions, method references, and streams feels like it completely remade the language.
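As a minimal sketch of the tension the article describes (the class and helper names here are our own, not taken from the article): a method that throws a checked exception such as URISyntaxException cannot be passed directly to Stream.map, because the Function interface declares no checked exceptions. One common option is a small adapter that catches the checked exception and rethrows it unchecked.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class CheckedInStreams {
    // new URI(String) throws the checked URISyntaxException, so a bare
    // method reference to the constructor will not compile inside Stream.map.
    static URI toUri(String s) {
        try {
            return new URI(s);
        } catch (URISyntaxException e) {
            // Wrap the checked exception so this method fits Function<String, URI>
            throw new IllegalArgumentException("bad URI: " + s, e);
        }
    }

    public static void main(String[] args) {
        List<URI> uris = Stream.of("https://example.com", "mailto:a@b.example")
                .map(CheckedInStreams::toUri) // compiles: no checked exception escapes
                .collect(Collectors.toList());
        System.out.println(uris.size()); // prints 2
    }
}
```

This is only one of the options the article surveys; the trade-off is that callers now get an unchecked exception at runtime rather than a compile-time obligation.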

Continue reading Handling checked exceptions in Java streams.

Categories: Technology

Environmental sensing with recycled materials

O'Reilly Radar - Wed, 2017/09/20 - 04:00

Electronic waste is an economic and environmental problem, but citizen scientists can take action by using harvested sensors from discarded electronics.

Environmental sensing—the process of gathering information from ecological systems—is an essential part of ecology and sustainable agriculture. However, sensors can be expensive and difficult for citizen scientists to obtain, even though their parts are all around us, in the form of technological waste. When a gadget breaks, it is often easier and cheaper to throw it away and purchase a new one than to attempt to repair it. Citizen scientists can take advantage of this unfortunate by-product of "throw away culture" by harvesting the sensor technology that is often found in e-waste. In this article, we discuss an approach to the development of such sensors.

When assessing and addressing environmental issues, especially at a local level, it is more advantageous to involve community members—those who are directly affected by such issues—than scientists and academics. Such an approach has been found to be both faster and more efficient, as dedication amongst local volunteers is much higher than amongst those with little attachment or stake in the success of the project (Danielsen et al. 2010, 1166–1168). However, oftentimes there is little funding and there are few resources available to citizen scientists, which necessitates support from non-local institutes.

There are a number of projects that purport to address exactly this issue. However, as we traverse this technological landscape of citizen sensing, although the economic climate is shifting toward affordability, mass distribution of these devices (in detector arrays, for example) remains out of reach for many budgets.

According to Sui and Elwood in “Crowdsourcing Geographic Knowledge,” there exist four levels of participation in citizen science activities. The majority of the projects outlined here fall within the first two levels of engagement; however, this should not be interpreted as an inability of citizen scientists to participate in environmental sensing projects at higher levels. While there are many ecological sensing projects worthy of examination, to list them all would be well outside the scope of this paper; therefore, only a select few are outlined.

Figure 1-1. Sui and Elwood propose four levels of involvement for citizen scientists, ranging from "passive sensors" to "active collaborators" (Sui and Elwood, 2013)

Smart Citizen

One of the most polished options for citizen-scientist environmental sensing is the Smart Citizen project. The Smart Citizen Kit is billed as “an Open-Source Environmental Monitoring Platform consisting of arduino-compatible hardware, data visualization web API, and mobile app” (Smart Citizen 2014). It is the result of a crowdfunding effort on Kickstarter by Fab Lab Barcelona at the Institute for Advanced Architecture of Catalonia. The sensor board can measure air composition (CO and NO2), temperature, light intensity, sound levels, and humidity. It is capable of communicating data wirelessly to iOS devices via the Smart Citizen App. The kit itself consists of three boards: the ambient board, which houses the sensors; a data-processing board based on an ATMega32u4; and a baseboard with USB socket, SD card reader, EEPROM, battery holder, and clock (The Smart Citizen Kit: Crowdsourced Environmental Monitoring 2014).

The Smart Citizen Kit places a particular emphasis on large-scale collaboration. Users can register their sensor board on the Smart Citizen website and communicate their local conditions over the web. The result is an international “sensornet” that is openly available for anyone to make use of (Smart Citizen 2014). There are a number of shortcomings with the Smart Citizen, however, especially the price tag: the Smart Citizen Kickstarter edition sells for 105 USD unassembled. This sensor would be considered a one or two on Sui and Elwood’s engagement scale because, currently, it only affords citizens the role of data collector and reporter.

Figure 1-2. Smart Citizen sensor board (The Smart Citizen Kit: Crowdsourced Environmental Monitoring, 2014)

GardenBot

According to the website, GardenBot “is an open source garden monitoring system” (Frueh 2014). However, GardenBot is more than just a “garden monitoring system”; it is a comprehensive how-to on small-scale environmental monitoring. The bot itself is composed of an Arduino Uno as the “brain,” an LM335 temperature sensor, a small photocell (for monitoring light), and a moisture sensor made of galvanized wire. Furthermore, the system incorporates a water valve as the control for an automated watering system that responds to soil moisture levels. The system, despite being quite simple, has been featured on a number of sites, including OpenSourceEcology.

Figure 1-3. Close-up of a part of the gardenbot (Ganapati, 2010)

There are, however, a number of notable drawbacks to this system. The most notable is that it must be connected to a PC: the system relies on PC-Arduino serial communication and therefore requires a dedicated PC to run, which rules out the sort of portability one might hope for from an environmental monitoring system. Moreover, because the system incorporates an Arduino board and a number of other components that must be purchased, the setup is not particularly scalable on a tight budget (an Arduino board alone is upwards of 40 USD). Given the rather involved nature of constructing the device—there is no option to order it pre-assembled—any citizen scientist working with it must have a fair bit of technical aptitude. Therefore, the GardenBot rises above the first and possibly the second level of engagement to level three, “participatory science,” at least for the principal stages.

Growerbot (formerly garduino)

The Growerbot, another Arduino-based garden monitor, is billed as a “gamified gardening assistant.” The Growerbot is equipped with a soil probe, a temperature/humidity sensor, and a TSL2561 luminosity sensor. It is also WiFi-enabled with an Electric Imp connectivity platform for connecting to the Internet of Things, giving the Growerbot the capacity to share data both locally and over the web. The standard kit also contains an LCD interface and a light/pump controller. The final system is a handsome, well-polished device that is ready for crowd-sourced data collection.

Figure 1-4. Growerbot—polished, yet expensive

The Problem of Electronic Waste

While purchasing commercially available tools for environmental sensing is an attractive option, introducing new technological mass into circulation would be a self-defeating exercise.

In 2012 alone, the average Canadian generated 24.72 kg of electronic waste. This is a staggering figure, given that the average Canadian consumed 28.59 kg of electronic products that same year (StEP 2014): in a year, we dispose of 86% of the electronics we purchase. Many of these items, if not disposed of properly, pose environmental and health hazards. Cathode ray tubes, for example, are known to leach heavy metals such as lead into groundwater. Gold-plated components such as ICs discharge hydrocarbons and bromide into bodies of water, turning them acidic and killing fish. Plastics from electronics housings emit dioxins and hydrocarbons (Wath et al. 2011). In industrialized countries, these problems are well managed with recycling and reclamation programs.

However, a good deal of electronic waste is exported to developing countries where there is little infrastructure to deal with it, and so those people are often forced to live alongside hazardous waste (Grossman 2006).

The irony of producing e-waste to develop environmental sensors is not lost on the researcher; it is therefore the philosophy of this project that, as much as possible, no new materials should be used in its development.

Environmental Sensor Development

The intended final application of this project is an agricultural sensor grid spaced throughout a local crop field. To begin with, we consider three important growing variables for most crops: the amount of light received, the local temperature, and the local soil moisture (i.e., conductivity). Therefore, three separate sensors must be developed: a light-meter, a thermometer, and a conductivity sensor.

Light Meter

A new light meter costs around $6 (Adafruit Industries 2014). While this is perfectly sensible for a single meter, scaling into the hundreds gets expensive. Electronic waste, however, is rife with photoresistors, which can be used as rudimentary luminosity sensors. A reading from a photoresistor will not be in a standard unit of light (lux), so we must calibrate any photoresistor we use for this purpose. Calibration can be accomplished by measuring the voltage across the photoresistor as a light source provides different levels of light, and then comparing that value to a calibrated light meter. Because lab-grade light meters are not readily available, a simple solution is to use the light meter built into most smartphones. Using a three-level desk lamp as a light source and Sensor Logger by i-RealitySoft on a Samsung Galaxy Note 2 smartphone, we calibrated a simple photoresistor circuit to respond with acceptable accuracy. To do so, we plotted the average luminosity against the reading from our photoresistor and found a nearly linear relationship, lux = 5.3 × (photoresistor reading) − 2401.5, with an R^2 of 0.97.
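Applying that fit in software is then trivial. The sketch below is our own, with the caveat that the coefficients are valid only for this particular photoresistor, divider, and lamp; any other salvaged part would need its own calibration.

```java
public class LightCalibration {
    // Coefficients from the linear fit described above (R^2 = 0.97);
    // they hold only for the specific photoresistor and 10k divider tested.
    static final double SLOPE = 5.3;
    static final double OFFSET_LX = -2401.5;

    // Convert a raw voltage-divider reading into an approximate lux value.
    static double toLux(double reading) {
        return SLOPE * reading + OFFSET_LX;
    }

    public static void main(String[] args) {
        // A mid-range reading of 600 maps to roughly 778 lx under this fit.
        System.out.println(toLux(600));
    }
}
```

Note that readings below about 453 produce negative lux under this fit, a reminder that the calibration is only trustworthy within the range of light conditions actually tested.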

Figure 1-5. A deceased Furby with a photoresistor exposed (circled)

Figure 1-6. Set-up for calibrating the photoresistor. A 10k pull-down resistor provides a steady ground and an MSP430 reads the voltage values.

Figure 1-7. Plotting the average lux reading against the average photoresistor reading gives a nearly linear relationship (R^2 = 0.97) for the few light conditions tested: lux = 5.3 × (photoresistor reading) − 2401.5

Temperature Sensor

The MSP430G2553 microcontroller unit, which serves as the “brain” of this project, contains an internal temperature sensor that can be easily read via either a serial connection or an attached display. These chips are very common in both industrial applications and commercial ones such as children’s toys (Four-Three-Oh! 2012). Before attempting to use the internal temperature sensor, we must determine how accurate it is.

The procedure for cross-checking the temperature was simple: a digital house thermometer was placed next to the MSP430 LaunchPad development board, which was in turn connected via serial over USB to a computer where the temperature was read out. At random times over a week, the temperatures read by both devices were recorded and compared.

Figure 1-8. Determining the accuracy of the internal temperature sensor of the MSP430G2553

Across the 30 samples recorded during the week, the average percent error was approximately 2.4%. Although this is certainly not lab-quality scientific accuracy, for a makeshift environmental sensor this small, seemingly systematic error is quite acceptable.
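That figure is a mean absolute percent error over the paired readings. As a sketch of the computation (with invented sample values, since the raw data isn't reproduced here):

```java
public class PercentError {
    // Mean absolute percent error between sensor readings and a reference.
    static double meanPercentError(double[] measured, double[] reference) {
        double total = 0;
        for (int i = 0; i < measured.length; i++) {
            total += Math.abs(measured[i] - reference[i]) / reference[i] * 100.0;
        }
        return total / measured.length;
    }

    public static void main(String[] args) {
        // Hypothetical paired readings (degrees C): MSP430 vs. house thermometer
        double[] msp430 = {21.9, 22.4, 20.1, 23.0};
        double[] thermometer = {21.5, 22.0, 20.6, 22.4};
        System.out.printf("%.1f%%%n", meanPercentError(msp430, thermometer));
    }
}
```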

Soil Moisture Sensor

To determine the moisture of soil, a pair of galvanized nails is soldered to a pair of wires, with one lead connected to +3.3 V and the other connected to an analog input on the MSP430 and, through a 10K pull-down resistor, to ground. The idea behind this setup is simple: as moisture is added to the soil, the soil becomes more conductive, so there is less and less resistance between the sensor nail and the voltage-source nail. To test this, three-quarters of a cup of potting soil was placed in a cup with the nails spaced a few centimeters apart, and water was added in one-eighth-cup increments until the soil was saturated. The values were recorded over the serial port and plotted to examine what constitutes “underwatering” and “overwatering.” In completely dry soil, the reading is nearly zero. Once the first 1/8 cup of water is added, the value jumps to around 600. Another 1/8 cup brings it up to a little over 650, and it appears to max out at around 700, at which point the soil is saturated.
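Turning those raw readings into watering categories is then a simple threshold map. The sketch below uses cutoffs interpolated from the values above; the exact boundaries are our own guesses rather than measured values.

```java
public class SoilMoisture {
    // Categories inferred from the cup test: ~0 dry, ~600 after the first
    // 1/8 cup, ~650 after the second, maxing out near 700 at saturation.
    // The cutoff points between categories are illustrative guesses.
    static String classify(int reading) {
        if (reading < 100) return "dry";
        if (reading < 620) return "under-watered";
        if (reading < 690) return "well-watered";
        return "saturated";
    }

    public static void main(String[] args) {
        System.out.println(classify(30));   // dry
        System.out.println(classify(600));  // under-watered
        System.out.println(classify(650));  // well-watered
        System.out.println(classify(700));  // saturated
    }
}
```

In a field deployment, these thresholds would need re-calibration per soil type, since conductivity depends on soil composition as well as moisture.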

Figure 1-9. Testing the soil moisture sensor

Figure 1-10. A soil moisture reading is taken to determine what sorts of values correspond with dry, under-watered, properly watered, and over-watered plants

Demonstrator Build

With the sensors (mostly) working, a final build to enclose them was constructed. The components were removed from the breadboard and soldered directly to a small prototyping PCB. A recycled pill bottle was chosen as a housing, and a 3 V coin-cell battery was installed. Holes for the moisture sensor probes and the light sensor were drilled in the bottom and top, respectively. Hot glue served to water-tighten the ports for the sensors.

Figure 1-11. The final build is ultra portable and extremely inexpensive (the MCU was the only nonrecycled part, but was free)

While it has yet to be tested, the final build is meant to be watertight and weather-resistant, able to survive an entire season in a field or garden. Afterward, the capsule can be extracted and the chip removed from the device. The data periodically stored to the nonvolatile flash memory within the MSP430 can then be collected and interpreted. The final product is intended to fall within the first level of citizen science involvement, allowing volunteers to passively collect data (literally burying the equipment in the ground and forgetting about it) without needing to concern themselves with the broader work.

Figure 1-12. The final build displaying the internal circuitry

Conclusion

Here we described a sensor system constructed almost entirely of electronic waste, a way for citizen scientists to participate in mass sensing without spending much money.

Continue reading Environmental sensing with recycled materials.

Categories: Technology

Good research starts with good questions

O'Reilly Radar - Wed, 2017/09/20 - 04:00

How to construct inquiries that will result in good, useful data.

Researchers always struggle when it comes to writing down the questions they need to ask their participants. Sure, this gets easier over time and with experience, but the act of writing an interview guide or test plan never gets “easy.” At the end of the day, we are all human and we are susceptible to our own weaknesses and limitations.

The deck is stacked against us when we start to consider the social, personal, professional, and sometimes logistical factors that can inhibit our ability to have a conversation with someone else. Predicting all these factors before research even starts is no small feat. This in turn makes writing down lines of inquiry that will result in good, useful data seem daunting. But you have to start somewhere and iterate as you learn which questions work and which fall flat. To help you with this, first we need to discuss what role questions fulfill when you’re conducting any type of research.

The role of questions in research

It’s hard to conduct research when you don’t know what question needs to be answered. Every research effort starts with you needing to know why something happens, what people do in certain circumstances, and how they perform key tasks. To answer these questions, we must find people to talk to and phrase our questions effectively to get to the heart of the matter. Otherwise, we would be making wild guesses and shooting in the dark. While that’s often tempting, this degree of freedom leads to failure and your product never seeing the light of day.

How good questions go wrong

We can’t tell you how many times we’ve written down a question and thought, “This is it! This will get us some awesome information from people,” only to have it fall flat during a session. This happens to all researchers and it will happen to you. And that’s OK! Bad questions can be mitigated through the planning phase if you know what makes a question go bad. The following factors can lead to misinformed or poor research results.

Leading questions

It’s easy to get caught up in the excitement of research. This can trick you into asking questions that give participants a clue, or directly point them, to the type of answer you’re looking for. These are called leading questions, and they can hinder your research session and the data collected. An example of a leading question would be asking, “How do you use Outlook to communicate your work status?” A better alternative would be “How do you communicate your work status?” The second question allows more responses than leading the participant to describe a specific use of email. Research participants want to be helpful and want to provide value to your team. Since they are primed to help, if you ask a question that implies the type of answer you want, they are more likely to give you that answer, even if it doesn’t really apply to them.

Shallow questions

One golden rule of research is never ask yes/no questions. When creating questions for an upcoming research effort, you’ll find avoiding these questions is hard. Yes/no questions are harmful because they give participants an easy out. The question “Do you use Yammer for team discussions?” can quickly be answered and dismissed. Participants don’t have to think deeply to respond, and they are giving you confirmation that may or may not be useful. A better question is “How do you communicate with your team throughout the day?”

Personal bias

We all have our own beliefs about how products work, or how they should work. These biases can sneak into our questions. The best approach, then, is to remove yourself from the actual research. While strict practice may suggest not conducting the research, we recommend developing questions from the point of view of the product, the customer, or even stakeholders of the product. The less “you” there is in the interview, the better the information that you collect will be. This results in questions more like “Tell me about your experience with your accounting software” than “I know I always struggle with invoices; what challenges do you have with your software?”

Unconscious bias

Our brains make tons of decisions every day, many of which we aren’t aware of. These can be influenced by social norms, personal history, past experiences, or expectations. These biases are the hardest to catch. Unconscious biases fail to recognize that others’ perception of a situation is not the same as our own. To avoid this, dig deeper no matter how uncomfortable that might make you feel. For instance, gender bias exists within the workplace because most people aren’t aware that the bias exists at all. Asking “Where do you guys go to unwind after work?” has implicit gender biases, whereas “Where does your team go after work?” is more neutral.

Knowing when to break the rules

If you’re just starting to build out your research skills, it’s important to avoid the aforementioned factors. However, once you get a few studies under your belt, you’ll find you can use leading questions and shallow questions in strategic ways. You can even use a participant’s personal and unconscious bias to drive to a deeper conversation about how people might use a product.


Leading questions

These are best used when you suspect the response will be opposite to the hints you provide in your questioning. You can use leading questions to help build trust with a participant and to validate a previous comment they made that maybe wasn’t totally clear.

Example: How much do your friends and family appreciate photo albums when you make one for them?


Shallow questions

When you start a research session, sometimes participants aren’t yet comfortable and they need to get used to talking with you and answering your questions. Shallow questions give participants that opportunity and can help ease them into the activity so you can get to the good stuff.

Example: How many times do you log in to Facebook in a day?

Personal bias

There is something to be said about being a good devil’s advocate—someone who can take the opposite view in a conversation to spark additional thought or comments. You can use your personal thoughts and opinions to get to deeper conversation by giving the participant something to disagree with.

Example: Do you think the Cubs actually have a chance at the World Series this year?

Mastery takes years of practice and failure

The only way to practice research is by finding people to talk to. The first few studies you run won’t be the best, and that’s OK! You will learn something after each session, even if every question you ask isn’t the best version of that question. The goal is to improve your line of questioning and to find ways to hold a meaningful conversation with someone rather than treat research like a verbal questionnaire.

We have both had our fair share of failing during our years as researchers. In the early days, we asked overly leading questions and missed important areas of discussion because we didn’t know what we were looking for. But thanks to mentors providing feedback and guidance, we eventually overcame these failings. We still make some mistakes today, and you will too, but as long as you have a consistent feedback loop in place, you’ll continue to improve and eventually master the art of research.

Continue reading Good research starts with good questions.

Categories: Technology

Recognizing and evaluating scientific claims in security

O'Reilly Radar - Wed, 2017/09/20 - 03:00

Five questions for Josiah Dykstra on techniques to expose and invalidate misleading claims.

I recently sat down with Josiah Dykstra, Senior Security Researcher at the Department of Defense, to discuss the topics of both accidental and intended misleading communications in security, common pitfalls made in evaluating scientific claims, and the questions you should ask when evaluating scientific claims and third-party vendor solutions.

What are some basic tips for recognizing and understanding scientific claims in security marketing, journalism, or other security-related materials?

People and companies use a spectrum of truly scientific, possibly scientific, and unscientific statements to talk about products and services. Some are trying to persuade you to buy something; others are simply trying to communicate information. Though scientists themselves can produce misleading and manipulative results, I am generally more concerned about the potential damage caused by scientific-sounding claims from other sources.

Continue reading Recognizing and evaluating scientific claims in security.

Categories: Technology

You need an Analytics Center of Excellence

O'Reilly Radar - Wed, 2017/09/20 - 03:00

Learn how to add big data to your organization's business processes.

More than 10 years after big data emerged as a new technology paradigm, it has finally matured, and its business value across most industry sectors has been established by a significant number of use cases.

A couple of years ago, the discussion was still about how big data changed our way of capturing, processing, analyzing, and exploiting data in new and meaningful ways for business decision makers. Now many companies undertake analytical projects at a departmental level, redefining the relationship between business and IT by the adoption of Agile and DevOps methodologies. Real-time processing, machine learning algorithms, and even artificial intelligence are the new normal in business talk.

However, companies are still struggling to adopt big data at a corporate level. In many corporations, there is a gap between launching departmental projects and industrializing and scaling up those use cases across the organization. Embedding big data in scalable business processes is crucial to becoming a data-driven organization. Building an Analytics Center of Excellence (ACoE) can be the basis for this transformation.

Remaining challenges

There are three important issues that must be addressed in order to scale up big data across a corporation and make a real impact on business outcomes:

1. Lack of skills across the organization

There is a recognized global shortage of analytical talent: data experts spanning data engineers, big data architects, and data scientists. It is not easy for a company to find, attract, and retain these profiles, and it gets more difficult as technologies evolve at a challenging, rapid pace. When a company does employ these experts, they are not always distributed evenly throughout the organization but are sometimes concentrated in a particular department or business function (for example, risk or marketing), making it difficult to leverage their skills for the good of the entire organization. If a company has multiple locations, it is even harder to keep the right balance of skills across all the subsidiaries.

The shortage of skills affects the technical and analytical departments as well as the business areas. Companies need subject matter experts who understand business needs well enough to communicate with the data experts, as well as managers able to make decisions supported by data and facts rather than by personal, biased experience. Furthermore, these new skills require new ways of working, and therefore an organizational and cultural change.

2. Lack of standards, methodology, and governance

Even mature organizations with analytics teams in place across different departments, business units, or countries find that every team tends to work with its own tools, libraries, software versions, and data sets. This variety can make it difficult to industrialize solutions, implement them globally, and ensure code reusability. Companies need to define standards for coding, tools, version control, and quality control, and have all the teams working with the same tools and sharing their methodologies. Additionally, analytics teams must have big data governance policies and processes in place, controlling and limiting access to data in the data lake and ensuring security and data privacy controls. In Europe, for instance, the new General Data Protection Regulation (GDPR) imposes very demanding requirements for data traceability at the field level. Data is a key asset, and companies will be required to protect it. Big data governance is a must, not only because of legislation, but in order to safeguard a business’s reputation for security and privacy.

3. Lack of use-case prioritization

Big data must be led by the business, but companies often face organizational issues resulting from a lack of use-case prioritization. Although companies are willing to implement big data projects in various business areas, the projects are usually planned according to the big data architecture roadmap or the data provisioning strategy defined by IT, rather than prioritized by business impact. When several departments are competing for a centralized big data budget, a business-oriented, ROI-driven approach is needed to define the use-case roadmap in order to maximize the impact on the entire organization.

Accelerating big data adoption by business

These three challenges are not an excuse for failing to adopt big data solutions. The most effective way to deploy big data across the entire organization in a systematic and scalable way is to launch an ACoE.

Although the concept may be familiar, the ACoE we’re focusing on here is not an “algorithms factory” or “a technical team of product specialists.” Instead, an ACoE should consist of a team of business and technical people with both centralized and distributed capabilities and resources, working on advanced big data analytics projects and creating a common workspace in which methodologies, tools, models, and techniques are shared in order to gain efficiency when implementing initiatives across different business units and markets.

The operational model to make an ACoE successful is not obvious. Here are some of the key principles of an ACoE.

First, it must be connected with the business. The ACoE must include a team of business experts able to align with the company’s business strategy through the prioritization of use cases, and to coordinate the implications at the technology, architecture, analytics, and governance levels with the different stakeholders.

Second, it has to be able to grow organically; as the company expands and scales up new big data projects across the corporation, more people and teams will join. The ACoE hosts what we call the core team, a centralized function, while expansion happens through the extended teams, a distributed function across geographies. The core team shares and spreads best practices and methodologies, accelerating know-how transfer.

Another principle is that it should operate as a service. The ACoE must be able to serve the business units as a shared-services operation, a fully outsourced operation, or an extended capability, allowing rationalization of infrastructure and resources, scalability, and elasticity. ACoE costs should be allocated to the business units as a service.

Key benefits of an ACoE

An ACoE is essential to accelerating big data adoption by the business at scale. It drastically reduces implementation times, and therefore the time-to-market for new data-driven products and services. It ensures best practices and methodologies are shared across different teams in the organization. An ACoE is a living thing that expands and grows as the organization’s needs evolve, a key factor in realizing the business value of big data.

Data is a key competitive advantage and differentiating factor for companies in any industry. A true data-driven organization understands that data is at the center of any business strategy. Companies must be able to use data not only to improve decision-making and operational efficiency, but also to create new products and processes based on data-driven insights. To do so, they must embed, at a corporate level, the organizational and cultural changes it takes to succeed. An Analytics Center of Excellence is an important tool for accomplishing these goals.

Continue reading You need an Analytics Center of Excellence.

Categories: Technology

Four short links: 20 September 2017

O'Reilly Radar - Wed, 2017/09/20 - 01:00

AI Needs Ethics, Automotive-Grade Linux, Drawing Clocks, and Facial Recognition

  1. AI Research Needs an Ethical Watchdog (Wired) -- Right now, if government-funded scientists want to research humans for a study, the law requires them to get the approval of an ethics committee known as an institutional review board, or IRB. Stanford’s review board approved Kosinski and Wang’s study. But these boards use rules developed 40 years ago for protecting people during real-life interactions, such as drawing blood or conducting interviews. “The regulations were designed for a very specific type of research harm and a specific set of research methods that simply don’t hold for data science,” says Metcalf.
  2. Automotive-Grade Linux Debuts On The 2018 Toyota Camry -- you heard it here first: 2018 is the year of the Linux hatchback.
  3. Clocks for Software Engineers -- The first and perhaps most difficult part of learning hardware design is to learn that all hardware design is parallel design. Things don’t take place serially, as in one instruction after another ... like they do in a computer. Instead, everything happens at once.
  4. Facial Recognition is Here to Stay -- I have to admit that when I saw facial recognition improving, and realised it'd be useful in a few years, I never imagined the use case would be "so the cashier at Chick-fil-A would know your name."

Continue reading Four short links: 20 September 2017.

Categories: Technology

The state of AI adoption

O'Reilly Radar - Tue, 2017/09/19 - 13:00

AI Conference chairs Ben Lorica and Roger Chen reveal the current AI trends they've observed in industry.

Continue reading The state of AI adoption.

Categories: Technology

AI is the new electricity

O'Reilly Radar - Tue, 2017/09/19 - 13:00

Andrew Ng shares his thoughts on where the biggest opportunities in AI may lie.

Continue reading AI is the new electricity.

Categories: Technology
