
Four short links: 16 Sep 2020

O'Reilly Radar - Wed, 2020/09/16 - 03:58
  1. A Concurrency Cost Hierarchy — a higher level taxonomy that I use to think about concurrent performance. We’ll group the performance of concurrent operations into six broad levels running from fast to slow, with each level differing from its neighbors by roughly an order of magnitude in performance. They are: Vanilla Instructions, Uncontended Atomics, Contended Atomics, System Calls, Implied Context Switch, Catastrophe. (A rough illustration follows this list.)
  2. Open Source Quadruped Robot — Now with a robotic arm.
  3. AI Ethics Groups — without more geographic representation, they’ll produce a global vision for AI ethics that reflects the perspectives of people in only a few regions of the world, particularly North America and northwestern Europe. […] This lack of regional diversity reflects the current concentration of AI research (pdf): 86% of papers published at AI conferences in 2018 were attributed to authors in East Asia, North America, or Europe. And fewer than 10% of references listed in AI papers published in these regions are to papers from another region. Patents are also highly concentrated: 51% of AI patents published in 2018 were attributed to North America.
  4. Threat Models for Differential Privacy — Looks at risks around central, local, and hybrid models of differential privacy. Good insight and useful conclusions, e.g. As a result, the local model is only useful for queries with a very strong “signal.” Apple’s system, for example, uses the local model to estimate the popularity of emojis, but the results are only useful for the most popular emojis (i.e. where the “signal” is strongest). The local model is typically not used for more complex queries, like those used in the U.S. Census [3] or applications like machine learning.
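
To make the hierarchy concrete, here is a minimal Python sketch (my own, not from the linked article, which benchmarks C++) comparing plain instructions, uncontended locking, and contended locking. Python's GIL blurs the lower levels, but the order-of-magnitude gaps between levels usually still show up:

```python
import threading
import time

N = 200_000

def bench(label, fn):
    t0 = time.perf_counter()
    fn()
    print(f"{label}: {time.perf_counter() - t0:.3f}s")

def vanilla():
    x = 0
    for _ in range(N):
        x += 1  # plain work, no synchronization

lock = threading.Lock()

def uncontended():
    for _ in range(N):
        with lock:      # lock taken and released with no rival thread
            pass

def contended():
    def worker():
        for _ in range(N // 2):
            with lock:  # two threads now fight over the same lock
                pass
    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

bench("vanilla loop    ", vanilla)
bench("uncontended lock", uncontended)
bench("contended lock  ", contended)
```
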
Categories: Technology

How to Set AI Goals

O'Reilly Radar - Tue, 2020/09/15 - 04:47
AI Benefits and Stakeholders

AI is a field where value, in the form of outcomes and their resulting benefits, is created by machines exhibiting the ability to learn and “understand,” and to use the knowledge learned to carry out tasks or achieve goals. AI-generated benefits can be realized by defining and achieving appropriate goals. These goals depend on who the stakeholder is; in other words, on the person or company receiving the benefits.

There are three potential stakeholders for AI applications, with a single application often involving all three. They are business stakeholders, customers, and users. Each type of stakeholder has different and unique goals; each group is most interested in having their specific objectives met, or problems solved. My book, AI for People and Business, introduces a framework that highlights the fact that both people and businesses can benefit from AI in unique and different ways.

A typical social media platform needs to satisfy all three stakeholders. In the case of Twitter, the business stakeholder’s top goals are likely centered around profits and revenue growth. Customer stakeholders are the people and companies that advertise on the platform, and are most concerned with ROI on their ad spend. User stakeholders are interested in benefiting from the platform’s functionality: staying up-to-date, quickly finding new people and topics to follow, and engaging with family and friends.

Goals should be defined specifically and at a granular level for each stakeholder and relevant use case. Twitter has no doubt gone through this exercise long ago; but if we imagine Twitter taking its first steps towards AI, some specific and granular goals could be to build a recommendation engine that helps users find the most relevant people to follow (a goal for users), while also building an AI-powered advertising targeting engine that best matches ads with those most likely to be interested in the product or service being advertised (for customers). This in turn would increase the platform’s value for users and thus increase engagement, which would result in more eyes to see and interact with ads, which would mean better ROI on ad spend for customers, which would then achieve the goal of increased revenue and customer retention (for business stakeholders). The key is to start with small and easily identifiable AI projects that will trickle value upwards towards a company’s highest priority goals.
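
To make the "recommend people to follow" goal concrete, here is a deliberately tiny sketch. The follow graph and the scoring rule are invented for illustration; real systems use far richer signals:

```python
from collections import Counter

# Hypothetical follow graph: user -> set of accounts they follow
follows = {
    "alice": {"bob", "carol"},
    "bob":   {"carol", "dave"},
    "carol": {"dave", "erin"},
}

def recommend(user, k=3):
    """Rank accounts followed by the accounts `user` already follows."""
    already = follows[user] | {user}
    counts = Counter(
        candidate
        for friend in follows[user]
        for candidate in follows.get(friend, set())
        if candidate not in already
    )
    return [account for account, _ in counts.most_common(k)]

print(recommend("alice"))  # ['dave', 'erin']
```
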

AI Goals as a Function of Maturity

For companies early in their AI journey, setting appropriate goals helps create a foundation from which to build AI maturity. It also helps companies learn how to translate existing AI capabilities into solving specific real-world problems and use cases. In my book, I introduce the Technical Maturity Model:

I define technical maturity as a combination of three factors at a given point of time. These factors are:

  • Experience: More experience usually results in increased muscle memory, faster progress, and greater efficiency. Teams with more experience with techniques such as natural language processing and computer vision are more likely to be successful building new applications using the same techniques. They’re not new to the field; they’ve solved problems, and have discovered what does and doesn’t work.
  • Technical sophistication: Sophistication measures a team’s ability to use advanced tools and techniques (e.g., PyTorch, TensorFlow, reinforcement learning, self-supervised learning). When new tools appear, they can decide quickly whether they’re worthwhile, and get up to speed. They’re on top of the research, and are capable of evaluating and experimenting with new ideas.
  • Technical competence: Competence measures a team’s ability to successfully deliver on initiatives and projects.  They have previously built similar, successful AI applications, and are thus highly confident and relatively accurate in estimating the time, effort, and cost required to deliver again. Technical competence results in reduced risk and uncertainty.

There’s a lot of overlap between these factors.  Defining them precisely isn’t as important as the fact that you need all three. Higher levels of experience, technical sophistication, and technical competence increase technical maturity. Increased AI technical maturity boosts certainty and confidence, which in turn, results in better and more efficient AI-powered outcomes and success.

Technical maturity is a major factor behind why some companies are very successful with AI, while other companies struggle to get started and/or achieve success.

The Challenge with Defining AI Goals

Turning an AI idea into actual benefits is difficult and requires the “right” goals, leadership, expertise, and approach. It also requires buy-in and alignment at the C-level.

Identifying, prioritizing, and goal-setting for AI opportunities is a multi-functional team effort that should include business folks, domain experts, and AI practitioners and researchers. This helps ensure alignment with company goals, while also including necessary business and domain expertise. AI initiatives may also require significant considerations for governance, compliance, ethics, cost, and risk.

Further, while the technical details of AI are complex, the outputs of AI techniques are relatively simple. In most cases, AI solutions are built to map a set of inputs to one or more outputs, where the outputs fall into a small group of possibilities. Outputs from trained AI models include numbers (continuous or discrete), categories or classes (e.g., spam or not-spam), probabilities, groups/segments, or a sequence (e.g., characters, words, or sentences).
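
A toy sketch makes this concrete (scikit-learn on invented data; the point is only the shape of the outputs, not any particular application):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y_class = np.array([0, 0, 1, 1])        # e.g., not-spam / spam
y_value = np.array([1.1, 1.9, 3.2, 3.9])

clf = LogisticRegression().fit(X, y_class)
print(clf.predict([[2.5]]))             # a category or class
print(clf.predict_proba([[2.5]]))       # a probability per class

reg = LinearRegression().fit(X, y_value)
print(reg.predict([[2.5]]))             # a continuous number

km = KMeans(n_clusters=2, n_init=10).fit(X)
print(km.labels_)                       # a group/segment per input
```
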

Therefore, AI techniques don’t just solve real-world problems out of the box. They don’t automatically generate revenue and growth, maximize ROI, or keep users engaged and loyal. Likewise, AI doesn’t inherently optimize supply chains, detect diseases, drive cars, augment human intelligence, or tailor promotions to different market segments.

Setting a company-wide goal of reducing customer churn by 25% is great but, unfortunately, far too broad for most AI applications: customer churn reduction is not a natural output of AI techniques. The mismatch between goals like reducing customer churn and actual AI outputs must be properly mapped and handled.
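
As a sketch of that mapping (all data synthetic, the feature names invented): the model's natural output is a churn probability per customer, and a separate business layer turns those probabilities into a churn-reduction action:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical features: logins last month, support tickets filed
X = rng.normal(size=(200, 2))
# Synthetic labels standing in for "churned within 90 days"
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=200) < 0).astype(int)

model = LogisticRegression().fit(X, y)

# The model's natural output: a churn probability per customer...
churn_prob = model.predict_proba(X)[:, 1]

# ...which the business maps onto an action: target the riskiest 10%
# with a retention offer. The "reduce churn 25%" goal lives at this
# layer, not inside the model.
at_risk = np.argsort(churn_prob)[-20:]
print("customers to target:", at_risk)
```
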

Why and How to Set Good AI Goals

AI goals should be appropriate for a given company’s technical maturity, and should be chosen to maximize the likelihood of success, prove value, and build a foundation from which to create increasingly sophisticated AI solutions that achieve higher-level business goals. A crawl, walk, run approach is a good analogy for this.

Goals should be well-formed, meaning they are stakeholder-specific, map actual AI outputs to applications and use cases that achieve business goals, and are appropriately sized. For companies early in their AI maturity, appropriately sized goals are small and specific enough to experiment with, and prove potential value from, relatively quickly (think lean and incremental methodologies). As AI maturity increases, a non-incremental, holistic, and organization-wide AI vision and strategy should be created to achieve hierarchically-aligned AI goals of varying granularity—goals that drive all AI initiatives and development. This should be accompanied by a transition from incremental thinking to big vision, “applied AI transformation” thinking.

Let’s consider the overall goal of reducing customer churn. In an early stage of AI maturity, we can build AI solutions that reduce search friction (e.g., Netflix and Amazon recommendation engines), increase stickiness through personalized promotions and content that is more relevant and engaging, create a predictive model to identify customers most likely to churn and take appropriate preventative actions, or automate and optimize results in areas that are outside of a person’s primary area of expertise (e.g., automated retirement portfolio rebalancing and maximized ROI). When transitioning to developing a bigger AI vision and strategy, we may create a prioritized product roadmap consisting of a suite of recommendation engines and an AI-based personalized loyalty program, for example.

At the individual goal level, and for each well-formed goal, the same multi-functional team mentioned earlier must work collaboratively to determine what AI opportunities are available, select and prioritize the ones to pursue, and determine the technical feasibility of each.

There are frameworks like SMART to help characterize well-formed goals, but since AI is a field that I characterize as scientific innovation (like R&D), criteria like being achievable and time-bound may not be the best fit. Results are typically achieved through a scientific process of discovery, exploration, and experimentation, and these processes are not always predictable.

Given the scientific nature of AI, goals are better expressed as well-posed questions and hypotheses around a specific and intended benefit or outcome for a certain stakeholder. With well-formed goals, data scientists and machine learning engineers can then apply the scientific method to test different approaches in order to determine the validity of the hypothesis, and assess whether a given approach is feasible and can achieve the goal.
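
For instance, a goal framed as the hypothesis "showing recommendations increases the add-to-cart rate" can be tested directly. A minimal sketch, assuming statsmodels is available and with counts invented for illustration:

```python
from statsmodels.stats.proportion import proportions_ztest

# Invented experiment counts: visitors who saw recommendations vs. not
carts_with_recs, visitors_with_recs = 540, 4800
carts_without,   visitors_without   = 460, 4900

stat, p_value = proportions_ztest(
    count=[carts_with_recs, carts_without],
    nobs=[visitors_with_recs, visitors_without],
    alternative="larger",  # H1: the recommendation group converts more
)
print(f"z = {stat:.2f}, p = {p_value:.4f}")
# A small p-value supports the hypothesis and a wider rollout.
```
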

For example, by introducing “Frequently bought together” recommendations (and other recommendations), Amazon was able to increase average customer shopping cart size and order amount (i.e., up-sell and cross-sell), which in turn increased average revenue per customer, which in turn increased Amazon’s quarterly e-commerce revenue. McKinsey estimates that up to 35% of Amazon’s revenue and 75% of everything watched on Netflix comes from AI-powered recommendations.

But when defining an AI project, the goal or hypothesis in this case isn’t to increase top-line revenue for the company, but rather to posit that building an application that groups products by likelihood to be purchased together will increase average customer order size, which in turn will have an upward impact on top level goals like increasing average revenue per customer and top-line revenue.
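
A minimal sketch of that grouping idea (toy basket data; production systems mine millions of orders and use better statistics, such as lift rather than raw counts):

```python
from collections import Counter
from itertools import combinations

# Toy order history; each set is one customer's basket
orders = [
    {"camera", "sd_card", "tripod"},
    {"camera", "sd_card"},
    {"camera", "tripod"},
    {"laptop", "mouse"},
]

# Count how often each pair of products appears in the same order
pair_counts = Counter(
    pair for order in orders for pair in combinations(sorted(order), 2)
)

def bought_together(product, k=2):
    """Products most often purchased alongside `product`."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == product:
            scores[b] += n
        elif b == product:
            scores[a] += n
    return [p for p, _ in scores.most_common(k)]

print(bought_together("camera"))  # ['sd_card', 'tripod']
```
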

Another example would be setting a goal around building a well-performing AI model that can predict demand (the number of units likely to be purchased) for a specific product on a given day, at a given time, and under given weather conditions. If accurate, this prediction can help a retailer ensure that they do not run out of stock, which means no revenue is lost because a product is out of stock. An added benefit is improved customer experience, which results in happier and more loyal customers who are able to buy the products they want whenever they want to buy them. This same approach can be applied to virtually any other application of AI.
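
A sketch of such a demand model on synthetic history (the feature set and demand function are invented; the point is that the model's output is a number of units, which the retailer then maps to stock levels):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)

# Synthetic history: day of week (0-6), hour (0-23), temperature (C)
X = np.column_stack([
    rng.integers(0, 7, 500),
    rng.integers(8, 22, 500),
    rng.normal(18, 7, 500),
])
# Invented demand: busier on weekends and in warm weather
units = (
    20 + 5 * (X[:, 0] >= 5) + 0.4 * X[:, 2]
    + rng.normal(scale=2, size=500)
).round()

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, units)

# Saturday, 2pm, 25C -> predicted units, used to set stock levels
print(model.predict([[5, 14, 25.0]]))
```
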

Conclusion

AI and machine learning technologies have come a long way in terms of capabilities and accessibility, but off-the-shelf AI solutions aren’t yet available for specific industries or business domains, companies, sets of data, applications, and use cases. The key to success with AI is assembling a multi-functional team that defines appropriate goals, then letting these goals drive the AI initiatives and projects.

Categories: Technology

Four short links: 11 Sep 2020

O'Reilly Radar - Fri, 2020/09/11 - 04:21
  1. Accurately Lipsync Video to Any Speech — In our paper, A Lip Sync Expert Is All You Need for Speech to Lip Generation In the Wild, ACM Multimedia 2020, we aim to lip-sync unconstrained videos in the wild to any desired target speech. (Paper) Impressive.
  2. Temporal — Open source “workflow-as-code” engine. I can’t decide if this is awful or brilliant.
  3. The Rate of Obsolescence of Knowledge in Software Engineering — When I graduated, I was told the half-life for what I’d learned was 18 months. But not all knowledge is equivalent, as the author points out. The anatomy of an OS class is still relevant today. It’s interesting to look at the knowledge you struggle to acquire and ask yourself what its half-life will be.
  4. Security Engineering (3ed) — Drafts of the third edition, to be released in December, are available online but may go away. (via Bruce Schneier)
Categories: Technology

September 10th Virtual Meeting

PLUG - Wed, 2020/09/09 - 16:47

We have a couple of presentations lined up this month for you to enjoy from the safety of your home.

Attend by going to: https://lufthans.bigbluemeeting.com/b/plu-yuk-7xx

Adrian Cochrane: What Is the Small Web and Why Is It Important?

Description:
To support the creation of "webapps," web browsers have become increasingly complex, to the extent that even major corporations can no longer keep up.

There's been a growing community concerned about the direction the web's been taking, and wanting to do something about it. This talk will show that this is both possible and vital.

About Adrian:
Adrian started programming when he was ten years old, when he'd sleep with a Python book under his pillow. He continued studying programming, eventually graduating from Victoria University of Wellington with a BSc in Computer Science, and now runs a contracting business with his father supporting the establishment of open standards.

He started developing his own browser to explore an increasing fascination with how to (re)discover valuable webpages, occasionally contributing code to related projects. He also frequently studies the elementary OS codebase to learn how it works.


Kevin Tyers: BASH - Practical Tips

Description:
Kevin will be sharing 6 of his most beloved tricks for using Bash. These simple yet practical tips should be immediately useful, or at least inspire you to develop your own set. This talk is for people of all skill levels.

About Kevin:
Kevin Tyers is a SANS Instructor, the head of cyber intelligence engineering for a Fortune 250 company, and the head of infrastructure for iCTF. Throughout his 15-year career, he has worked in the government, telecom, health care, and financial industries focusing on network engineering/security, incident response, and tooling. Kevin is the cofounder of the Information Security group DC480 in Phoenix, Arizona. He has spoken at a variety of public and invite-only conferences such as BSidesLV, CactusCon, and SANS Hackfest. He has been a Linux user for as long as he can remember and is passionate about sharing the tips and tricks he has learned for using Linux.

Four short links: 9 Sep 2020

O'Reilly Radar - Wed, 2020/09/09 - 04:02
  1. Things I Learned to Become a Senior Software Engineer — Full of relatable growth moments, such as changing your code to make the test pass vs understanding why the test failed.
  2. The Future is Software Engineers Who Can’t Code — “There are a lot of definitions of what a developer is […] It’s not just people who write code.” […] Microsoft has even given these “civilian” programmers a persona: Mort. […] The fictional “Mort” is a skilled professional, anyone from a business analyst to a construction site cost estimator, who needs computers to perform specific functions without mastering the intricacies of full-blown programming. As Mel Conway called it, the profession is bifurcating into architects and plumbers. Architects make complex pieces of software, plumbers bolt those pieces together.
  3. Pair Programming with AI — This makes sense to me: We don’t need complete, perfect solutions; we need partial solutions in situations where we don’t have all the information, and we need the ability to explore those solutions with an (artificially) intelligent partner.
  4. Writing System Software: Code Comments — Absolutely the best thing on software engineering that a software engineer will read all month. This is GOLD.
Categories: Technology

Pair Programming with AI

O'Reilly Radar - Tue, 2020/09/08 - 05:41

In a conversation with Kevlin Henney, we started talking about the kinds of user interfaces that might work for AI-assisted programming. This is a hard problem: neither of us was aware of any significant work on user interfaces that support collaboration. However, as software developers, many of us have been practicing effective collaboration for years. It’s called pair programming, and it’s not at all like the models we’ve seen for interaction between an AI system and a human.

Most AI systems we’ve seen envision AI as an oracle: you give it the input, it pops out the answer. It’s a unidirectional flow from the source to the destination. This model has many problems; for example, one reason medical doctors have been slow to accept AI may be that it’s good at giving you the obvious solution (“that rash is poison ivy”), at which point the doctor says “I knew that…” Or it gives you a different solution, to which the doctor says “That’s wrong.” Doctors worry that AI will “derail clinicians’ conversations with patients,” hinting that oracles are unwelcome in the exam room (unless they’re human).

Shortly after IBM’s Watson beat the world Jeopardy champions, IBM invited me to see a presentation about it. For me, the most interesting part wasn’t the Jeopardy game Watson played against some IBM employees; it was when they showed the set of answers Watson considered before selecting its answer, weighted with their probabilities. That level of detail was the real gold. We don’t need an AI system to tell us something obvious, or something we can Google in a matter of seconds. We need AI when the obvious answer isn’t the right one, and one of the possible but rejected answers is.
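
Surfacing that weighted list doesn't require Watson: any probabilistic classifier can expose its ranked candidates instead of a single oracle answer. A toy sketch with scikit-learn, on invented training data echoing the rash example above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = ["itchy red rash after hiking", "sneezing in spring",
         "rash from new detergent", "itchy eyes and pollen"]
labels = ["poison ivy", "allergy", "contact dermatitis", "allergy"]

vec = TfidfVectorizer()
clf = LogisticRegression().fit(vec.fit_transform(texts), labels)

probs = clf.predict_proba(vec.transform(["itchy rash on arm"]))[0]
ranked = sorted(zip(clf.classes_, probs), key=lambda t: -t[1])
for answer, p in ranked:  # the whole weighted list, not one oracle answer
    print(f"{p:.2f}  {answer}")
```
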

What we really need is the ability to have a dialog with the machine. We still don’t have the user interface for that. We don’t need complete, perfect solutions; we need partial solutions in situations where we don’t have all the information, and we need the ability to explore those solutions with an (artificially) intelligent partner. What is the logic behind the second, third, fourth, and fifth solutions? If we know the most likely solution is wrong, what’s next? Life is not like a game of Chess or Go—or, for that matter, Jeopardy.

What would this look like? One of the most important contributions of Extreme Programming and other Agile approaches was that they weren’t unidirectional. These methodologies stressed iteration: building something useful, demo-ing it to the customer, taking feedback, and then improving. Compared to Waterfall, Agile gave up on the master plan and specification that governed the project’s shape from start to finish, in favor of many mid-course corrections.

That cyclic process, which is about collaboration between software developers and customers, may be exactly what we need to get beyond the “AI as Oracle” interaction. We don’t need a prescriptive AI writing code; we need a round trip, in which the AI makes suggestions, the programmer refines those suggestions, and together, they work towards a solution.

That solution is probably embedded in an IDE. Programmers might start with a rough description of what they want to do, in an imprecise, ambiguous language like English. The AI could respond with a sketch of what the solution might look like, possibly in pseudo-code. The programmer could then continue by filling in the actual code, possibly with extensive code completion (and yes, based on a model trained on all the code in GitHub or whatever). At this point, the IDE could translate the programmer’s code back into pseudo-code, using a tool like  Pseudogen (a promising new tool, though still experimental). Any writer, whether of prose or of code, knows that having someone tell you what they think you meant does wonders for revealing your own lapses in understanding.
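
As a toy flavor of that back-translation step (Pseudogen itself is built on statistical machine translation; this hand-written AST walk is only to make the round-trip idea concrete):

```python
# Requires Python 3.9+ for ast.unparse
import ast

SRC = """
def total(prices, discount):
    result = 0
    for p in prices:
        result += p * (1 - discount)
    return result
"""

def narrate(node, depth=0):
    """Emit rough English pseudocode for a few statement types."""
    pad = "  " * depth
    if isinstance(node, ast.FunctionDef):
        print(f"{pad}define '{node.name}' taking {[a.arg for a in node.args.args]}")
    elif isinstance(node, ast.For):
        print(f"{pad}for each {ast.unparse(node.target)} in {ast.unparse(node.iter)}:")
    elif isinstance(node, ast.AugAssign):
        print(f"{pad}increase {ast.unparse(node.target)} by {ast.unparse(node.value)}")
    elif isinstance(node, ast.Return):
        print(f"{pad}give back {ast.unparse(node.value)}")
    for child in ast.iter_child_nodes(node):
        narrate(child, depth + 1)

narrate(ast.parse(SRC))
```
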

MISIM is another research project that envisions a collaborative role for AI.  It watches the code that a developer is writing, extracting its meaning and comparing it with similar code.  It then makes suggestions about rewriting code that looks buggy or inefficient, based on the structure of similar programs. Although its creators suggest that MISIM could eventually lead to machines that program themselves, that’s not what interests me; I’m more interested in the idea that MISIM is helping a human to write better code. AI is still not very good at detecting and fixing bugs; but it is very good at asking a programmer to think carefully when it looks like something is wrong.
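
MISIM's "context-aware semantic structure" is far more sophisticated, but a crude stand-in shows the shape of the idea: fingerprint each snippet's syntax tree and compare the fingerprints. This sketch is mine, not MISIM's algorithm:

```python
import ast
import math
from collections import Counter

def shape(src):
    """Histogram of AST node types -- a crude structural fingerprint."""
    return Counter(type(n).__name__ for n in ast.walk(ast.parse(src)))

def similarity(a, b):
    """Cosine similarity between two code fingerprints."""
    ha, hb = shape(a), shape(b)
    dot = sum(ha[k] * hb[k] for k in ha)
    norm = math.sqrt(sum(v * v for v in ha.values())) * \
           math.sqrt(sum(v * v for v in hb.values()))
    return dot / norm

loop_sum = "total = 0\nfor x in xs:\n    total += x"
builtin_sum = "total = sum(xs)"
unrelated = "print('hello')"

print(similarity(loop_sum, builtin_sum))  # structurally closer
print(similarity(loop_sum, unrelated))    # further apart
```
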

Is this pair programming with a machine? Maybe. It is definitely enlisting the machine as a collaborator rather than a surrogate. The goal isn’t to replace programmers, but to make them better: able to work in ways that are faster and more effective.

Will it work? I don’t know; we haven’t built anything like it yet. It’s time to try.

Categories: Technology

Four short links: 4 September 2020

O'Reilly Radar - Fri, 2020/09/04 - 03:40
  1. Inside the Digital Pregnancy Test — … is a paper pregnancy test and watch-battery-powered microcontroller connected to three LEDs, a photo-cell, and an LCD display. That (8-bit) microcontroller runs at 4MHz, almost as fast as an IBM PC did.
  2. The Incredible Proof Machine — Fun game (modelled on The Incredible Machine from the 90s) that teaches logic.
  3. Make Interfaces Hard to MisuseDon’t push the responsibility of maintaining invariants required by your class on to its callers. Excellent advice.
  4. ArangoDB — a scalable open-source multi-model database natively supporting graph, document and search. All supported data models & access patterns can be combined in queries allowing for maximal flexibility.
Categories: Technology

Four short links: 2 September 2020

O'Reilly Radar - Wed, 2020/09/02 - 04:15
  1. VSCode Debug Visualizer — A VS Code extension for visualizing data structures while debugging. Like the VS Code’s watch view, but with rich visualizations of the watched value. The screencast is wow.
  2. Userland — an integrated dataflow environment for end-users. It allows users to interact with modules that implement functionality for different domains from a single user interface and combine these modules in creative ways. The talk shows it in action. It’s a spreadsheet and cells can be like a spreadsheet, or can be like a Unix shell, or can be an audio synthesizer (!).
  3. Minglr — Open source software (built on Jitsi) that facilitates the ad hoc mingling that might happen in the audience after a talk ends: see who’s there, pick people to talk to, talk to them. Interesting to see the florescence of social software thanks to lockdown.
  4. Crepe — a library that allows you to write declarative logic programs in Rust, with a Datalog-like syntax. The Reachable example is sweet. From initial testing, the generated code is very fast. Variants of transitive closure for large graphs (~1000 nodes) run at comparable speed to compiled Souffle, and at a fraction of the compilation time.
Categories: Technology

Radar trends to watch: September 2020

O'Reilly Radar - Tue, 2020/09/01 - 04:57

Compared to the last few months, there are relatively few items about COVID. And almost no items about Blockchains, though the one item I’ve listed, about China’s Blockchain Services Network, may be the most important item here. I’m seeing a steady stream of articles about various forms of no-code/low-code programming. While many programmers scoff at the idea of programming-without-programming, spreadsheets are an early example of low-code programming. Excel is hardly insignificant. On the AI front, the most significant change is discussion (see the thread below) of a “Deep Learning Recession,” as companies under pressure from COVID look for results and can’t find them.

AI
  • There is serious talk of a “Deep Learning recession” due, among other things, to a collapse in job postings. Short-term effect of COVID or long-term trend?
  • An excellent analysis of participation in machine learning: how it is used, and how it could be used to build fair systems and mitigate power imbalances.
  • Fairness and Machine Learning is an important new (academic) book by Solon Barocas, Moritz Hardt, and Arvind Narayanan. It’s currently an incomplete draft, available (free) online.
  • A draft document from NIST describes Four Principles of Explainable Artificial Intelligence. The ability to explain decisions made by AI systems is already important, and will become more so.
  • SAIL-ON is a DARPA-funded research project to develop AI systems that can deal with novelty (such as the COVID pandemic), starting with unexpected situations (such as changes to the rules) in board games.
  • Is NLP research pursuing the right goals? While GPT-3 is impressive, it doesn’t demonstrate anything like comprehension (ordering the relationships found in a story). (David Ferrucci, Elemental Cognition). Likewise, Gary Marcus argues that GPT-3 can put together sentences that sound good, but that it has no idea what it’s talking about.
  • What happens when you combine a relational database with git? You get Dolt, a database that enables collaboration on datasets.  You might get a solution to the problem of data versioning–and a big step towards CI/CD pipelines for AI applications.
COVID-19
  • Cough recognition? AI to locate people who cough in space and time. A very questionable tool for pandemic fighting.
  • Patient-led research on COVID-19 is an organization to help long-term COVID patients share their observations.  This is reminiscent of PatientsLikeMe, and related to other trends in re-envisioning healthcare.
Programming
  • Turning a Google Sheet into an app without code is yet another example of the low-code trend.
  • Chris Lattner (one of the co-creators of LLVM and of Swift) has an interesting AMA that, among other things, talks about integrating machine learning with compilers, and machine learning as its own programming paradigm.
  • Contextual engineering is fundamentally simple: consider “why” before thinking about “how.”  But ignoring “why” is at the heart of so many engineering failures throughout history. And understanding “why” often requires “immersion in the local culture.”  This is starting to sound like an extended version of bounded context.
  • Blazor is a new framework for the Web that allows developers to program with C#, compiling to Web Assembly (wasm).  It’s potentially a competitor for JavaScript and React/Angular/Vue, though it may have trouble spreading outside of the Microsoft community.
Cloud and Microservices
  • K3s is a stripped-down Kubernetes designed (among other things) for IoT and Edge Computing.  I’ve thought for some time that Kubernetes needs simplification. Is this it?
  • Microsoft announces Open Service Mesh for managing communications between microservices. OSM is based on the Service Mesh Interface, and is an alternative to Google’s Istio, which has a reputation for being difficult, and has become controversial.
  • SMOKEstack is Redmonk’s “alternative stack” for multi-cloud environments (and perhaps doing an end-run around Amazon’s hegemony). SMOKE stands for Serviceful, Mashable, Open, K(C)composable, Event-driven.
  • IBM’s RoboRXN is a “chemistry lab in a cloud” that’s designed for drug synthesis; you design an experiment, which executes in a robotic lab. The idea is similar to Max Hodak’s Transcriptic (now Strateos; Max is now founder and president of Elon Musk’s Neuralink).  But IBM adds some twists: you design a molecule with a graphical (low-code) interface, and the actual process is filled in using AI.
Quantum Computing
  • Amazon’s Braket service: true quantum computing in the cloud, and now available to the public. IBM and Microsoft already have quantum computers available through their cloud offerings; Google will eventually follow. We’re still at the tire kicking stage, since none of these machines can do real work yet.
  • NIST has announced a number of cryptographic algorithms that can’t currently be broken by quantum computers. This is a significant step towards quantum-proof encryption.
New Infrastructure
  • TRUSTS is a data market funded by the European Commission.  MIT Tech Review has a good explanation. It’s a little surprising to see this coming out of the EU, but once people have the right to privacy, the right to sell data is not far behind. Individuals don’t participate in the market as individuals; the trust handles (and enforces) the transactions and pays dividends.
  • One of the biggest problems with privacy and identity has been developing the infrastructure for managing public keys. Sidetree is a protocol for decentralized public key infrastructure based on a blockchain.
  • China’s blockchain infrastructure (BSN) supplies the infrastructure for a global financial services network.  It is not a blockchain, but a network for building interoperable blockchain applications that is targeted at small to medium businesses, with the intent of making RMB a viable global currency.  The US Federal Reserve has also released plans for cryptocurrency: They are planning to launch a service called FedNow in 2-3 years. They are way behind the Chinese.
Operations
  • A pre-release of a paper describes zero-downtime deployments at Facebook scale. There’s a good thread on Twitter discussing the paper.
Social Media
  • An algorithm for controlling fairness and bias in search results rations exposure, preventing a popular link from gaining clicks relative to similar links, while still maximizing usefulness.
  • Twitter has released their new API. After abusing their developers a decade ago, will this make any difference?
  • In iOS 14, Apple will be requiring opt-in for tracking users’ web activity. Facebook is not happy about this; targeted advertising depends critically on user tracking. Google (which has been gradually implementing other limitations on advertising technology) has been quiet about it.
Categories: Technology

Four short links: 28 August 2020

O'Reilly Radar - Fri, 2020/08/28 - 04:17
  1. Activity Watch — Open source, privacy-first, cross-platform app that automatically tracks how you spend time on your devices.
  2. Natural Language Database Queries — An interesting comment thread on Hacker News. Sample comments: I’ve done some previous digging into natural language SQL queries — there’s a good amount of research around this. But the error rate is always too high for useful production deployment and the systems I’ve looked at never handled ambiguity well. The target user for this is someone who knows nothing about SQL so ambiguity is guaranteed. and I worked for Cleargraph, a startup which built a natural language query layer on top of RDBMSes and which we sold to Tableau. We reached the same conclusions as you: that such a system must properly handle ambiguous queries, and users need to explicitly understand the results they’re viewing.
  3. Hands-On Web Assembly — Simple tutorial for Wasm.
  4. ArwesFuturistic Sci-Fi and Cyberpunk Graphical User Interface Framework for Web Apps. Click through, it’s worth it.
Categories: Technology

An Agent of Change

O'Reilly Radar - Tue, 2020/08/25 - 04:38

The Covid-19 pandemic has changed how people and businesses spend and operate.  Over the coming pages we’ll explore ways in which our current world is already very different from the one we knew just a few months ago, as well as predictions of our “new normal” once the proverbial boat stops rocking.  Specifically, we’ll see this through the lens of decision-making: how has Covid-19 changed the way we think? And what does this mean for our purchase patterns and business models?

Welcome to Uncertainty

You’re used to a certain level of uncertainty in your life, sure.  But the pandemic has quickly turned up the uncertainty on even basic planning.

Your dishwasher, piano, or clothes dryer is making an odd sound. Do you proactively call a repair service to check it out?  Your ounce of prevention will also cost you two weeks’ wondering whether the repair technician was an asymptomatic carrier.  If you hold off, you’re placing a bet that the appliance lasts long enough for treatment to become widely available, because you certainly don’t want it to break down just as infection rates spike.

Stresses on a system reveal that some of our constants were really variables in disguise.  “I can always leave my house.”  “I can get to the gym on Friday.”  “If I don’t go grocery shopping tonight, I can always do it tomorrow.  It’s not like they’ll run out of food.”  These weren’t exactly bold statements in January.   But by March, many cities’ shelter-in-place orders had turned those periods into question marks. Even as cities are starting to relax those restrictions, there’s the worry that they may suddenly return as the virus continues to spread.

As this reality sets in, some of us are even weighing what we call “acceptance purchases”: items which show that we’re in this for the long haul.  Your gym isn’t closed, but it’s as good as closed since the city can quickly order it to shut down if local case counts climb again.  So maybe it’s time to buy that fancy exercise bike.  And ride-hailing services were appealing until using them increased your exposure to the virus.  Maybe now you’ll buy that car you sometimes think about?  You had considered downsizing your home, but you’ll appreciate the extra space if you’re spending more time indoors.

Those sorts of purchases are meant to last you for years, though, which means they’re only wise investments if the pandemic (and its impact on the local economy) continues for a long time.  What if we see improved prevention or widespread treatment within a few months?  Do we want to try to offload an exercise bike or a car that we no longer need? The longer you hold off on making those decisions, the greater the chances that you’ll make those purchases too late.

It’s tough to make a decision when you can’t rely on your near-term world falling within some certain, predictable scope.  You try to keep all of your options open at all times so that you can be ready for any possibility.  But that’s a lot of extra strain on your brain.  And it’s tiring.

Your house just got smaller

The pandemic has made a number of businesses less desirable or outright inaccessible.  Doing that work yourself reduces the impact around the uncertainty of when they’ll return.  It’s also a lot more responsibility for you.

Congratulations on running a restaurant, cafe, bar, cinema, gym, school, daycare, office, and storage facility.  You get to buy workout equipment, cooking gear, hair care tools, teaching supplies, and anything else needed to backfill services to which you used to outsource. You’re responsible for the decisions on which models of equipment to buy, as well as the upkeep thereof. You suddenly need to know a lot of things about a lot of things, but you don’t have the time to become an expert in any one of them.

Welcome to the diseconomies of non-scale: being small and self-sufficient is expensive.

Like a factory, hotel event space, or a fast-food kitchen, you find yourself constantly partitioning or re-tooling rooms to compensate for your limited space.  The gym becomes the reading room becomes the video meeting space becomes the work area.  (You and your spouse flip a coin to see who gets the real office and who gets that basement corner.  Hint: if you want the nicer space, make sure you’re on more video calls. And pretend that you can’t get Zoom backgrounds to work.) The kitchen table flips between a dining area and a school, three times a day.  The bathroom becomes the hair salon every week and quickly switches back again.  All of these swaps take time, effort, and mental energy, what economists collectively refer to as switching costs.  They add up. Quickly.

Nor is this just about the size of the home.  It’s a matter of how much you were using it before the pandemic started, the number of spaces present, and the ratio of people to square feet.  If you live in a sprawling, suburban house, but every room was already dedicated to some function or someone’s personal space, then you’re just marginally better off than the person occupying a small, urban apartment.

This isn’t just about “working from home,” either.  That usually means that you have a space set aside in your house to work and to take calls, and you have the house to yourself during working hours. Experienced remote-work professionals will tell you that we’re living a very different scenario.  This increasing need to run a standalone, be-everything home means that we are suffering from the curse of generalism: we’re our children’s teachers, our cooks, our housekeepers, our barbers, our IT department, and our fix-it crew.  We’re becoming more self-sufficient, but at the expense of having less time to specialize in our main jobs. All of this means that we’re spending a lot of time just getting by, and not much time advancing.

Connections as Currency: Getting By and Getting Supplies

In most places, you don’t need to be “connected” to get by day-to-day.  Whom you know is less important because, with even a modest income, you can get most of what you need.  You mostly care about diversifying your professional network, because that helps you to find a new job, which is what provides you the money that allows you to compensate for not knowing anyone else.

When a pandemic triggers stresses in our supply chains, that idea breaks down.  Whom you  know in your personal sphere suddenly counts a lot more.  Instead of money, it’s your social network that gets you through the day.

Do you know someone whose job gives them leading indicators on the spread of the virus?  Early in the pandemic, friends of medical professionals got some advance warning of what was to come.  They could gather information from their professional spheres to let their personal networks know that something nasty was brewing.  The same holds for anyone whose business sells protective gear or cleaning supplies.  Over casual drinks, they might mention: “It’s weird … we’re getting a lot of new orders, and not from our usual customers.  Something’s up.  You may want to buy some extras, just in case.”  (This, by the by, shows the value of keeping an eye on your company’s data.  If you don’t have systems to tell you when your sales numbers are abnormal, you may miss information that you already had in-hand.  And in this case, it would have been time-sensitive info.)

Communities of shared experience are home to these socially-strong yet professionally-diverse networks.  Family and close friendships top the list, with religious and ethnic ties running a close second.  (People who were part of the same wave of immigration from the same country often forge ties that are as strong as family.)  Neighbors and people who share a hobby are also in there, though to a lesser degree. Within these groups there’s always somebody who has a quick tip, someone who “knows a guy,” someone you can pull aside for a quick “Hey can I ask you about …”  Maybe your niece works at a big grocery chain, and she can tell you when the shipments of hand sanitizer arrive.  In December, this would have been a trivial mention.  Today, when goods are scarce, this is timely information and it can make a difference.

Personal networks often have the benefit of being geographically dispersed. Your best friend can ship you cleaning supplies, since they are plentiful in his part of the country.  Your extended family, which stretches from Paris to Singapore, can tell you how their cities are handling shelter-in-place rules.  Chatting with those far-flung aunts and uncles gives you several weeks’ advance notice on how your city’s rules may turn out.  That reduces your uncertainty, which makes it easier for you to prepare, which reduces your stress and decision fatigue.

Your ability to forge new relationships can compensate for a smaller social network.  If you don’t have a relative who works at Target, you can ask someone who works there, so long as you have the skill to spot whom to ask.  You have to be able to read people, to see who would be receptive to that question. And you also need to tell whether this would be a simple favor, or something that merits monetary compensation.  The value on that information just increased by a wide margin; shouldn’t the price follow?

Relationship-building also counts in the B2B setting.  Such was the case with grocery chain Trader Joe’s.  They’ve managed to avoid shortages during this pandemic, most notably in toilet paper.  When other stores seemed to run out, Trader Joe’s always magically had some in stock.  That’s because they were able to strike a deal with an unnamed hotel chain to buy supplies that were going unused due to dramatic cuts in travel. Granted, Trader Joe’s very business model—white-labeling manufacturers’ goods—smoothed this road.  But their ability to forge that relationship counted just as much as their ability to execute on selling the goods.

The Challenges to Come: Tracing the Chains

We can reduce our pandemic-driven stresses by reducing the uncertainty.  To do that, we can trace chains of knock-on effects to determine what changes are coming, and plan accordingly.   For example:  “many restaurants have closed up,” therefore, “there’s less waste from restaurants,” therefore, “there’s less food for rats,” therefore, “expect rats to get more bold.”  So be careful when taking out your trash.  “The pandemic has drastically cut air travel,” therefore “airlines will have less revenue,” therefore “airlines will furlough employees,” therefore “businesses those employees patronized—from in-airport restaurants to hotel shuttle services to their at-home economies—will suffer.”
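
Mechanically, this kind of tracing is just a walk over a cause-and-effect graph. A toy sketch in Python, built from the examples above:

```python
from collections import deque

# A toy cause-and-effect graph built from the examples in the text
effects = {
    "restaurants close": ["less restaurant waste"],
    "less restaurant waste": ["less food for rats"],
    "less food for rats": ["bolder rats"],
    "air travel drops": ["airline revenue falls"],
    "airline revenue falls": ["airline furloughs"],
    "airline furloughs": ["airport restaurants suffer",
                          "hotel shuttles suffer"],
}

def trace(event):
    """Breadth-first walk of every downstream knock-on effect."""
    seen, queue = set(), deque([event])
    while queue:
        current = queue.popleft()
        for consequence in effects.get(current, []):
            if consequence not in seen:
                seen.add(consequence)
                print(f"{current} -> {consequence}")
                queue.append(consequence)

trace("air travel drops")
```
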

Though we can trace just one chain of effects at a time, multiple paths spin out of every “what next?” and spread out like a spider-crack in a window.  They connect down the line to weave a fabric of impacts. Case in point: WSJ’s Scott McCartney points out that the sudden drop in air travel has upset airlines’ ability to set prices, since they take such a data-driven approach. People who work in the ML/AI field will tell you that this is not just an airline problem: a sudden shift will upend any predictive models built on past behaviors, regardless of industry.  That will affect other fields’ dynamic pricing, yes, but also fraud detection (your credit card reflects a lot of outlier purchases, times, and locations since February) and demand forecasts (a knock-on effect of our collective outlier purchases).  That, in turn, ties to inventory management, which is tied to supply chains, which involve all of the players in the shipping industry, which is tied to fuel consumption and vehicle maintenance…

As with any tightly-coupled, complex system, all of these connections work in our favor until they suddenly don’t.  Expect pandemic-related changes to cascade, revealing both endogenous and systemic problems that are related in unexpected ways.

One problem with tight coupling, Charles Perrow notes in Normal Accidents, is that materials only have one path to take through the system.  If a component in the middle breaks, everything backs up so the entire system is as good as broken.  You can repair or re-create the old paths (when possible) or create new connections between components.  In Covid-speak, that means our long-term solutions fall into “partially restoring and re-thinking the pre-pandemic life” and “creating new ways to handle the day-to-day when there’s a highly infectious disease running around.”  There are business opportunities in both camps.

Covid as a Forcing Function: New Opportunities To Handle Pandemic Life

We mostly assume the phrases “contactless” and “touch-free” refer to electronic payments.  Those are very much in demand right now, but the touch-free space now extends to the wider notion of strangers not interacting in-person, and not handling the same objects at the same time.  That opens the door to online learning, telemedicine, tele-anything.  If you can provide your service at a distance, you have a lot of new prospects.

Entertainment already had a firm footing in the online world thanks to video streaming services. The pandemic, and its dramatically reduced cinema attendance, has provided them even more leverage as some movies will have a shorter time on the big screen before they shift to online video.  (As a side note, there’s another chain of knock-on effects to explore: since studios have been known to time releases to coincide with certain seasons and to have a better shot at industry awards, how will that change when films head into living rooms that much sooner?)  Other groups, like Chicago’s Lyric Opera and New York’s Met Opera, are hosting performances online as their subscribers can no longer attend in-person.

Still, it’s become more difficult for performances that rely on people being in the same space. Stand-up comedians from Nimesh Patel to Dave Chappelle have recently been able to pull off outdoor gigs with live-but-socially-distanced audiences. Fire-spinning and belly-dance performer Dawn Xiana Moon, of Raks Inferno and Raks Geek, combined multiple streaming services to simulate all of her performers being “on-stage” at the same time. This required her to leverage her technology background, a skill set that is admittedly rare in the live-act world, and she’d still prefer a single platform that just works. By comparison, TV and movie studios have yet to explain how they will manage to film while keeping cast and crew socially distant. The bottom line is that companies that create tools for filming with multiple, simultaneous, geographically-dispersed teams will have a lot of customers.  (And for movies, will we see an increase in animation to fill the gap?)

People are also demanding more of their home internet infrastructure to support increased school- and work-related loads.  (The authors know several people who have shelled out to their ISPs for greater bandwidth.)  That also means a greater load on mid-tier services like social media sites, videoconference services, and the aforementioned streaming video platforms.  If you sell networking hardware or provide network operations services to those companies, you will have no shortage of work.

If the pandemic continues long enough, we expect to see a deeper penetration of home broadband service, especially wireless broadband.  This is another touch-free offering, as it permits your provider to establish and troubleshoot internet connectivity issues without sending a technician into your home.  (As another knock-on effect, this means providers will be able to limit field technicians’ service radius to their towers and datacenters, which should let them cover more territory on the same number of staff.)

Traditional, multi-year lease commercial real estate was already experiencing disruption due to coworking spaces.  They’ll now both suffer as companies rethink their post-pandemic office needs.  Doubly so since some newly-remote workers are taking the opportunity to move out of state. (Yet another knock-on effect: without office workers, what will happen to the lunch spots and bars that lined the dense urban-business landscape?)  You also have retailers abandoning spaces since there are far too few customers to browse stores. Two types of consumers may pick up that inventory, though.  First, in the short term, Amazon may convert some old mall spaces into distribution centers, and other businesses will undoubtedly find ways to repurpose empty urban office spaces at deep-discount prices.

Second, and in the longer term, we’ll accept that our homes are simply not large enough to be our Everything Place.  People who choose to remain in dense urban environments will want their apartments to be more like standalone houses, which means having space for in-unit washer/dryer, multiple bathrooms, and multiple rooms to serve as offices. Perhaps cities will divvy up old office buildings into large apartments to meet that need.  That’s admittedly more of a stretch, if for no other reason than the time scale involved for the construction effort and the zoning law changes.  For now, some percentage of urban residents will simply pack up for the suburbs, or even more rural areas out of state. Let’s face it: if the restaurant scene has dwindled and public transit feels like too much of a coronavirus risk, then urban living has lost a lot of its luster.

Wherever we choose to live, we’ll need more support for running our homes.  This will include ways to make the most of our limited space, such as smaller-scale workout equipment and compact storage, and increased support for DIY repairs, like video tutorials from manufacturers on how to service their products.  This is another stretch, but if the pandemic lasts long enough, manufacturers will modify their products to make them easier to service.  That, or cheaper to just throw away and replace when they encounter a problem.

The difficulties of schooling don’t end with table space and bandwidth needs.  There are also the socioemotional concerns such as college students learning how to live away from home, and how the K-12 set learns to socialize when they don’t interact in person.  Not to mention, who will do the teaching?  In March, when stay-at-home orders started to hit US cities, many parents suddenly had to balance their full-time jobs with being full-time teachers.  (Technology consultants Sarah Aslanifar and Bobby Norton jokingly refer to their new roles as, “working from home-school.”)  Businesses took a double hit as they had to scramble to find a way for people to work from home, and then those same people spent the next several weeks distracted during the workday.

Some parents have since formed social “pods” with neighbors whom they trust to perform compatible pandemic hygiene.  Some of those have evolved into educational pods, wherein parents spring for someone to teach their group of kids.  An article in MIT Tech Review mentions a price tag of $10,000 per student, per semester.  This isn’t accessible to everyone; but for high-earning parents, it’s a simple economic decision: the cost to outsource schooling is smaller than the amount of money they’ll earn when they can perform their day jobs at full capacity.

Higher education was already experiencing some disruption—boot camps and certificate programs on one end, and students questioning their post-college job prospects on the other—and the onset of the pandemic has increased the pressure.  This goes beyond the last few months of sorting out whether and how to open campuses for autumn 2020.  Parents and students alike also question the price tag of a fancy four-year college when students will be attending classes from their kitchen table.  (One SNL sketch framed the experience as “University of Phoenix Online, with worse tech support.”)  For the time being, colleges can busy themselves by shoring up courseware and videoconferencing platforms in order to set autumn 2020 classes in motion.  They’ll quickly need to sort out other near-term concerns (shoring up lost profits from empty student housing) as well as their future prospects (demonstrating their value compared to vocational programs, especially if the job market suffers over the long term).   If schools can’t sort this out on their own, they’ll likely pay someone to sort it out for them.  There’s also a business opportunity in providing a centralized, one-stop SaaS platform such that colleges won’t have to cobble together their own with a mix of one-off tools.

One silver lining of working from home is that your job prospects just opened up.  Covid-19 has forced a lot of companies to admit that the old “this work can’t be done from home” excuse doesn’t hold up.  Some of them are even starting to like it: they see how much money they were burning on an office for people who already knew they’d be more effective working from home.  Many of them will scratch that line item from next year’s budget.

This means we’ll see more remote hiring in the sectors that can support it.  That will establish a clear boundary between the companies that see the benefits (“we’re now able to hire across the country for these hard-to-fill roles”) and those that do not (“we’re only hiring people who live in this city, for when we go back to the office”).  Big tech-sector names like Google and Facebook have already announced plans to extend work-from-home support, while Twitter and Atlassian have flat-out said that their crews can work from home indefinitely.  In some fields, failing to provide a remote-work option may limit your talent pool.  It will be the equivalent of running an office space in the suburbs when most companies, and their prospective employees, exist in the dense urban center and have no desire to commute.

Bringing Back (Pieces of) “The Old Normal”

Just as we’ll pay for help adapting to the current state of things, we’ll also pay for some semblance of “the old normal.”

People generally like meeting up, whether one-on-one for a tea or in larger groups for a party.  We’re already using videoconferencing tools to hang out with friends and family, and to attend events.  But we’re adapting to the tools more than the other way around.  Right now services like Hangouts, Meet, and  Zoom are still very much designed for, well, video versions of office conference calls: one person speaks at a time, and you get a “Brady Bunch” grid view of attendees.  Expect the incumbent vendors as well as new upstarts to create tools that are better suited for [specific interaction]-over-video, like conferences, classroom teaching, or music lessons.

We’re really feeling this in online conferences.  While webinar tools fulfill the mission of letting a person deliver a talk to a large number of attendees, they don’t support other aspects of an in-person event.  Randomly bumping into people and “hallway track” sessions have forged long-term bonds between conference attendees, far more than the talks themselves.  This could serve as a driver for VR, as that will take us away from “attending events from our living room” to “being in our living room, but actually attending events in a dedicated space.”  There is a big difference.

Another reason people meet up is to play games.  Online games are nothing new, and they’ve even gained some mainstream street cred thanks to casual gaming.  Expect to see improved coordination, such that you can play with people of your choosing (a feature lacking in a number of iOS Game Center offerings).  People playing more video games may also lead to greater participation in esports leagues, and even taking business meetings over a gaming session.

In-person interaction is our most risky form of socializing at the moment, but it’s also the one people want the most. Goods and services that help us to (safely) meet face-to-face will not just help us on an emotional level, but they could play a key role in helping the economy get back on its feet.

We have masks and face shields, which are good for being in public.  What about protective overgarments, reminiscent of 1950s interpretations of outer-space wear?  We could wear them to protect our entire body in public transit or airplanes, and then shed them before entering a friend’s home.  There’s also the down-to-earth business of designing and installing plastic shields between restaurant tables.  Maybe someone will create transparent, oversized cabins that allow you and a few trusted friends to be “on the beach” but still be indoors and away from others.

Meeting in person also matters for office space.  In a work-from-home world, some teams still prefer the in-person experience.  What can we do to make it safer to be in the office, beyond standing several feet apart at all times?  An effective but low-tech offering could involve installing protective shields around conference tables (not unlike what we see in some restaurants) or modifying office layouts to discourage crowding.  The next step up would increase touch-free actions, such as choosing your elevator floor through a smartphone app.  Larger and higher-tech offerings would go deep into the guts of the building to install virus purifiers in building HVAC systems and the accompanying ductwork.

What Next?

Where do we go from here?  That depends on how long we go without treatment or improved preventative measures.  One thing’s for sure: Covid-19 is a driver of change.  There is no more “normal” in terms of how we shop for groceries, attend events, or even lay out our homes.  It’s up to us to adapt to our present, even as that present continues to change, and that will influence how we decide what to buy and sell.

How much we change, as people, depends on how long the pandemic lasts.  It’s possible that it will carve deep grooves in our collective social memory, similar to the Great Depression, and its impact will influence how people behave long after the disease is no longer a threat.

It also depends on how much we are willing to adapt.  That is a function of how soon we’re willing to let go of “normal,” which is really a euphemism for “the past,” especially since the past is heavily mythologized.

Categories: Technology

Four short links: 25 August 2020

O'Reilly Radar - Tue, 2020/08/25 - 03:55
  1. Wooden Turing Machine — (YouTube) Description of how it works. It implements three data elements and two states, sufficient for any calculation (discussed here). Not an infinite tape, because … wood.
  2. Emotional Resiliency in Leadership Report 2020 — Very interesting report, based on a survey and the science of burnout. It is primarily written for the survey respondents and anyone dealing with burnout and resilience issues, whether in themselves, family members, or employees. If you’re only interested in how to address burnout, skip to section seven.
  3. Three Things Digital Ethics Can Learn From Medical Ethics — Ethics committees have at least three roles to play. The first is education. […] The second role of ethics committees is policy formation and review. […] The third role of ethics committees is to provide ethical consultation. and [T]echnological decisions are not only about facts (for example, about what is more efficient), but also about the kind of life we want and the kind of society we strive to build.
  4. Ent: An Entity Framework for Go — Simple, yet powerful ORM for modeling and querying data. (a) Schema As Code – model any database schema as Go objects. (b) Easily Traverse Any Graph – run queries, aggregations and traverse any graph structure easily. (c) Statically Typed And Explicit API – 100% statically typed and explicit API using code generation. (d) Multi Storage Driver – supports MySQL, PostgreSQL, SQLite and Gremlin. Built by Facebookers, inspired by an FB-internal entity framework.
Categories: Technology

Four short links: 21 August 2020

O'Reilly Radar - Fri, 2020/08/21 - 04:56
  1. The 212 Story Tower That Isn’t in Suburban Melbourne — A typo in an OpenStreetMap submission becomes a surprising monolith in Microsoft Flight Simulator.
  2. Fairness in Machine Learning — Draft text for a book on the subject.
  3. The Social Architecture of Impactful Communities — A really good set of models for communities. Individuals typically “hire” communities to accomplish transitions that require human connection. The major sections: Why do people join communities?; Member quality determines community success; Design your community to spark quality interactions; The two levels of group cohesion; Recognizing and retaining key members; Growing your ranks; A Time to Build.
  4. Letters to a New Developer — A series of articles of advice to early-stage programmers, such as Don’t Try to Change the Tabbing/Bracing Style and On Debugging.
Categories: Technology

Four Short Links: 19 August 2020

O'Reilly Radar - Wed, 2020/08/19 - 04:44
  1. The Design Space of Computational Notebooks — Looked at 60 notebook systems and grouped 10 design space dimensions into four major stages of a data science workflow: importing data into notebooks (data sources), editing code and prose (editor style, supported programming languages, versioning, collaboration), running code to generate outputs (cell execution order, liveness [6], execution environment, and cell outputs), and publishing notebook outputs.
  2. Architecture Decision Records — The whys and hows of documenting architecture decisions (a minimal template appears after this list). Future you will thank currently-present-you-but-past-you-by-the-time-it-is-useful.
  3. PEmbroider — an open library for computational embroidery with Processing.
  4. Neuromorphic Chips Take Shape — Neuromorphic chips are packed with artificial neurons and artificial synapses that mimic the activity spikes that occur within the human brain—and they handle all this processing on the chip. This results in smarter, far more energy-efficient computing systems. Outline of the what and why, with a few examples.
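
To make the ADR idea in the second link concrete, here is a minimal sketch in the widely used Nygard-style format; the decision, file name, and details are invented for illustration:

    docs/adr/0001-use-postgresql.md  (one short file per decision)

    Title: 1. Use PostgreSQL as the primary datastore
    Status: Accepted
    Context: We need a relational store with mature operational tooling,
             and the team already has SQL experience.
    Decision: We will use PostgreSQL for all new services.
    Consequences: Backups and migrations are standardized around one engine;
                  a service with non-relational needs must record its own ADR.
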
Categories: Technology

Why Best-of-Breed is a Better Choice than All-in-One Platforms for Data Science

O'Reilly Radar - Tue, 2020/08/18 - 04:30

So you need to redesign your company’s data infrastructure.

Do you buy a solution from a big integration company like IBM, Cloudera, or Amazon?  Do you engage many small startups, each focused on one part of the problem?  A little of both?  We see trends shifting towards focused best-of-breed platforms. That is, products that are laser-focused on one aspect of the data science and machine learning workflows, in contrast to all-in-one platforms that attempt to solve the entire space of data workflows.

This article, which examines this shift in more depth, is an opinionated result of countless conversations with data scientists about their needs in modern data science workflows.

The Two Cultures of Data Tooling

Today we see two different kinds of offerings in the marketplace:

  1. All-in-one platforms like Amazon SageMaker, AzureML, Cloudera Data Science Workbench, and Databricks (which is now a unified analytics platform);
  2. Best-of-breed products that are laser-focused on one aspect of the data science or machine learning process, like Snowflake, Confluent/Kafka, MongoDB/Atlas, Coiled/Dask and Plotly.1

Integrated all-in-one platforms assemble many tools together, and can therefore provide a full solution to common workflows. They’re reliable and steady, but they tend not to be exceptional at any part of that workflow and they tend to move slowly. For this reason, such platforms may be a good choice for companies that don’t have the culture or skills to assemble their own platform.

In contrast, best-of-breed products take a more craftsman approach: they do one thing well and move quickly (often they are the ones driving technological change). They usually meet the needs of end users more effectively, are cheaper, and are easier to work with.  However, some assembly is required, because they need to be used alongside other products to create full solutions.  Best-of-breed products require a DIY spirit that may not be appropriate for slow-moving companies.

Which path is best? This is an open question, but we’re putting our money on best-of-breed products. We’ll share why in a moment, but first we want to take a historical look at what happened to data warehouses and data engineering platforms.

Lessons Learned from Data Warehouse and Data Engineering Platforms

Historically, companies bought Oracle, SAS, Teradata, or other all-in-one data warehousing solutions. These were rock solid at what they did–and “what they did” includes offering packages that are valuable to other parts of the company, such as accounting–but it was difficult for customers to adapt to new workloads over time.

Next came data engineering platforms like Cloudera, Hortonworks, and MapR, which broke open the Oracle/SAS hegemony with open source tooling. These provided a greater level of flexibility with Hadoop, Hive, and Spark.

However, while Cloudera, Hortonworks, and MapR worked well for a set of common data engineering workloads, they didn’t generalize well to workloads that didn’t fit the MapReduce paradigm, including deep learning and new natural language models. As companies moved to the cloud, embraced interactive Python, integrated GPUs, or took on a greater diversity of data science and machine learning use cases, these data engineering platforms weren’t ideal. Data scientists rejected these platforms and went back to working on their laptops, where they had full control to play around and experiment with new libraries and hardware.

While data engineering platforms provided a great place for companies to start building data assets, their rigidity becomes especially challenging when companies embrace data science and machine learning, both of which are highly dynamic fields with heavy churn that require much more flexibility in order to stay relevant. An all-in-one platform makes it easy to get started, but can become a problem when your data science practice outgrows it.

So if data engineering platforms like Cloudera displaced data warehousing platforms like SAS/Oracle, what will displace Cloudera as we move into the data science/machine learning age?

Why we think Best-of-Breed will displace walled garden platforms

The worlds of data science and machine learning move at a much faster pace than data warehousing and much of data engineering.  All-in-one platforms are too large and rigid to keep up.  Additionally, the benefits of integration are less relevant today with technologies like Kubernetes.  Let’s dive into these reasons in more depth.

Data Science and Machine Learning Require Flexibility

“Data science” is an incredibly broad term that encompasses dozens of activities like ETL, machine learning, model management, and user interfaces, each of which has many rapidly evolving choices. Only part of a data scientist’s workflow is typically supported by even the most mature data science platforms. Any attempt to build a one-size-fits-all integrated platform would have to include such a wide range of features, and such a wide range of choices within each feature, that it would be extremely difficult to maintain and keep up to date.  What happens when you want to incorporate real-time data feeds? What happens when you want to start analyzing time series data?  Yes, the all-in-one platforms will have tools to meet these needs; but will they be the tools you want, or the tools you’d choose if you had the opportunity?

Consider user interfaces. Data scientists use many tools like Jupyter notebooks, IDEs, custom dashboards, text editors, and others throughout their day. Platforms offering only “Jupyter notebooks in the cloud” cover only a small fraction of what actual data scientists use in a given day. This leaves data scientists spending half of their time in the platform, half outside the platform, and a new third half migrating between the two environments.

Consider also the computational libraries that all-in-one platforms support, and the speed at which they go out of date. Famously, Cloudera ran Spark 1.6 for years after Spark 2.0 was released–even though (and perhaps because) Spark 2.0 was released only 6 months after 1.6. It’s quite hard for a platform to stay on top of all of the rapid changes that are happening today. They’re too broad and numerous to keep up with.

Kubernetes and the cloud commoditize integration

While the variety of data science work has made all-in-one platforms harder to build and maintain, advances in infrastructure have made integrating best-of-breed products easier.

Cloudera, Hortonworks, and MapR were necessary at the time because Hadoop, Hive, and Spark were notoriously difficult to set up and coordinate. Companies that lacked technical skills needed to buy an integrated solution.

But today things are different. Modern data technologies are simpler to set up and configure. Also, technologies like Kubernetes and the cloud help to commoditize configuration and reduce integration pains with many narrowly-scoped products. Kubernetes lowers the barrier to integrating new products, which allows modern companies to assimilate and retire best-of-breed products on an as-needed basis without a painful onboarding process. For example, Kubernetes helps data scientists deploy APIs that serve models (machine learning or otherwise) and build machine learning workflow systems, and it is an increasingly common substrate for web applications that lets data scientists integrate OSS technologies, as reported here by Hamel Husain, Staff Machine Learning Engineer at GitHub.

Kubernetes provides a common framework in which most deployment concerns can be specified programmatically.  This puts more control into the hands of library authors, rather than individual integrators.  As a result, the work of integration is greatly reduced, often to specifying some configuration values and hitting deploy.  A good example here is the Zero to JupyterHub guide.  Anyone with modest computer skills can deploy JupyterHub on Kubernetes in about an hour, without deep expertise.  Previously this would have taken a trained professional several days.
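
For a sense of how little assembly is left, here is a minimal sketch of such a deployment, following the Zero to JupyterHub guide; the release and namespace names (“jhub”) are illustrative, and config.yaml stands in for the site-specific settings the guide walks through:

    # Add the chart repository published by the JupyterHub project, then deploy.
    helm repo add jupyterhub https://jupyterhub.github.io/helm-chart/
    helm repo update
    helm upgrade --install jhub jupyterhub/jupyterhub \
      --namespace jhub --create-namespace --values config.yaml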

Final Thoughts

We believe that companies that adopt a best-of-breed data platform will be better able to adapt to the technology shifts that we know are coming. Rather than being tied into a monolithic data science platform on a multi-year time scale, they will be able to adopt, use, and swap out products as their needs change.  Best-of-breed platforms enable companies to evolve and respond to today’s rapidly changing environment.

The rise of the data analyst, data scientist, machine learning engineer, and all the satellite roles that tie the decision function of organizations to data, along with increasing amounts of automation and machine intelligence, requires tooling that meets these end users’ needs. These needs are rapidly evolving and tied to open source tooling that is also evolving rapidly. Our strong opinion (strongly held) is that best-of-breed platforms are better positioned than all-in-one platforms to serve these rapidly evolving needs by building on these OSS tools. We look forward to finding out.

Footnote

1 Note that we’re discussing data platforms that are built on top of OSS technologies, rather than the OSS technologies themselves. This is not another Dask vs Spark post, but a piece weighing up the utility of two distinct types of modern data platforms.

Categories: Technology

Four short links: 14 August 2020

O'Reilly Radar - Fri, 2020/08/14 - 04:38
  1. Sinter — Sinter uses the user-mode EndpointSecurity API to subscribe to and receive authorization callbacks from the macOS kernel, for a set of security-relevant event types. The current version of Sinter supports allowing/denying process executions; in future versions we intend to support other types of events such as file, socket, and kernel events. Inspired by Google Santa (Santa because it decides if executables are naughty or nice), but aiming to vet more than executables.
  2. Extracting Info from Invoices — Turns out this is a double-hard problem: hard to get the algorithm right, and hard to get training sets. Info extraction datasets are, well, full of information. And if the info you want to extract is financial, or personally-identifying, or otherwise sensitive, then there aren’t generally freely-available training sets. There is no training dataset for invoices.
  3. Why is Science Hard for People to Trust — An interesting set of ideas, but these sentences have been echoing around my head: We hate being wronged, and it makes us vengeful. On the other hand, we don’t necessarily love being “done right by,” and we don’t have a particular motivation that comes from it. There’s no “positive” version of revenge. I wonder how this changes social software design.
  4. What ARGs Can Teach Us About QAnon — (Adrian Hon) A very thoughtful comparison between ARGs and conspiracy theories. These are useful steps but will not stop QAnon from spreading in social media comments or private chat groups or unmoderated forums. It’s not something we can reasonably hope for, and I don’t think there’s any technological solution (e.g. browser extensions) either. The only way to stop people from mistaking speculation from fact is for them to want to stop.
Categories: Technology

The Least Liked Programming Languages

O'Reilly Radar - Tue, 2020/08/11 - 04:46

StackOverflow’s 2020 developer survey included a table showing the  “most loved, dreaded, and wanted languages.” Loved and wanted languages are, well, sort of boring. The programming languages we dread are a lot more interesting. As Tolstoy said, “All happy families are alike; each unhappy family is unhappy in its own way.”

So what are these unhappy, unloved languages, and why do programmers dread using them? Given the chance, it’s, well, hard to resist jumping in with some theories, and perhaps even saying something impolitic.  Or defending some languages that are disliked for the wrong reasons.

More precisely, StackOverflow tabulated the “% of developers who are developing with the language or technology but have not expressed interest in continuing to do so.” That doesn’t sound quite as dire as “dreaded”; “not expressing an interest in working with a language again” is a rather vague indication of dread. There are lots of things I’ve done that I’d rather not do again, including writing troff macros that spawned shell scripts. But we won’t worry about that, will we?

The list of least liked languages is similar to the lists of the most widely used languages, as indicated by Redmonk, Tiobe and, for that matter, searches on O’Reilly Learning. That’s no surprise; Bjarne Stroustrup said that “there are only two kinds of languages: the ones people complain about and the ones nobody uses.” And that makes a lot of sense, at least in relation to this survey. If you’ve got millions of users, it’s not hard to get a lot of people to dislike you. So seeing perennials like C alongside relative newcomers like Java on the list of disliked languages isn’t surprising.

Kevlin Henney and I thought that the list of least liked languages also reflected the opinions of programmers who were working on large and legacy projects, as opposed to short programs. Dislike of a language may be “guilt by association”: dislike of a large, antiquated codebase with minimal documentation, and an architectural style in which every bug fixed breaks something else. Therefore, it’s not surprising to see languages that used to be widely used but have fallen from popularity on the list. And it’s also easy to fall in love with a quirky language that was perfect for one project, but that you’ll never see again.  (In my case, that’s Icon. Try it; you might like it. It’s not on anyone’s list.)

What’s most surprising is when a language is out of place: when it’s significantly more or less disliked than you expect. That’s what I’d like to think about. So, having disposed of the preliminaries, here are a few observations:

  • Java: Java has been the language people love to hate since its birth. I was at the USENIX session in which James Gosling first spoke about Java (way before 1.0), and people left the room talking about how horrible Java was, though none of them had actually used the language because it hadn’t been released yet. Java comes into this survey at a mild #9. Given Java’s reputation, that 9 should have hearts all over it.

    If there’s one language on this list that’s associated with gigantic projects, it’s Java.  And there are a lot of things to dislike about it—though a lot of them have to do with bad habits that grew up around Java, rather than the language itself. If you find yourself abusing design patterns, step back and look at what you’re doing; making everything into a design pattern is a sign that you didn’t understand what patterns are really for. (If you need a refresher, go to Head First Design Patterns or the classic Gang of Four book.) If you start writing a FactoryFactoryFactory, stop and take a nice long walk. If you’re writing a ClassWithAReallyLongNameBecauseThatsHowWeDoIt, you don’t need to. Java doesn’t make you do that.  Descriptive names are good; ridiculously long names (along with ridiculously deep package hierarchies) are not. I’ve always tried to put one coherent thought on each line of code. You can’t do that when the names are half a line long. But that’s not Java’s fault, it’s an odd cultural quirk of Java programmers.

    Java is verbose, but that’s not necessarily a problem. As someone who is not a Java fan told me once, all those declarations at the start of a class are actually documentation, and that documentation is particularly important on big projects. Once you know what the data structures are, you can make a pretty good guess what the class does. I’ve found Java easier to read and understand than most other languages, in part because it’s so explicit—and most good programmers realize that they spend more time reading others’ code than writing their own.
  • Ruby: I was very surprised to see Ruby at #7 on the list. Ruby more disliked than Java?  What’s going on? I’ve had some fun programming in Ruby; it is, for the most part, a “do what I meant, not what I said” language, and 15 years ago, that promise made lots of programmers fall in love.

    But if we think about Ruby in the context of large systems, it makes sense. It’s not hard to write code that is ambiguous, at least to the casual observer. It’s very easy to run afoul of a function or method that has been “monkeypatched” to have some non-standard behavior, and those modifications are rarely documented. Metaprogramming has been used brilliantly in frameworks like Rails, but I’ve always been bothered by the “And now the magic happens” aspect of many Ruby libraries. These aren’t features that are conducive to large projects.

    Many years ago, I heard someone at a Ruby or Rails conference say “There aren’t any large projects; everything in Ruby takes 90% fewer lines of code.” I’ve always thought LOC was a foolish metric. And even if you believe that Ruby takes 90% fewer lines of code (I don’t), 10% of a big number is still a big number, particularly if it’s your responsibility to digest that code, including things happening behind your back.  Ruby is a lot of fun, and I still use it for quick scripts (though I’ve largely switched to Python for those), but is it the language to choose for a big project? That might have me running in fear.
  • R: R weighs in at #10 on the “dreaded list.” I think that’s because of a misperception. R both is, and is not, a general-purpose programming language. Several statisticians have told me “You programmers don’t get it. R is a statistics workbench, not a programming language. It’s not a weird version of Python.” I’ve played around with R any number of times, but I finally “got it” (well, at least partly) when I read through the R tutorial in Vince Buffalo’s Bioinformatics Data Skills. Loops and if statements are relegated to a couple of pages at the end of the tutorial, not one of the first concepts you learn. Why? Because if you’re using R correctly, you don’t need them. It’s designed so you don’t have to use them.  If you come from a more traditional language, you may find yourself fighting against the language, not working with it. There are better approaches to conditional logic and iteration.

    It also helps to use the best tools and libraries available: RStudio is a really nice integrated development environment for R, and the Tidyverse is a great set of libraries for working with data. Ironically, that may even be part of the problem: with excellent graphics libraries and web frameworks, R suddenly looks a lot less like a specialized statistics workbench, and more like a general-purpose workhorse.

    Lots of programmers seem to be taking another look at R–perhaps to analyze COVID data?  R jumped from #20 on the Tiobe index to #8 in the July 2020 report. That’s a huge change. Whatever the reason, R will be a much more pleasant environment if you work with it, rather than against it. It is very opinionated–and the opinions are those of a statistician, rather than a programmer.
  • Python: Python is #23 on the list—extraordinarily low for a language that’s so widely used. Python is easy to like; I could love Python just for getting rid of curly braces. But aside from that, what’s to love? I’ve always said “don’t choose the language, choose the libraries,” and Python has great libraries, particularly for numerical work. Pandas, Numpy, Scipy, and scikit-learn are good reasons to love Python, all by themselves. Features like list comprehensions eliminate a lot of the bookkeeping associated with traditional control structures (there’s a short sketch of this after the list). Python is as good for tasks that are quick and dirty as it is for bigger projects. If I want to do something with a spreadsheet, I almost always hack at it with Python. (Me? Pivot tables?) And tools like Jupyter make it easy to record your experimentation while you’re in the process.

    On the “big project” end of the scale, Python is easy to read: no eyeburn from nested curly braces or parens, and fewer levels of nesting thanks to comprehensions, maps, and other features.  It has reasonable (though admittedly quirky) object-oriented features. I’ve gone back to some of my older loopy scripts, and frequently been able to write them without loops at all. If you want to put one coherent thought on a line, that’s the best of all possible worlds.

    An important slogan from “The Zen of Python” is “Explicit is better than implicit”; you’re very rarely left guessing about what someone else meant, or trying to decipher some unexpected bit of magic that “happened.” Python wins the award for the most popular language to inspire minimal dislike. It’s got a balanced set of features that make it ideal for small projects, and good for large ones.
  • JavaScript: And what shall we say about JavaScript, sixteenth on the list? I’ve got nothing. It’s a language that grew in a random and disordered way, and that programmers eventually learned could be powerful and productive, largely due to Doug Crockford’s classic JavaScript: The Good Parts. A language that’s as widely used as JavaScript, and that’s only 16th on the list of most dreaded languages, is certainly doing something right.  But I don’t have to like it.

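To make the comprehension point concrete, here is a minimal sketch of the bookkeeping a comprehension removes; the names are invented for illustration:

    # Loop-and-append bookkeeping: an accumulator, a loop, a test, a mutation.
    squares = []
    for n in range(10):
        if n % 2 == 0:
            squares.append(n * n)

    # The same thought as a single comprehension: one coherent line.
    squares = [n * n for n in range(10) if n % 2 == 0]
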
There’s lots more that could be said.  There’s no surprise that VBA is the #1 disliked language.  I’ll admit to complete ignorance on Objective-C (#2), which I’ve never had any reason to play with. Although I’m a Perl-hater from way back, I’m surprised that Perl is so widely disliked (#3), but some wounds never heal. It will be interesting to see what happens after Perl 7 has been out for a few years. Assembly (#4) is an acquired taste (and isn’t a single language).  If you don’t learn to love it, you pretty much have to hate it. And if you don’t love it, you really shouldn’t be using it. You can almost always avoid assembly, but when you need to work directly with the hardware, there’s no alternative.  C and C++ (#5 and #8, respectively) give you a lot of rope to hang yourself, but get you close enough to the hardware for almost any project, without the pain of assembly.  Are they fading into the past, or will they be with us forever?  My guess is the latter; there are too many projects that demand C’s performance and ubiquity. It’s the foundation for just about everything important in modern computing.

Speculating about languages and why they’re liked or disliked is fun.  It may or may not be useful.  Take it for what it’s worth.

Categories: Technology

Four short links: 11 Aug 2020

O'Reilly Radar - Tue, 2020/08/11 - 04:26
  1. ImmuDB — lightweight, high-speed immutable database for systems and applications. Open Source and easy to integrate into any existing application. Latest version provides multitenancy.
  2. Smart Mask — (CNN) Japanese startup Donut Robotics […] created a smart mask — a high-tech upgrade to standard face coverings, designed to make communication and social distancing easier. In conjunction with an app, the C-Face Smart mask can transcribe dictation, amplify the wearer’s voice, and translate speech into eight different languages. Masks are the latest wearables.
  3. CyberCode Online — a Cyber Punk inspired, Text Based MMORPG Browser Game where gameplay interfaces are ‘Stealthily’ mimicking the VSCode interface. VSCode has such huge mindshare, people are copying its interface for games.
  4. pysa — Facebook’s static analysis tool for finding security problems in Python code. It’ll find data flow problems: Pysa performs iterative rounds of analysis to build summaries to determine which functions return data from a source and which functions have parameters that eventually reach a sink. If Pysa finds that a source eventually connects to a sink, it reports an issue. SQL injection is the classic example: data from a web form eventually makes it into a SQL query.
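
To make “source eventually connects to a sink” concrete, here is a hypothetical Flask handler of the kind such an analysis is meant to flag; the route, database, and table names are invented for illustration:

    # Hypothetical example: user-controlled data (the "source") flows
    # unmodified into a SQL statement (the "sink").
    from flask import Flask, request
    import sqlite3

    app = Flask(__name__)

    @app.route("/user")
    def get_user():
        name = request.args["name"]  # source: data from a web form
        conn = sqlite3.connect("app.db")
        query = f"SELECT * FROM users WHERE name = '{name}'"  # tainted string
        return str(conn.execute(query).fetchall())  # sink: SQL injection risk

The fix is a parameterized query, conn.execute("SELECT * FROM users WHERE name = ?", (name,)), which breaks the flow of tainted data into the SQL text.
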
Categories: Technology

August 13th Virtual Meeting

PLUG - Mon, 2020/08/10 - 22:45

We've got a couple of presentations for you to enjoy from the comfort of your own home.  Just go to https://lufthans.bigbluemeeting.com/b/plu-yuk-7xx at 7pm on Thursday, Aug 13th to join in.

der.hans: FLOSS and you: a user freedom investigation

Description:
User freedom addresses software licensing from the perspective of those using the software.

What are advantages and disadvantages of different licensing models in relation to user freedom?

How does licensing impact individuals, organizations and businesses as we use software?

How does software distribution (packages, cloud, bundled in a product) impact user freedom?

The presentation and ensuing conversation is about user freedom and the impact of the software we choose to use.

It's a consideration of the everyperson relationship with software licensing.

Attendees will consider how the following relate to them:

* software usage models
* the four freedoms of Free Software
* the open source development model
* strong and weak copyleft

About der.hans:
der.hans is a technology and entrepreneurial veteran.

He is chairman of the Phoenix Linux User Group (PLUG), Promotions and Outreach chair for SeaGL, BoF organizer for the Southern California Linux Expo (SCaLE) and founder of the Free Software Stammtisch. He presents regularly at large community-led conferences (SCaLE, SeaGL, LFNW, Tübix, OLF, TXLF) and many local groups.

Currently a Customer Data Engineer at Object Rocket. Public statements are not representative of $dayjob.

Mastodon - https://floss.social/@FLOX_advocate

Plume - https://fediverse.blog/~/LuftHans

Tom Perrine: Retro-computing in the cloud - or how to run 70's era UNIX and Multics in GCP

Description:
I'm going to talk about SIMH, software that can emulate dozens of historically interesting CPUs, and demonstrate automation that lets you launch V6 UNIX and Multics on the Google Cloud Platform.

About Tom:
Tom Perrine is a life-long system administrator. Open source has been part of his life since the '80s, beginning with Emacs and leading to 4BSD, Slackware, and CentOS.

He recently finished 17 years at PlayStation, where he managed several IT teams who created the first online game servers, and ran IT infrastructure for the 14 internal game studios. His final assignments were global IT strategic planning and IT transformation programs.

Before PlayStation, he was the first CSO of the San Diego Supercomputer Center, handling all operational security as well as funded research for NSA, FBI and others. Before SDSC, he was a contractor doing infosec research for the intelligence community related to security kernels and trusted computing.

He's given testimony to the US Congress on privacy, and presented multiple times at USENIX, DEFCON and other conferences. His hobbies include SCUBA diving, Toastmasters and whisky, but rarely on the same day.

Four short links: 7 Aug 2020

O'Reilly Radar - Fri, 2020/08/07 - 06:13
  1. Surprising Economics of Load-Balanced Systems — I have a system with c servers, each of which can only handle a single concurrent request, and has no internal queuing. The servers sit behind a load balancer, which contains an infinite queue. An unlimited number of clients offer c * 0.8 requests per second to the load balancer on average. In other words, we increase the offered load linearly with c to keep the per-server load constant. Once a request arrives at a server, it takes one second to process, on average. How does the client-observed mean request time vary with c? (A quick numerical check appears after this list.)
  2. Crush — Crush is an attempt to make a traditional command line shell that is also a modern programming language. It has the features one would expect from a modern programming language like a type system, closures and lexical scoping, but with a syntax geared toward both batch and interactive shell usage. I’m not convinced this is where programming belongs, but some of the examples are shell power-user dreams.
  3. Deep Graph Learning Knowledge Embedding — DGL-KE is a high performance, easy-to-use, and scalable package for learning large-scale knowledge graph embeddings. The package is implemented on the top of Deep Graph Library (DGL) and developers can run DGL-KE on CPU machine, GPU machine, as well as clusters. Open source, from Amazon. (via Amazon Science)
  4. Flume — Open source React library to create your own visual programming language (drag-and-drop function nodes with connectors between them).
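
On the first link: the counterintuitive answer is that the client-observed mean response time falls as c grows, even though per-server load is held at 0.8. A quick way to check is the Erlang C formula for an M/M/c queue; this sketch assumes Poisson arrivals and exponentially distributed service times, which the puzzle itself leaves unspecified:

    from math import factorial

    def mmc_mean_response(c, rho=0.8, service_time=1.0):
        """Mean response time for an M/M/c queue with per-server load rho."""
        mu = 1.0 / service_time            # per-server service rate
        lam = c * rho * mu                 # offered arrival rate (c * 0.8 req/s)
        a = lam / mu                       # offered load in Erlangs
        # Erlang C: probability an arriving request has to wait in the queue
        num = a**c / factorial(c) / (1 - rho)
        den = sum(a**k / factorial(k) for k in range(c)) + num
        p_wait = num / den
        mean_wait = p_wait / (c * mu - lam)
        return mean_wait + service_time

    for c in (1, 2, 4, 8, 16, 32):
        print(c, round(mmc_mean_response(c), 3))

At c = 1 the mean response time is 5 seconds; as c grows it falls toward the 1-second service time, because one shared queue in front of many servers absorbs bursts far better than many isolated single-server queues.
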
Categories: Technology
