2022 Cloud Salary Survey

O'Reilly Radar - Wed, 2022/06/22 - 04:21

Last year, our report on cloud adoption concluded that adoption was proceeding rapidly; almost all organizations are using cloud services. Those findings confirmed the results we got in 2020: everything was “up and to the right.” That’s probably still true—but saying “everything is still up and to the right” would be neither interesting nor informative. So rather than confirming the same results for a third year, we decided to do something different.

This year’s survey asked questions about compensation for “cloud professionals”: the software developers, operations staff, and others who build cloud-based applications, manage a cloud platform, and use cloud services. We limited the survey to residents of the United States because salaries from different countries aren’t directly comparable; in addition to fluctuating exchange rates, there are different norms for appropriate compensation. This survey ran from April 4 through April 15, 2022, and was publicized via email to recipients of our Infrastructure & Ops Newsletter whom we could identify as residing in the United States or whose location was unknown.

Executive Summary
  • Survey respondents earn an average salary of $182,000.
  • The average salary increase over the past year was 4.3%.
  • 20% of respondents reported changing employers in the past year.
  • 25% of respondents are planning to change employers because of compensation.
  • The average salary for women is 7% lower than the average salary for men.
  • 63% of respondents work remotely all the time; 94% work remotely at least one day a week.
  • Respondents who participated in 40 or more hours of training in the past year received higher salary increases.

Of the 1,408 responses we initially received, 468 were disqualified. Respondents were disqualified (and the survey terminated) if the respondent said they weren’t a US resident or if they were under 18 years old; respondents were also disqualified if they said they weren’t involved with their organization’s use of cloud services. Another 162 respondents filled out part of the survey but didn’t complete it; we chose to include only complete responses. That left us with 778 responses. Participants came from 43 states plus Washington, DC. As with our other surveys, the respondents were a relatively senior group: the average age was 47 years old, and while the largest number identified themselves as programmers (43%), 14% identified as executives and 33% as architects.
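The response accounting above can be checked with a few lines of Python. This is only a sketch: the counts come from the paragraph above, while the variable names and the retention percentage are our own illustration.

```python
# Counts reported in the survey text; variable names are illustrative.
initial_responses = 1408
disqualified = 468   # not a US resident, under 18, or not involved with cloud
incomplete = 162     # started the survey but did not finish it

complete_responses = initial_responses - disqualified - incomplete
retained_share = complete_responses / initial_responses

print(complete_responses)
print(f"{retained_share:.1%} of initial responses were retained")
```

All percentages in the rest of the report are relative to the complete responses, not the initial ones.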

The Big Picture

Cloud professionals are well paid. That’s not a surprise in itself. We expected salaries (including bonuses) to be high, and they were. The cloud professionals who responded to our survey earn an average salary of $182,000; the most common salary range among respondents was $150,000 to $175,000 per year (16% of the total), as shown in Figure 1. The peak was fairly broad: 68% of the respondents earn between $100,000 and $225,000 per year. And there was a significant “long tail” in the compensation stratosphere: 7% of the respondents earn over $300,000 per year, and 2.4% over $400,000 per year.

Figure 1. Annual salary by percentage of respondents

We believe that job changes are part of what’s driving high salaries. After all, we’ve heard about talent shortages in almost every field, with many employers offering very high salaries to attract the staff they need. By staying with their current employer, an employee may get an annual salary increase of 4%. But if they change jobs, they might get a significantly higher offer—20% or more—plus a signing bonus.

20% of the respondents reported that they changed employers in the past year. That number isn’t high in and of itself, but it looks a lot higher when you add it to the 25% who are planning to leave jobs over compensation. (Another 20% of the respondents declined to answer this question.) It’s also indicative that 19% of the respondents received promotions. There was some overlap between those who received promotions and those who changed jobs (5% of the total said “yes” to both questions, or roughly one quarter of those who changed jobs). When you look at the number of respondents who left their employer, are planning to leave their employer, or got a promotion and a salary increase, it’s easy to see why salary budgets are under pressure. Right now, qualified candidates have the power in the job market, though with the stock market correction that began in March 2022 and significant layoffs from some large technology-sector companies, that may be changing.

These conclusions are borne out when you look at the salaries of those who were promoted, changed jobs, or intend to change jobs. A promotion roughly doubled respondents’ year-over-year salary increase. On average, those who were promoted received a 7% raise; those who weren’t promoted received a 3.7% increase. The result was almost exactly the same for those who changed jobs: those who changed averaged a 6.8% salary increase, while those who remained averaged 3.7%. We also see a difference in the salaries of those who intend to leave because of compensation: their average salary is $171,000, as opposed to $188,000 for those who don’t plan to leave. That’s a $17,000 difference, or roughly 10%.
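The "roughly 10%" figure depends on which salary is used as the base. The sketch below computes the gap both ways; the function name `pct_gap` and its `base` parameter are hypothetical helpers introduced for illustration, and only the two salary figures come from the survey.

```python
def pct_gap(lower: float, higher: float, base: str = "higher") -> float:
    """Return the gap between two salaries as a percentage of the chosen base."""
    denominator = higher if base == "higher" else lower
    return (higher - lower) / denominator * 100


planning_to_leave = 171_000      # average salary, intends to leave over pay
not_planning_to_leave = 188_000  # average salary, does not intend to leave

gap_vs_higher = pct_gap(planning_to_leave, not_planning_to_leave, base="higher")
gap_vs_lower = pct_gap(planning_to_leave, not_planning_to_leave, base="lower")

print(f"{gap_vs_higher:.1f}% below the higher salary")
print(f"{gap_vs_lower:.1f}% above the lower salary")
```

Either way the result rounds to something close to 10%, which is why the choice of base does not change the conclusion here.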

Salaries by Gender

One goal of this survey was to determine whether women are being paid fairly. Last year’s salary survey for data and AI found a substantial difference between men’s and women’s salaries: women were paid 16% less than men. Would we see the same here?

The quick answer is “yes,” but the difference was smaller. Average salaries for women are 7% lower than for men ($172,000 as opposed to $185,000). But let’s take a step back before looking at salaries in more detail. We asked our respondents what pronouns they use. Only 8.5% said “she,” while 79% chose “he.” That’s still only 87% of the total. Where are the rest? 12% preferred not to say; this is a larger group than those who used “she.” 0.5% chose “other,” and 0.7% chose “they.” (That’s only four and six respondents, respectively.) Compared to results from our survey on the data/AI industry, the percentage of cloud professionals who self-identified as women appears to be much smaller (8.5%, as opposed to 14%). But there’s an important difference between the surveys: “I prefer not to answer” wasn’t an option for the Data/AI Salary Survey. We can’t do much with those responses. When we eyeballed the data for the “prefer not to say” group, we saw somewhat higher salaries than for women, but still significantly less (5% lower) than for men.

The difference between men’s and women’s salaries is smaller than we expected, given the results of last year’s Data/AI Salary Survey. But it’s still a real difference, and it raises the question: Is compensation improving for women? Talent shortages are driving compensation up in many segments of the software industry. Furthermore, the average reported salaries for both men and women in our survey are high. Again, is that a consequence of the talent shortage? Or is it an artifact of our sample, which appears to be somewhat older, and rich in executives? We can’t tell from a single year’s data, and the year-over-year comparison we made above is based on a different industry segment. But the evidence suggests that the salary gap is closing, and progress is being made. And that is indeed a good thing.

Salaries for respondents who answered “other” to the question about the pronouns they use are 31% lower than salaries for respondents who chose “he.” Likewise, salaries for respondents who chose “they” are 28% lower than men’s average salaries. However, both of these groups are extremely small, and in both groups, one or two individuals pulled the averages down. We could make the average salaries higher by calling these individuals “outliers” and removing their data; after all, outliers can have outsized effects on small groups. That’s a step we won’t take. Whatever the reason, the outliers are there; they’re part of the data. Professionals all across the spectrum have low-paying jobs—sometimes by choice, sometimes out of necessity. Why does there appear to be a concentration of them among people who don’t use “he” or “she” as their pronouns? The effect probably isn’t quite as strong as our data indicates, but we won’t try to explain our data away. It’s certainly indicative that the groups that use “they” or another pronoun than “he” or “she” showed a salary penalty. We have to conclude that respondents who use nonbinary pronouns earn lower salaries, but without more data, we don’t know why, nor do we know how much lower their salaries are or whether this difference would disappear with a larger sample.
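To see how strongly one individual can move the average of a very small group, consider the following sketch. The salaries in it are invented purely for illustration; they are not survey data, and we did not publish individual responses.

```python
from statistics import mean, median

# Hypothetical salaries for a four-person group (NOT survey data).
group = [150_000, 160_000, 170_000, 40_000]

group_mean = mean(group)
group_median = median(group)
mean_without_lowest = mean(sorted(group)[1:])  # drop the single lowest salary

print(f"mean with everyone:  ${group_mean:,.0f}")
print(f"median:              ${group_median:,.0f}")
print(f"mean without lowest: ${mean_without_lowest:,.0f}")
```

In a group this small, one low salary shifts the mean by tens of thousands of dollars, while the median moves much less. That sensitivity is why we report the averages for these groups but caution against reading too much into their size.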

To see more about the differences between men’s and women’s salaries, we looked at the men and women in each salary range. The overall shapes of the salary distributions are clear: a larger percentage of women earn salaries between $0 and $175,000, and (with two exceptions) a larger percentage of men earn salaries over $175,000. However, a slightly larger percentage of women earn supersize salaries ($400,000 or more), and a significantly larger percentage earn salaries between $225,000 and $250,000 (Figure 2).

Figure 2. Men’s and women’s salaries by percentage of respondents

We can get some additional information by looking at salary increases (Figure 3). On average, women’s salary increases were higher than men’s: $9,100 versus $8,100. That doesn’t look like a big difference, but it’s over 10%. We can read that as a sign that women’s salaries are catching up. But the signals are mixed. Men’s salaries increased more than women’s in almost every segment, with two big exceptions: 12% of women received salary increases over $30,000, while only 8% of men did the same. Likewise, 17% of women received increases between $10,000 and $15,000, but only 9% of men did. These differences might well disappear with more data.

Figure 3. Salary increases for women and men by percentage of respondents

When we look at salary increases as a percentage of salary, we again see mixed results (Figure 4). Women’s salary increases were much larger than men’s in three bands: over $325,000 (with the exception of $375,000–$400,000, where there were no women respondents), $275,000–$300,000, and $150,000–$175,000. For those with very large salaries, women’s salary increases were much higher than men’s. Furthermore, the $150,000–$175,000 band had the largest number of women. While there was a lot of variability, salary increases are clearly an important factor driving women’s salaries toward parity with men’s.

Figure 4. Salary increases as a percentage of salary

The Effect of Education

The difference between men’s and women’s salaries is significant at almost every educational level (Figure 5). The difference is particularly high for respondents who are self-taught, where women earned 39% less ($112,000 versus $184,000), and for students (45% less, $87,000 versus $158,000). However, those were relatively small groups, with only two women in each group. It’s more important that for respondents with bachelor’s degrees, women’s salaries were 4% higher than men’s ($184,000 versus $176,000)—and this was the largest group in our survey. For respondents with advanced degrees, women with doctorates averaged a 15% lower salary than men with equivalent education; women with master’s degrees averaged 10% lower. The difference between women’s and men’s salaries appears to be greatest at the extremes of the educational spectrum.

Figure 5. Men’s and women’s salaries by degree

Salaries by State

Participants in the survey come from 43 states plus Washington, DC. Looking at salaries by state creates some interesting puzzles. The highest salaries are found in Oklahoma; South Dakota is third, following California. And the top of the list is an interesting mix of states where we expected high salaries (like New York) and states where we expected salaries to be lower. So what’s happening?

The average salary from Oklahoma is $225,000—but that only reflects two respondents, both of whom work remotely 100% of the time. (We’ll discuss remote work later in this report.) Do they work for a Silicon Valley company and get a Silicon Valley salary? We don’t know, but that’s certainly a possibility. The average salary for South Dakota is $212,000, but we shouldn’t call it an “average,” because we only had one response, and this respondent reported working remotely 1–4 days per week. Likewise, Vermont had a single respondent, who works remotely and who also had an above-average salary. Many other states have high average salaries but a very small number of respondents.

So the first conclusion that we can draw is that remote work might be making it possible for people in states without big technology industries to get high salaries. Or it could be the opposite: there’s no state without some businesses using the cloud, and the possibility of remote work puts employers in those states in direct competition with Silicon Valley employers, so they need to pay much higher salaries to get the expertise they need. And those job offers may include the opportunity to work remotely full or part time, even if the employer is local. Both of those possibilities no doubt hold true for individuals, if not for geographical regions as a whole.

Outliers aside, salaries are highest in California ($214,000), New York ($212,000), Washington ($203,000), Virginia ($195,000), and Illinois ($191,000). Massachusetts comes next at $189,000. At $183,000, average salaries in Texas are lower than we’d expect, but they’re still slightly above the national average ($182,000). States with high average salaries tended to have the largest numbers of respondents—with the important exceptions that we’ve already noted. The lowest salaries are found in West Virginia ($87,000) and New Mexico ($84,000), but these reflected a small number of respondents (one and four, respectively). These two states aside, the average salary in every state was over $120,000 (Figure 6).

So, is remote work equalizing salaries between different geographical regions? It’s still too early to say. We don’t think there will be a mass exodus from high-salary states to more rural states, but it’s clear that professionals who want to make that transition can, and that companies that aren’t in high-salary regions will need to offer salaries that compete in the nationwide market. Future surveys will tell us whether this pattern holds true.

Figure 6. Average salary by state

Salaries by Age

The largest group of respondents to our survey were between 45 and 54 years old (Figure 7). This group also had the highest average salary ($196,000). Salaries for respondents between 55 and 64 years old were lower (averaging $173,000), and salaries dropped even more for respondents over 65 ($139,000). Salaries for the 18- to 24-year-old age range were low, averaging $87,000. These lower salaries are no surprise because this group includes both students and those starting their first jobs after college.

It’s worth noting that our respondents were older than we expected; 29% were between 35 and 44 years old, 36% were between 45 and 54, and 22% were between 55 and 64. Data from our learning platform shows that this distribution isn’t indicative of the field as a whole, or of our audience. It may be an artifact of the survey itself. Are our newsletter readers older, or are older people more likely to respond to surveys? We don’t know.

Figure 7. Average salary by age

The drop in salaries after age 55 is surprising. Does seniority count for little? It’s easy to make hypotheses: Senior employees are less likely to change jobs, and we’ve seen that changing jobs drives higher salaries. But it’s also worth noting that AWS launched in 2002, roughly 20 years ago. People who are now 45 to 54 years old started their careers in the first years of Amazon’s rollout. They “grew up” with the cloud; they’re the real cloud natives, and that appears to be worth something in today’s market.

Job Titles and Roles

Job titles are problematic. There’s no standardized naming system, so a programming lead at one company might be an architect or even a CTO at another. So we ask about job titles at a fairly high level of abstraction. We offered respondents a choice of four “general” roles: executive, director, manager, or associate. We also allowed respondents to write in their own job titles; roughly half chose this option. The write-in titles were more descriptive and, as expected, inconsistent. We were able to group them into some significant clusters by looking for people whose write-in title used the words “engineer,” “programmer,” “developer,” “architect,” “consultant,” or “DevOps.” We also looked at two modifiers: “senior” and “lead.” There’s certainly room for overlap: someone could be a “senior DevOps engineer.” But in practice, overlap was small. (For example, no respondents used both “developer” and “architect” in a write-in job title.) There was no overlap between the titles submitted by respondents and the general titles we offered on the survey: our respondents had to choose one or the other.

So what did we see? As shown in Figure 8, the highest salaries go to those who classified themselves as directors ($235,000) or executives ($231,000). Salaries for architects, “leads,” and managers are on the next tier ($196,000, $190,000, and $188,000, respectively). People who identified as engineers earn slightly lower salaries ($175,000). Associates, a relatively junior category, earn an average of $140,000 per year. Those who used “programmer” in their job title are a puzzle. There were only three of them, which is a surprise in itself, and all have salaries in the $50,000 to $100,000 range (average $86,000). Consultants also did somewhat poorly, with an average salary of $129,000.

Those who identified as engineers (19%) made up the largest group of respondents, followed by associates (18%). Directors and managers each comprised 15% of the respondents. That might be a bias in our survey, since it’s difficult to believe that 30% of cloud professionals have directorial or managerial roles. (That fits the observation that our survey results may skew toward older participants.) Architects were less common (7%). And relatively few respondents identified themselves with the terms “DevOps” (2%), “consultant” (2%), or “developer” (2%). The small number of people who identify with DevOps is another puzzle. It’s often been claimed that the cloud makes operations teams unnecessary; “NoOps” shows up in discussions from time to time. But we’ve never believed that. Cloud deployments still have a significant operational component. While the cloud may allow a smaller group to oversee a huge number of virtual machines, managing those machines has become more complex—particularly with cloud orchestration tools like Kubernetes.

Figure 8. Average salary by job title

We also tried to understand what respondents are doing at work by asking about job roles, decoupling responsibilities from titles (Figure 9). So in another question, we asked respondents to choose between marketing, sales, product, executive, programmer, and architect roles, with no write-in option. Executives earn the highest salaries ($237,000) but were a relatively small group (14%). Architects are paid $188,000 per year on average; they were 33% of respondents. And for this question, respondents didn’t hesitate to identify as programmers: this group was the largest (43%), with salaries somewhat lower than architects ($163,000). This is roughly in agreement with the data we got from job titles. (And we should have asked about operations staff. Next year, perhaps.)

The remaining three groups—marketing, sales, and product—are relatively small. Only five respondents identified their role as marketing (0.6%), but they were paid well ($187,000). 1.5% of the respondents identified as sales, with an average salary of $186,000. And 8% of the respondents identified themselves with product, with a somewhat lower average salary of $162,000.

Figure 9. Average salary by role

Working from Home

When we were planning this survey, we were very curious about where people worked. Many companies have moved to a fully remote work model (as O’Reilly has), and many more are taking a hybrid approach. But just how common is remote work? And what consequences does it have for the employees who work from home rather than in an office?

It turns out that remote work is surprisingly widespread (Figure 10). We found that only 6% of respondents answered no to the question “Do you work remotely?” More than half (63%) said that they work remotely all the time, and the remainder (31%) work remotely 1–4 days per week.

Working remotely is also associated with higher salaries: the average salary for people who work remotely 1–4 days a week is $188,000. It’s only slightly less ($184,000) for people who work remotely all the time. Salaries are sharply lower for people who never work remotely (average $131,000).

Figure 10. Salaries and remote work

Salary increases show roughly the same pattern (Figure 11). While salaries are slightly higher for respondents who occasionally work in the office, salary increases were higher for those who are completely remote: the average increase was $8,400 for those who are remote 100% of the time, while those who work from home 1–4 days per week only averaged a $7,800 salary increase. We suspect that given time, these two groups would balance out. Salary changes for those who never work remotely were sharply lower ($4,500).

Of all jobs in the computing industry, cloud computing is probably the most amenable to remote work. After all, you’re working with systems that are remote by definition. You’re not reliant on your own company’s data center. If the application crashes in the middle of the night, nobody will be rushing to the machine room to reboot the server. A laptop and a network connection are all you need.

Figure 11. Salary increases and remote work

We’re puzzled by the relatively low salaries and salary increases for those who never work remotely. While there were minor differences, as you’d expect, there were no “smoking guns”: no substantial differences in education or job titles or roles. Does this difference reflect old-school companies that don’t trust their staff to be productive at home? And do they pay correspondingly lower salaries? If so, they’d better be forewarned: it’s very easy for employees to change jobs in the current labor market.

As the pandemic wanes (if indeed it wanes—despite what people think, that’s not what the data shows), will companies stick with remote work or will they require employees to come back to the office? Some companies have already asked their employees to return. But we believe that the trend toward remote work will be hard, if not impossible, to reverse, especially in a job market where employers are competing for talent. Remote work certainly raises issues about onboarding new hires, training, group dynamics, and more. And it’s not without problems for the employees themselves: childcare, creating appropriate work spaces, etc. These challenges notwithstanding, it’s difficult to imagine people who have eliminated a lengthy commute from their lives going back to the office on a permanent basis.

Certifications and Training

Nearly half (48%) of our respondents participated in technical training or certification programs in the last year. 18% of all respondents obtained one or more certifications, suggesting that the remaining 30% participated in training or some other form of professional development that wasn’t tied to a certification program.
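The 30% figure follows from simple subtraction, as sketched below. The sketch assumes that both percentages are shares of all 778 respondents and that everyone who earned a certification is also counted among those who trained; the head counts it prints are approximations derived from rounded percentages, not figures we collected.

```python
TOTAL_RESPONDENTS = 778  # complete responses reported in the survey

trained_share = 0.48     # took part in training or certification programs
certified_share = 0.18   # obtained one or more certifications

# Assumption: everyone who earned a certification is also counted in the 48%.
training_only_share = trained_share - certified_share

for label, share in [
    ("any training or certification", trained_share),
    ("earned a certification", certified_share),
    ("training without a certification", training_only_share),
]:
    # Head counts are approximate because the shares are rounded percentages.
    approx_count = round(share * TOTAL_RESPONDENTS)
    print(f"{label}: {share:.0%}, about {approx_count} respondents")
```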

The most common reasons for participating in training were learning new technologies (42%) and improving existing skills (40%). (Percentages are relative to the total number of respondents, which was 778.) 21% wanted to work on more interesting projects. The other possible responses were chosen less frequently: 9% of respondents wanted to move into a leadership role, and 12% were required to take training. Job security was an issue for 4% of the respondents, a very small minority. That’s consistent with our observation that employees have the upper hand in the labor market and are more concerned with advancement than with protecting their status quo.

Survey participants obtained a very broad range of certifications. We asked specifically about 11 cloud certifications that we identified as being particularly important. Most were specific to one of the three major cloud vendors: Microsoft Azure, Amazon Web Services, and Google Cloud. However, the number of people who obtained any specific certification was relatively small. The most popular certifications were AWS Certified Cloud Practitioner and Solutions Architect (both 4% of the total number of respondents). However, 8% of respondents answered “other” and provided a write-in answer. That’s 60 respondents, and we got 55 different write-ins. Obviously, there was very little duplication. The only submissions with multiple responses were CKA (Certified Kubernetes Administrator) and CKAD (Certified Kubernetes Application Developer). The range of training in this “other” group was extremely broad, spanning various forms of Agile training, security, machine learning, and beyond. Respondents were pursuing many vendor-specific certifications, and even academic degrees. (It’s worth noting that our 2021 Data/AI Salary Survey report also concluded that earning a certification for one of the major cloud providers was a useful tool for career advancement.)

Given the number of certifications that are available, this isn’t surprising. It’s somewhat more surprising that there isn’t any consensus on which certifications are most important. When we look at salaries, though, we see some signals…at least among the leading certifications. The largest salaries are associated with Google Cloud Certified Professional Cloud Architect ($231,000). People who earned this certification also received a substantial salary increase (7.1%). Those who obtained an AWS Certified Solutions Architect – Professional, AWS Certified Solutions Architect – Associate, or Microsoft Certified: Azure Solutions Architect Expert certification also earn very high salaries ($212,000, $201,000, and $202,000, respectively), although these three received smaller salary increases (4.6%, 4.4%, and 4.0%, respectively). Those who earned the CompTIA Cloud+ certification receive the lowest salary ($132,000) and got a relatively small salary increase (3.5%). The highest salary increase went to those who obtained the Google Cloud Certified Professional Cloud DevOps Engineer certification (9.7%), with salaries in the middle of the range ($175,000).

We can’t draw any conclusions about the salaries or salary increases corresponding to the many certifications listed among the “other” responses; most of those certifications only appeared once. But it seems clear that the largest salaries and salary increases go to those who are certified for one of the big three platforms: Google Cloud, AWS, and Microsoft Azure (Figures 12 and 13).

The salaries and salary increases for the two Google certifications are particularly impressive. Given that Google Cloud is the least widely used of the major platforms, and that the number of respondents for these certifications was relatively small, we suspect that talent proficient with Google’s tools and services is harder to find and drives the salaries up.

Figure 12. Average salary by certification

Figure 13. Average salary increase by certification

Our survey respondents engaged in many different types of training. The most popular were watching videos and webinars (41%), reading books (39%), and reading blogs and industry articles (34%). 30% of the respondents took classes online. Given the pandemic, it isn’t at all surprising that only 1.7% took classes in person. 23% attended conferences, either online or in person. (We suspect that the majority attended online.) And 24% participated in company-offered training.

There’s surprisingly little difference between the average salaries associated with each type of learning. That’s partly because respondents were allowed to choose more than one response. But it’s also notable that the average salaries for most types of learning are lower than the average salary for the respondents as a whole. The average salary by type of learning ranges from $167,000 (in-person classes) to $184,000 (company-provided educational programs). These salaries are on the low side compared to the overall average of $182,000. Lower salaries may indicate that training is most attractive to people who want to get ahead in their field. This fits the observation that most of the people who participated in training did so to obtain new skills or to improve current ones. After all, to many companies “the cloud” is still relatively new, and they need to retrain their current workforces.

When we look at the time that respondents spent in training (Figure 14), we see that the largest group spent 20–39 hours in the past year (13% of all the respondents). 12% spent 40–59 hours; and 10% spent over 100 hours. No respondents reported spending 10–19 hours in training. (There were also relatively few in the 80–99 hour group, but we suspect that’s an artifact of “bucketing”: if you’ve taken 83 hours of training, you’re likely to think, “I don’t know how much time I spent in training, but it was a lot,” and choose 100+.) The largest salary increases went to those who spent 40–59 hours in training, followed by those who spent over 100 hours; the smallest salary increases, and the lowest salaries, went to those who only spent 1–9 hours in training. Managers take training into account when planning compensation, and those who skimp on training shortchange themselves.

Figure 14. Percentage salary increase by time spent in training

The Cloud Providers

A survey of this type wouldn’t be complete without talking about the major cloud providers. There’s no really big news here (Figure 15). Amazon Web Services has the most users, at 72%, followed by Microsoft Azure (42%) and Google Cloud (31%). Compared to the cloud survey we did last year, it looks like Google Cloud and Azure have dropped slightly compared to AWS. But the changes aren’t large. Oracle’s cloud offering was surprisingly strong at 6%, and 4% of the respondents use IBM Cloud.

When we look at the biggest cloud providers that aren’t based in the US, we find that they’re still a relatively small component of cloud usage: 0.6% of respondents use Alibaba, while 0.3% use Tencent. Because there are so few users among our respondents, the percentages don’t mean much: a few more users, and we might see something completely different. That said, we expected to see more users working with Alibaba; it’s possible that tensions between the United States and China have made it a less attractive option.

20% of the respondents reported using a private cloud. While it’s not entirely clear what the term “private cloud” means—for some, it just means a traditional data center—almost all the private cloud users also reported using one of the major cloud providers. This isn’t surprising; private clouds make the most sense as part of a hybrid or multicloud strategy, where the private cloud holds data that must be kept on premises for security or compliance reasons.

6% of the respondents reported using a cloud provider that we didn’t list. These answers were almost entirely from minor cloud providers, which had only one or two users among the survey participants. And surprisingly, 4% of the respondents reported that they weren’t using any cloud provider.

Figure 15. Cloud provider usage by percentage of respondents

There’s little difference between the salaries reported by people using the major providers (Figure 16). Tencent stands out; the average salary for its users is $275,000. But there were so few Tencent users among the survey respondents that we don’t believe this average is meaningful. There appears to be a slight salary premium for users of Oracle ($206,000) and Google ($199,000); since these cloud providers aren’t as widely used, it’s easy to assume that organizations committed to them are willing to pay slightly more for specialized talent, a phenomenon we’ve observed elsewhere. Almost as a footnote, we see that the respondents who don’t use a cloud have significantly lower salaries ($142,000).

Figure 16. Average salary by cloud provider

Cloud providers offer many services, but their basic services fall into a few well-defined classes (Figure 17). 75% of the survey respondents reported using virtual instances (for example, AWS EC2), and 74% use bucket storage (for example, AWS S3). These are services that are offered by every cloud provider. Most respondents use an SQL database (59%). Somewhat smaller numbers reported using a NoSQL database (41%), often in conjunction with an SQL database. 49% use container orchestration services; 45% use “serverless,” which suggests that serverless is more popular than we’ve seen in our other recent surveys.

Only 11% reported using some kind of AutoML—again, a service that’s provided by all the major cloud providers, though under differing names. And again, we saw no significant differences in salary based on what services were in use. That makes perfect sense; you wouldn’t pay a carpenter more for using a hammer than for using a saw.

Figure 17. Basic cloud services usage by percentage of respondents

The Work Environment

Salaries aside, what are cloud developers working with? What programming languages and tools are they using?


Python is the most widely used language (59% of respondents), followed by SQL (49%), JavaScript (45%), and Java (32%). It’s somewhat surprising that only a third of the respondents use Java, given that programming language surveys done by TIOBE and RedMonk almost always have Java, Python, and JavaScript in a near tie for first place. Java appears not to have adapted well to the cloud (Figure 18).

Salaries also follow a pattern that we’ve seen before. Although the top four languages are in high demand, they don’t command particularly high salaries: $187,000 for Python, $179,000 for SQL, $181,000 for JavaScript, and $188,000 for Java (Figure 19). These are all “table stakes” languages: they’re necessary and they’re what most programmers use on the job, but the programmers who use them don’t stand out. And despite the necessity, there’s a lot of talent available to fill these roles. As we saw in last year’s Data/AI Salary Survey report, expertise in Scala, Rust, or Go commands a higher salary ($211,000, $202,000, and $210,000, respectively). While the demand for these languages isn’t as high, there’s a lot less available expertise. Furthermore, fluency in any of these languages shows that a programmer has gone considerably beyond basic competence. They’ve done the work necessary to pick up additional skills.
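The size of that premium can be made concrete with a little arithmetic. The sketch below is an illustration under stated assumptions, not the survey's own analysis: it takes the average salaries quoted above and, as a simplifying assumption, uses the unweighted mean of the four table stakes averages as a baseline. Because most respondents use several languages and the groups differ in size, the result is a rough indication only. The variable names are invented for the example.

```python
# Average salaries (USD) quoted in the text for each language.
table_stakes = {
    "Python": 187_000,
    "SQL": 179_000,
    "JavaScript": 181_000,
    "Java": 188_000,
}
specialist = {"Scala": 211_000, "Rust": 202_000, "Go": 210_000}

# Assumption: an unweighted mean of the four averages serves as the baseline.
baseline = sum(table_stakes.values()) / len(table_stakes)

# Percentage premium of each specialist language over that baseline.
premium = {
    lang: (salary - baseline) / baseline * 100
    for lang, salary in specialist.items()
}

print(f"baseline: ${baseline:,.0f}")
for lang, pct in premium.items():
    print(f"{lang}: +{pct:.1f}%")
```

Under these assumptions the baseline is $183,750, and the premium works out to roughly 15% for Scala, 14% for Go, and 10% for Rust.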

Figure 18. Programming language usage by percentage of respondents

The lowest salaries were reported by respondents using PHP ($155,000). Salaries for C, C++, and C# are also surprisingly low ($170,000, $172,000, and $170,000, respectively); given the importance of C and C++ for software development in general and the importance of C# for the Microsoft world, we find it hard to understand why.

Almost all of the respondents use multiple languages. If we had to make a recommendation for someone who wanted to move into cloud development or operations, or for someone planning a cloud strategy from scratch, it would be simple: focus on SQL plus one of the other table stakes languages (Java, JavaScript, or Python). If you want to go further, pick one of the languages associated with the highest salaries. We think Scala is past its peak, but because of its strong connection to the Java ecosystem, Scala makes sense for Java programmers. For Pythonistas, we’d recommend choosing Go or Rust.

Figure 19. Average salary by programming language

Operating Systems

We asked our survey participants which operating systems they used so we could test something we’ve heard from several people who hire software developers: Linux is a must. That appears to be the case: 80% of respondents use Linux (Figure 20). Even though Linux really hasn’t succeeded in the desktop market (sorry), it’s clearly the operating system for most software that runs in the cloud. If Linux isn’t a requirement, it’s awfully close.

67% of the respondents reported using macOS, but we suspect that’s mostly as a desktop or laptop operating system. Of the major providers, only AWS offers macOS virtual instances, and they’re not widely used. (Apple’s license only allows macOS to run on Apple hardware, and only AWS provides Apple servers.) 57% of the respondents reported using some version of Windows. While we suspect that Windows is also used primarily as a desktop or laptop operating system, Windows virtual instances are available from all the major providers, including Oracle and IBM.

Figure 20. Operating system usage by percentage of respondents

Tools

We saw little variation in salary from tool to tool. This lack of variation makes sense. As we said above, we don’t expect a carpenter who uses a hammer to be paid more than a carpenter who uses a saw. To be a competent carpenter, you need to use both, along with levels, squares, and a host of other tools.

However, it is interesting to know what tools are commonly in use (Figure 21). There aren’t any real surprises. Docker is almost universal, used by 76% of the respondents. Kubernetes is also very widely used, by 61% of the respondents. Other components of the Kubernetes ecosystem didn’t fare as well: 27% of respondents reported using Helm, and 12% reported using Istio, which has been widely criticized for being too complex.

Alternatives to this core cluster of tools don’t appear to have much traction. 10% of the respondents reported using OpenShift, the IBM/Red Hat package that includes Kubernetes and other core components. Our respondents seem to prefer building their tooling environment themselves. Podman, an alternative to Docker and a component of OpenShift, is only used by 8% of the respondents. Unfortunately, we didn’t ask about Linkerd, which appears to be establishing itself as a service mesh that’s simpler to configure than Istio. However, it didn’t show up among the write-in responses, and the number of respondents who said “other” was relatively small (9%).

The HashiCorp tool set (Terraform, Consul, and Vault) appears to be more widely used: 41% of the respondents reported using Terraform, 17% use Vault, and 8% use Consul. However, don’t view these as alternatives to Kubernetes. Terraform is a tool for building and configuring cloud infrastructure, and Vault is a secure repository for secrets. Only Consul competes directly.

Figure 21. Tool usage by percentage of respondents

The Biggest Impact

Finally, we asked the respondents what would have the biggest impact on compensation and promotion. The least common answer was “data tools” (6%). This segment of our audience clearly isn’t working directly with data science or AI—though we’d argue that might change as more machine learning applications reach production. “Programming languages” was second from the bottom. The lack of concern about programming languages reflects reality. While we observed higher salaries for respondents who used Scala, Rust, or Go, if you’re solidly grounded in the basics (like Python and SQL), you’re in good shape. There’s limited value in pursuing additional languages once you have the table stakes.

The largest number of respondents said that knowledge of “cloud and containers” would have the largest effect on compensation. Again, containers are table stakes, as we saw in the previous section. Automation, security, and machine learning were also highly rated (18%, 15%, and 16%, respectively). It’s not clear why machine learning was ranked highly but data tools wasn’t. Perhaps our respondents interpreted “data tools” as software like Excel, R, and pandas.

11% of the respondents wrote in an answer. As usual with write-ins, the submissions were scattered, and mostly singletons. However, many of the write-in answers pointed toward leadership and management skills. Taken all together, these varied responses add up to about 2% of the total respondents. Not a large number, but still a signal that some part of our audience is thinking seriously about IT leadership.

Confidence in the Future

“Cloud adoption is up and to the right”? No, we already told you we weren’t going to conclude that. Though it’s no doubt true; we don’t see cloud adoption slowing in the near future.

Salaries are high. That’s good for employees and difficult for employers. It’s common for staff to jump to another employer offering a higher salary and a generous signing bonus. The current stock market correction may put a damper on that trend. There are signs that Silicon Valley’s money supply is starting to dry up, in part because of higher interest rates but also because investors are nervous about how the online economy will respond to regulation, and impatient with startups whose business plan is to lose billions “buying” a market before they figure out how to make money. Higher interest rates and nervous investors could mean an end to skyrocketing salaries.

The gap between women’s and men’s salaries has narrowed, but it hasn’t closed. While we don’t have a direct comparison for the previous year, last year’s Data/AI Salary Survey report showed a 16% gap. In this survey, the gap has been cut to 7%, and women are receiving salary increases that are likely to close that gap even further. It’s anyone’s guess how this will play out in the future. Talent is in short supply, and that puts upward pressure on salaries. Next year, will we see women’s salaries on par with men’s? Or will the gap widen again when the talent shortage isn’t so acute?

While we aren’t surprised by the trend toward remote work, we are surprised at how widespread remote work has become: as we saw, only 10% of our survey respondents never work remotely, and almost two-thirds work remotely full time. Remote work may be easier for cloud professionals, because part of their job is inherently remote. However, after seeing these results, we’d predict similar numbers for other industry sectors. Remote work is here to stay.

Almost half of our survey respondents participated in some form of training in the past year. Training on the major cloud platforms (AWS, Azure, and Google Cloud) was associated with higher salaries. However, our participants also wrote in 55 “other” kinds of training and certifications, of which the most popular was CKA (Certified Kubernetes Administrator).

Let’s end by thinking a bit more about the most common answer to the question “What area do you feel will have the biggest impact on compensation and promotion in the next year?”: cloud and containers. Our first reaction is that this is a poorly phrased option; we should have just asked about containers. Perhaps that’s true, but there’s something deeper hidden in this answer. If you want to get ahead in cloud computing, learn more about the cloud. It’s tautological, but it also shows some real confidence in where the industry is heading. Cloud professionals may be looking for their next employer, but they aren’t looking to jump ship to the “next big thing.” Businesses aren’t jumping away from the cloud to “the next big thing” either; whether it’s AI, the “metaverse,” or something else, their next big thing will be built in the cloud. And containers are the building blocks of the cloud; they’re the foundation on which the future of cloud computing rests. Salaries are certainly “up and to the right,” and we don’t see demand for cloud-capable talent dropping any time in the near future.

Categories: Technology

“Sentience” is the Wrong Question

O'Reilly Radar - Tue, 2022/06/21 - 06:30

On June 6, Blake Lemoine, a Google engineer, was suspended by Google for disclosing a series of conversations he had with LaMDA, Google’s impressive large model, in violation of his NDA. Lemoine’s claim that LaMDA has achieved “sentience” was widely publicized–and criticized–by almost every AI expert. And it’s only two weeks after Nando de Freitas, tweeting about DeepMind’s new Gato model, claimed that artificial general intelligence is only a matter of scale. I’m with the experts; I think Lemoine was taken in by his own willingness to believe, and I believe de Freitas is wrong about general intelligence. But I also think that “sentience” and “general intelligence” aren’t the questions we ought to be discussing.

The latest generation of models is good enough to convince some people that they are intelligent, and whether or not those people are deluding themselves is beside the point. What we should be talking about is what responsibility the researchers building those models have to the general public. I recognize Google’s right to require employees to sign an NDA; but when a technology has implications as potentially far-reaching as general intelligence, are they right to keep it under wraps?  Or, looking at the question from the other direction, will developing that technology in public breed misconceptions and panic where none is warranted?

Google is one of the three major actors driving AI forward, in addition to OpenAI and Facebook. These three have demonstrated different attitudes towards openness. Google communicates largely through academic papers and press releases; we see gaudy announcements of its accomplishments, but the number of people who can actually experiment with its models is extremely small. OpenAI is much the same, though it has also made it possible to test-drive models like GPT-2 and GPT-3, in addition to building new products on top of its APIs–GitHub Copilot is just one example. Facebook has open sourced its largest model, OPT-175B, along with several smaller pre-built models and a voluminous set of notes describing how OPT-175B was trained.

I want to look at these different versions of “openness” through the lens of the scientific method. (And I’m aware that this research really is a matter of engineering, not science.)  Very generally speaking, we ask three things of any new scientific advance:

  • It can reproduce past results. It’s not clear what this criterion means in this context; we don’t want an AI to reproduce the poems of Keats, for example. We would want a newer model to perform at least as well as an older model.
  • It can predict future phenomena. I interpret this as being able to produce new texts that are (as a minimum) convincing and readable. It’s clear that many AI models can accomplish this.
  • It is reproducible. Someone else can do the same experiment and get the same result. Cold fusion fails this test badly. What about large language models?

Because of their scale, large language models have a significant problem with reproducibility. You can download the source code for Facebook’s OPT-175B, but you won’t be able to train it yourself on any hardware you have access to. It’s too large even for universities and other research institutions. You still have to take Facebook’s word that it does what it says it does. 

This isn’t just a problem for AI. One of our authors from the 90s went from grad school to a professorship at Harvard, where he researched large-scale distributed computing. A few years after getting tenure, he left Harvard to join Google Research. Shortly after arriving at Google, he blogged that he was “working on problems that are orders of magnitude larger and more interesting than I can work on at any university.” That raises an important question: what can academic research mean when it can’t scale to the size of industrial processes? Who will have the ability to replicate research results on that scale? This isn’t just a problem for computer science; many recent experiments in high-energy physics require energies that can only be reached at the Large Hadron Collider (LHC). Do we trust results if there’s only one laboratory in the world where they can be reproduced?

That’s exactly the problem we have with large language models. OPT-175B can’t be reproduced at Harvard or MIT. It probably can’t even be reproduced by Google and OpenAI, even though they have sufficient computing resources. I would bet that OPT-175B is too closely tied to Facebook’s infrastructure (including custom hardware) to be reproduced on Google’s infrastructure. I would bet the same is true of LaMDA, GPT-3, and other very large models, if you take them out of the environment in which they were built.  If Google released the source code to LaMDA, Facebook would have trouble running it on its infrastructure. The same is true for GPT-3. 

So: what can “reproducibility” mean in a world where the infrastructure needed to reproduce important experiments can’t be reproduced?  The answer is to provide free access to outside researchers and early adopters, so they can ask their own questions and see the wide range of results. Because these models can only run on the infrastructure where they’re built, this access will have to be via public APIs.

There are lots of impressive examples of text produced by large language models. LaMDA’s are the best I’ve seen. But we also know that, for the most part, these examples are heavily cherry-picked. And there are many examples of failures, which are certainly also cherry-picked.  I’d argue that, if we want to build safe, usable systems, paying attention to the failures (cherry-picked or not) is more important than applauding the successes. Whether it’s sentient or not, we care more about a self-driving car crashing than about it navigating the streets of San Francisco safely at rush hour. That’s not just our (sentient) propensity for drama;  if you’re involved in the accident, one crash can ruin your day. If a natural language model has been trained not to produce racist output (and that’s still very much a research topic), its failures are more important than its successes. 

With that in mind, OpenAI has done well by allowing others to use GPT-3–initially, through a limited free trial program, and now, as a commercial product that customers access through APIs. While we may be legitimately concerned by GPT-3’s ability to generate pitches for conspiracy theories (or just plain marketing), at least we know those risks. For all the useful output that GPT-3 creates (whether deceptive or not), we’ve also seen its errors. Nobody’s claiming that GPT-3 is sentient; we understand that its output is a function of its input, and that if you steer it in a certain direction, that’s the direction it takes. When GitHub Copilot (built from OpenAI Codex, which itself is built from GPT-3) was first released, I saw lots of speculation that it would cause programmers to lose their jobs. Now that we’ve seen Copilot, we understand that it’s a useful tool within its limitations, and discussions of job loss have dried up.

Google hasn’t offered that kind of visibility for LaMDA. It’s irrelevant whether they’re concerned about intellectual property, liability for misuse, or inflaming public fear of AI. Without public experimentation with LaMDA, our attitudes towards its output–whether fearful or ecstatic–are based at least as much on fantasy as on reality. Whether or not we put appropriate safeguards in place, research done in the open, and the ability to play with (and even build products from) systems like GPT-3, have made us aware of the consequences of “deep fakes.” Those are realistic fears and concerns. With LaMDA, we can’t have realistic fears and concerns. We can only have imaginary ones–which are inevitably worse. In an area where reproducibility and experimentation are limited, allowing outsiders to experiment may be the best we can do. 

Categories: Technology

Topic for June 9th

PLUG - Thu, 2022/06/09 - 17:55

This is a remote meeting. Please join by going to at 7pm on Thursday June 9th.

Brian Peters: Virtual Data Optimizer (VDO) - Data Reduction for Block Storage

Introduction to Virtual Data Optimizer (VDO), an advanced storage technology for maximizing drive space. In this presentation we'll discuss use cases for VDO, advantages & disadvantages, and demo configuring & testing a drive using Virtual Data Optimizer.

About Brian:
Brian Peters has been interested in technology since childhood. His first PC was a 486 clone that was upgraded many times over. His interest in Linux started with Ubuntu 5.10 (Breezy Badger), but he has since found a home with Debian. Brian is RHCSA certified and enjoys sharing his passion for FOSS with others.

Closer to AGI?

O'Reilly Radar - Tue, 2022/06/07 - 04:09

DeepMind’s new model, Gato, has sparked a debate on whether artificial general intelligence (AGI) is nearer–almost at hand–just a matter of scale.  Gato is a model that can solve multiple unrelated problems: it can play a large number of different games, label images, chat, operate a robot, and more.  Not so many years ago, one problem with AI was that AI systems were only good at one thing. After IBM’s Deep Blue defeated Garry Kasparov in chess,  it was easy to say “But the ability to play chess isn’t really what we mean by intelligence.” A model that plays chess can’t also play space wars. That’s obviously no longer true; we can now have models capable of doing many different things. 600 things, in fact, and future models will no doubt do more.

So, are we on the verge of artificial general intelligence, as Nando de Freitas (research director at DeepMind) claims? That the only problem left is scale? I don’t think so. It seems inappropriate to be talking about AGI when we don’t really have a good definition of “intelligence.” If we had AGI, how would we know it? We have a lot of vague notions about the Turing test, but in the final analysis, Turing wasn’t offering a definition of machine intelligence; he was probing the question of what human intelligence means.

Consciousness and intelligence seem to require some sort of agency.  An AI can’t choose what it wants to learn, neither can it say “I don’t want to play Go, I’d rather play Chess.” Now that we have computers that can do both, can they “want” to play one game or the other? One reason we know our children (and, for that matter, our pets) are intelligent and not just automatons is that they’re capable of disobeying. A child can refuse to do homework; a dog can refuse to sit. And that refusal is as important to intelligence as the ability to solve differential equations, or to play chess. Indeed, the path towards artificial intelligence is as much about teaching us what intelligence isn’t (as Turing knew) as it is about building an AGI.

Even if we accept that Gato is a huge step on the path towards AGI, and that scaling is the only problem that’s left, it is more than a bit problematic to think that scaling is a problem that’s easily solved. We don’t know how much power it took to train Gato, but GPT-3 required about 1.3 Gigawatt-hours: roughly 1/1000th the energy it takes to run the Large Hadron Collider for a year. Granted, Gato is much smaller than GPT-3, though it doesn’t work as well; Gato’s performance is generally inferior to that of single-function models. And granted, a lot can be done to optimize training (and DeepMind has done a lot of work on models that require less energy). But Gato has just over 600 capabilities, focusing on natural language processing, image classification, and game playing. These are only a few of many tasks an AGI will need to perform. How many tasks would a machine need to be able to perform to qualify as a “general intelligence”? Thousands? Millions? Can those tasks even be enumerated? At some point, the project of training an artificial general intelligence sounds like something from Douglas Adams’ novel The Hitchhiker’s Guide to the Galaxy, in which the Earth is a computer designed by an AI called Deep Thought to answer the question “What is the question to which 42 is the answer?”

Building bigger and bigger models in hope of somehow achieving general intelligence may be an interesting research project, but AI may already have achieved a level of performance that suggests specialized training on top of existing foundation models will reap far more short term benefits. A foundation model trained to recognize images can be trained further to be part of a self-driving car, or to create generative art. A foundation model like GPT-3 trained to understand and speak human language can be trained more deeply to write computer code.

Yann LeCun posted a Twitter thread about general intelligence (consolidated on Facebook) stating some “simple facts.” First, LeCun says that there is no such thing as “general intelligence.” LeCun also says that “human level AI” is a useful goal–acknowledging that human intelligence itself is something less than the type of general intelligence sought for AI. All humans are specialized to some extent. I’m human; I’m arguably intelligent; I can play Chess and Go, but not Xiangqi (often called Chinese Chess) or Golf. I could presumably learn to play other games, but I don’t have to learn them all. I can also play the piano, but not the violin. I can speak a few languages. Some humans can speak dozens, but none of them speak every language.

There’s an important point about expertise hidden in here: we expect our AGIs to be “experts” (to beat top-level Chess and Go players), but as a human, I’m only fair at chess and poor at Go. Does human intelligence require expertise? (Hint: re-read Turing’s original paper about the Imitation Game, and check the computer’s answers.) And if so, what kind of expertise? Humans are capable of broad but limited expertise in many areas, combined with deep expertise in a small number of areas. So this argument is really about terminology: could Gato be a step towards human-level intelligence (limited expertise for a large number of tasks), but not general intelligence?

LeCun agrees that we are missing some “fundamental concepts,” and we don’t yet know what those fundamental concepts are. In short, we can’t adequately define intelligence. More specifically, though, he mentions that “a few others believe that symbol-based manipulation is necessary.” That’s an allusion to the debate (sometimes on Twitter) between LeCun and Gary Marcus, who has argued many times that combining deep learning with symbolic reasoning is the only way for AI to progress. (In his response to the Gato announcement, Marcus labels this school of thought “Alt-intelligence.”) That’s an important point: impressive as models like GPT-3 and GLaM are, they make a lot of mistakes. Sometimes those are simple mistakes of fact, such as when GPT-3 wrote an article about the United Methodist Church that got a number of basic facts wrong. Sometimes, the mistakes reveal a horrifying (or hilarious, they’re often the same) lack of what we call “common sense.” Would you sell your children for refusing to do their homework? (To give GPT-3 credit, it points out that selling your children is illegal in most countries, and that there are better forms of discipline.)

It’s not clear, at least to me, that these problems can be solved by “scale.” How much more text would you need to know that humans don’t, normally, sell their children? I can imagine “selling children” showing up in sarcastic or frustrated remarks by parents, along with texts discussing slavery. I suspect there are few texts out there that actually state that selling your children is a bad idea. Likewise, how much more text would you need to know that Methodist general conferences take place every four years, not annually? The general conference in question generated some press coverage, but not a lot; it’s reasonable to assume that GPT-3 had most of the facts that were available. What additional data would a large language model need to avoid making these mistakes? Minutes from prior conferences, documents about Methodist rules and procedures, and a few other things. As modern datasets go, it’s probably not very large; a few gigabytes, at most. But then the question becomes “How many specialized datasets would we need to train a general intelligence so that it’s accurate on any conceivable topic?”  Is that answer a million?  A billion?  What are all the things we might want to know about? Even if any single dataset is relatively small, we’ll soon find ourselves building the successor to Douglas Adams’ Deep Thought.

Scale isn’t going to help. But in that problem is, I think, a solution. If I were to build an artificial therapist bot, would I want a general language model?  Or would I want a language model that had some broad knowledge, but has received some special training to give it deep expertise in psychotherapy? Similarly, if I want a system that writes news articles about religious institutions, do I want a fully general intelligence? Or would it be preferable to train a general model with data specific to religious institutions? The latter seems preferable–and it’s certainly more similar to real-world human intelligence, which is broad, but with areas of deep specialization. Building such an intelligence is a problem we’re already on the road to solving, by using large “foundation models” with additional training to customize them for special purposes. GitHub’s Copilot is one such model; O’Reilly Answers is another.

If a “general AI” is no more than “a model that can do lots of different things,” do we really need it, or is it just an academic curiosity?  What’s clear is that we need better models for specific tasks. If the way forward is to build specialized models on top of foundation models, and if this process generalizes from language models like GPT-3 and O’Reilly Answers to other models for different kinds of tasks, then we have a different set of questions to answer. First, rather than trying to build a general intelligence by making an even bigger model, we should ask whether we can build a good foundation model that’s smaller, cheaper, and more easily distributed, perhaps as open source. Google has done some excellent work at reducing power consumption, though it remains huge, and Facebook has released their OPT model with an open source license. Does a foundation model actually require anything more than the ability to parse and create sentences that are grammatically correct and stylistically reasonable?  Second, we need to know how to specialize these models effectively.  We can obviously do that now, but I suspect that training these subsidiary models can be optimized. These specialized models might also incorporate symbolic manipulation, as Marcus suggests; for two of our examples, psychotherapy and religious institutions, symbolic manipulation would probably be essential. If we’re going to build an AI-driven therapy bot, I’d rather have a bot that can do that one thing well than a bot that makes mistakes that are much subtler than telling patients to commit suicide. I’d rather have a bot that can collaborate intelligently with humans than one that needs to be watched constantly to ensure that it doesn’t make any egregious mistakes.

We need the ability to combine models that perform different tasks, and we need the ability to interrogate those models about the results. For example, I can see the value of a chess model that included (or was integrated with) a language model that would enable it to answer questions like “What is the significance of Black’s 13th move in the 4th game of Fischer vs. Spassky?” Or “You’ve suggested Qc5, but what are the alternatives, and why didn’t you choose them?” Answering those questions doesn’t require a model with 600 different abilities. It requires two abilities: chess and language. Moreover, it requires the ability to explain why the AI rejected certain alternatives in its decision-making process. As far as I know, little has been done on this latter question, though the ability to expose other alternatives could be important in applications like medical diagnosis. “What solutions did you reject, and why did you reject them?” seems like important information we should be able to get from an AI, whether or not it’s “general.”

An AI that can answer those questions seems more relevant than an AI that can simply do a lot of different things.

Optimizing the specialization process is crucial because we’ve turned a technology question into an economic question. How many specialized models, like Copilot or O’Reilly Answers, can the world support? We’re no longer talking about a massive AGI that takes terawatt-hours to train, but about specialized training for a huge number of smaller models. A psychotherapy bot might be able to pay for itself–even though it would need the ability to retrain itself on current events, for example, to deal with patients who are anxious about, say, the invasion of Ukraine. (There is ongoing research on models that can incorporate new information as needed.) It’s not clear that a specialized bot for producing news articles about religious institutions would be economically viable. That’s the third question we need to answer about the future of AI: what kinds of economic models will work? Since AI models are essentially cobbling together answers from other sources that have their own licenses and business models, how will our future agents compensate the sources from which their content is derived? How should these models deal with issues like attribution and license compliance?

Finally, projects like Gato don’t help us understand how AI systems should collaborate with humans. Rather than just building bigger models, researchers and entrepreneurs need to be exploring different kinds of interaction between humans and AI. That question is out of scope for Gato, but it is something we need to address regardless of whether the future of artificial intelligence is general or narrow but deep. Most of our current AI systems are oracles: you give them a prompt, they produce an output.  Correct or incorrect, you get what you get, take it or leave it. Oracle interactions don’t take advantage of human expertise, and risk wasting human time on “obvious” answers, where the human says “I already know that; I don’t need an AI to tell me.”

There are some exceptions to the oracle model. Copilot places its suggestion in your code editor, and changes you make can be fed back into the engine to improve future suggestions. Midjourney, a platform for AI-generated art that is currently in closed beta, also incorporates a feedback loop.

In the next few years, we will inevitably rely more and more on machine learning and artificial intelligence. If that interaction is going to be productive, we will need a lot from AI. We will need interactions between humans and machines, a better understanding of how to train specialized models, the ability to distinguish between correlations and facts–and that’s only a start. Products like Copilot and O’Reilly Answers give a glimpse of what’s possible, but they’re only the first steps. AI has made dramatic progress in the last decade, but we won’t get the products we want and need merely by scaling. We need to learn to think differently.

Categories: Technology

Radar Trends to Watch: June 2022

O'Reilly Radar - Wed, 2022/06/01 - 04:54

The explosion of large models continues. Several developments are especially noteworthy. DeepMind’s Gato model is unique in that it’s a single model that’s trained for over 600 different tasks; whether or not it’s a step towards general intelligence (the ensuing debate may be more important than the model itself), it’s an impressive achievement. Google Brain’s Imagen creates photorealistic images that are impressive, even after you’ve seen what DALL-E 2 can do. And Allen AI’s Macaw (surely an allusion to Emily Bender and Timnit Gebru’s Stochastic Parrots paper) is open source, one tenth the size of GPT-3, and claims to be more accurate. Facebook/Meta is also releasing an open source large language model, including the model’s training log, which records in detail the work required to train it.

Artificial Intelligence
  • Is thinking of autonomous vehicles as AI systems rather than as robots the next step forward? A new wave of startups is trying techniques such as reinforcement learning to train AVs to drive safely.
  • Generative Flow Networks may be the next major step in building better AI systems.
  • The ethics of building AI bots that mimic real dead people seems like an academic question, until someone does it: using GPT-3, a developer created a bot based on his deceased fiancée. OpenAI objected, stating that building such a bot was a violation of its terms of service.
  • Cortical Labs and other startups are building computers that incorporate human neurons. It’s claimed that these systems can be trained to perform game-playing tasks significantly faster than traditional AI.
  • Google Brain has built a new text-to-image generator called Imagen that creates photorealistic images. Although images generated by projects like this are always cherry-picked, the image quality is impressive; the developers claim that it is better than DALL-E 2.
  • DeepMind has created a new “generalist” model called Gato. It is a single model that can solve many different kinds of tasks: playing multiple games, labeling images, and so on. It has prompted a debate on whether Artificial General Intelligence is simply a matter of scale.
  • AI in autonomous vehicles can be used to eliminate waiting at traffic lights, increase travel speed, and reduce fuel consumption and carbon emissions. Surprisingly, if only 25% of the vehicles are autonomous, you get 50% of the benefit.
  • Macaw is a language model developed by Allen AI (AI2). It is freely available and open-source. Macaw is 1/10th the size of GPT-3 and roughly 10% more accurate at answering questions, though (like GPT-3) it tends to fail at questions that require common sense or involve logical tricks.
  • Ai-da is an AI-driven robot that can paint portraits–but is it art? Art is as much about human perception as it is about creation. What social cues prompt us to think that a robot is being creative?
  • Facebook/Meta has created a large language model called OPT that is similar in size and performance to GPT-3. Using the model is free for non-commercial work; the code is being released open source, along with documents describing how the model was trained.
  • Alice is a modular and extensible open source virtual assistant (think Alexa) that can run completely offline. It is private by default, though it can be configured to use Amazon or Google as backups. Alice can identify different users (for whom it can develop “likes” or “dislikes,” based on interactions).
Programming
  • High volume event streaming without a message queue: Palo Alto Networks has built a system for processing terabytes of security events per day without using a message queue, just a NoSQL database.
  • New tools allow workflow management across groups of spreadsheets. Spreadsheets are the original “low code”; these tools seem to offer spreadsheet users many of the features that software developers get from tools like git.
  • Portainer is a container management tool that lets you mount Docker containers as persistent filesystems.
  • NVIDIA has open-sourced its Linux device drivers. The code is available on GitHub. This is a significant change for a company that historically has avoided open source.
  • A startup named Buoyant is building tools to automate management of Linkerd. Linkerd, in turn, is a service mesh that is easier to manage and more appropriate for small to medium businesses than Istio.
  • Are we entering the “third age of JavaScript”? An intriguing article suggests that we are. In this view of the future, static site generation disappears, incremental rendering and edge routing become more important, and Next.js becomes a dominant platform.
  • Rowy is a low-code programming environment that intends to escape the limitations of Airtable and other low-code collaboration services. The interface is like a spreadsheet, but it’s built on top of the Google Cloud Firestore document database.
  • PyScript is a framework for running Python in the browser, mixed with HTML (in some ways, not unlike PHP). It is based on Pyodide (a WASM implementation of Python), integrates well with JavaScript, and might support other languages in the future.
Security
  • Machine learning raises the possibility of undetectable backdoor attacks, malicious attacks that can affect the output of a model but don’t measurably degrade its performance. Security issues for machine learning aren’t well understood, and aren’t getting a lot of attention.
  • In a new supply chain attack, two widely used libraries (Python’s ctx and PHP’s PHPass) have been compromised to steal AWS credentials. The attacker now claims that these exploits were “ethical research,” possibly with the goal of winning bounties for reporting exploits.
  • While it is not yet accurate enough to work in practice, a new method for detecting cyber attacks can detect and stop attacks in under one second.
  • The Eternity Project is a new malware-as-a-service organization that offers many different kinds of tools for data theft, ransomware, and many other exploits. It’s possible that the project is itself a scam, but it appears to be genuine.
  • Palo Alto Networks has published a study showing that most cloud identity and access management policies are too permissive, and that 90% of the permissions granted are never used. Overly-permissive policies are a major vulnerability for cloud users.
  • NIST has just published a massive guide to supply chain security. For organizations that can’t digest this 326-page document, NIST plans to publish a quick-start guide.
  • The Passkey standard, supported by Google, Apple, and Microsoft, replaces passwords with other forms of authentication. An application makes an authentication request to the device, which can then respond using any authentication method it supports. Passkey is operating system-independent, and supports Bluetooth in addition to Internet protocols.
  • Google and Mandiant both report significant year-over-year increases in the number of 0-day vulnerabilities discovered in 2021.
  • Interesting statistics about ransomware attacks: The ransom is usually only 15% of the total cost of the attack; and on average, the ransom is 2.8% of net revenue (with discounts of up to 25% for prompt payment).
  • Bugs in the most widely used ransomware software, including REvil and Conti, can be used to prevent the attacker from encrypting your data.
Web and Web3 VR/AR/Metaverse
  • Niantic is building VPS (Visual Positioning System), an augmented reality map of the world, as part of its Lightship platform. VPS allows games and other AR products to be grounded to the physical world.
  • LivingCities is building a digital twin of the real world as a platform for experiencing the world in extended reality. That experience includes history, a place’s textures and feelings, and, of course, a new kind of social media.
  • New research in haptics allows the creation of realistic virtual textures by measuring how people feel things. Humans are extremely sensitive to the textures of materials, so creating good textures is important for everything from video games to telesurgery.
  • Google is upgrading its search engine for augmented reality: they are integrating images more fully into searches, creating multi-modal searches that incorporate images, text, and audio, and generating search results that can be explored through AR.
  • BabylonJS is an open source 3D engine, based on WebGL and WebGPU, that Microsoft developed. It is a strong hint that Microsoft’s version of the Metaverse will be web-based. It will support WebXR.
  • The fediverse is an ensemble of microblogging social media sites (such as Mastodon) that communicate with each other. Will they become a viable alternative to Elon Musk’s Twitter?
  • Varjo is building a “reality cloud”: a 3D mixed reality streaming service that allows photorealistic “virtual teleportation.” It’s not about weird avatars in a fake 3D world; they record your actions in your actual environment.
Hardware Design
  • Ethical design starts with a redefinition of success: well-being, equity, and sustainability, with good metrics for measuring your progress.
Quantum Computing
  • QICK is a new standardized control plane for quantum devices. The design of the control plane, including software, is all open source. A large part of the cost of building a quantum device is building the electronics to control it. QICK will greatly reduce the cost of quantum experimentation.
  • Researchers have built logical gates using error-corrected quantum bits. This is a significant step towards building a useful quantum computer.
Categories: Technology

Building a Better Middleman

O'Reilly Radar - Tue, 2022/05/17 - 03:58

In the previous article, I explored the role of the middleman in a two-sided marketplace. The term “middleman” has a stigma to it, mostly because, when you sit between two parties that want to interact, it’s easy to get greedy.

Greed will bring you profits in the short term. Probably in the long term, as well.  As a middleman, though, your greed is an existential threat.  When you abuse your position and mistreat the parties you connect–when your cost outweighs your value–they’ll find a way to replace you. Maybe not today, maybe not tomorrow, but it will happen.

Luckily, you can make money as a middleman and still keep everyone happy.  Here’s how to create that win-win-win triangle:

Keep refining your platform

Running a marketplace is a game of continuous improvement. You need to keep asking yourself: how can I make this better for the people who interact through the marketplace?

To start, you can look for ways to make your platform more attractive to existing customers. I emphasize customers on both sides of the marketplace, not just one. Mistreating one side to favor the other may work for a time, but it will eventually fall through. Frustration has a way of helping people overcome switching costs.

Some stock exchanges designate market makers (“specialists,” if you’re old-school), firms that are always ready to both buy and sell shares of a given stock. If I want to offload a thousand shares and there’s no one who wants to buy them from me, the market maker steps in to play the role of the buyer. By guaranteeing that there will always be someone on the other side of the bid or ask, exchanges keep everyone happy.

If you constantly review how the two parties interact, you can look for opportunities to mitigate their risk, create new services, or otherwise reduce friction. Most platforms connect strangers, right?  So if you look at your business through the lens of safety, you’ll find a lot of work to do. Note how eBay’s review system provides extra assurance for buyers and sellers to trade with people they’ve never met.  Similarly, in the early days of online commerce, credit card issuers limited shoppers’ fraud risk to just $50 per purchase.  This improved consumers’ trust in online shopping, which helped make e-commerce the everyday norm that it is today.

Safety improvements also extend to communications. Do the parties really need to swap e-mail addresses or phone numbers?  If they’re just confirming a rideshare pickup or flirting through a dating app, probably not.  As a middleman, you are perfectly positioned to serve as the conduit;  one that provides an appropriate level of masking or pseudonymity.  And the money you invest in deploying a custom messaging system or temporary phone numbers (Twilio, anyone?) will pay off in terms of improved adoption and retention.

Design new products and services

If you understand how your parties interact and what they want to achieve, you’re in a position to spot new product opportunities that will make your customers happy.

According to Cyril Nigg, Director of Analytics at Reverb, the music-gear marketplace was “founded by music makers, for music makers.” Musicians like to try new gear, but they want to offload it if it doesn’t pan out. Reverb has therefore built tools around pricing assistance to help musicians with their product listings: You want to sell this distortion pedal within 7 days? List it at $X. This extra assurance that they’ll be able to resell a piece of equipment, in short order, reduces apprehensions about buying. (Going back to the point about keeping both sides of the marketplace happy: Cyril also pointed out that a Reverb customer may act as both buyer and seller across different transactions. That means the company can’t skimp on one side of the experience.)

People on a dating site want to communicate, so an easy win there is to keep an eye on new communications tools. Maybe your platform started out with an asynchronous, text-based tool that resembled e-mail.  Can you add an option for real-time chat?  What would it take to move up to voice? And ultimately, video? Each step in the progression requires advances in technology, so you may have to wait before you can actually deploy something. But if you can envision the system you want, you can keep an eye on the tech and be poised to pounce when it is generally available.

Unlike dating sites, financial exchanges are marketplaces for opposing views. One person thinks that some event will happen, they seek a counterpart who thinks that it will not, and fate determines the winner.  This can be as vanilla as people buying or selling shares of stock, where the counterparties believe the share price will rise or fall, respectively.  You also see situations that call for more exotic tools.  In the lead-up to what would become the 2008 financial crisis, investors wanted to stake claims around mortgage-backed securities but there wasn’t a way to express the belief that those prices would fall. In response to this desire, a group of banks dusted off the credit default swap (CDS) concept and devised a standard, easily-tradable contract.  Now there was a way for people to take either side of the trade, and for the banks to collect fees in the middle.  A win-win-win situation.

(Well, the actual trade was a win-win-win. The long-term outcome was more of a lose-lose-win. Mortgage defaults rose, sending prices for the associated mortgage-backed securities into decline, leading to big payouts for the “I told you this was going to happen” side of each CDS contract. The banks that served double-duty as both market participant and middleman took on sizable losses as a result. Let this be a lesson to you: part of why a middleman makes money is precisely because they have no stake in the long-term outcome of putting the parties together. Stay in the middle if you want to play it safe.)

Granted, you don’t have to roll out every possible product or feature on your first day. You have to let the marketplace grow and mature somewhat, to see what will actually be useful. Still, you want to plan ahead. As you watch the marketplace, you will spot opportunities well in advance, so you can position yourself to implement them before the need is urgent.

Focus on your business

Besides making things easier for customers, being a better middleman means improving how your business runs.

To start, identify and eliminate inefficiencies in your operations. I don’t mean that you should cut corners, as that will come back to bite you later.  I mean that you can check for genuine money leaks. The easy candidates will be right there on your balance sheet: have you actually used Service ABC in the last year?  If not, maybe it’s time to cut it. Is there an equivalent to Service XYZ at a lower price? Once you’ve confirmed that the cheaper service is indeed a suitable replacement, it’s time to make the switch.

A more subtle candidate is your codebase. Custom code is a weird form of debt. It requires steady, ongoing maintenance just like payments in a loan. It may also require disruptive changes if you encounter a bug. (Imagine that your mortgage lender occasionally demanded a surprise lump sum in mid-month.) Can you replace that home-grown system with an off-the-shelf tool or a third-party service, for a cheaper and more predictable payment schedule?

You also want to check on the size of your total addressable market (TAM).  What happens when you’ve reached everyone who will ever join? It’s emotionally reassuring to tell yourself that the entire planet will use your service, sure. But do you really want to base revenue projections on customers you can’t realistically acquire or retain? At some point, your customer numbers will plateau (and, after that, sink). You need to have a difficult conversation with yourself, your leadership team, and your investors around how you’ll handle that. And you need to have that conversation well in advance. Once you hit that limit on your TAM, you’ll need to be ready to deliver improvements that reduce churn. Perhaps you can offer new services, which may extend your addressable market into new territory, but even that has its limits.

What are you doing for risk management? A risk represents a possible future entry on your balance sheet, one of indeterminate size. Maybe it’s a code bug that spirals out of control under an edge case. Or a lingering complaint that blossoms into a full-scale PR issue. To be blunt: good risk management will save you money. Possibly lots of money. While it’s tempting to let some potential problems linger, understand that it’s easier and cheaper to address them early and on your own schedule. That’s much nicer than being under pressure to fix a surprise in real-time.

Sharp-eyed readers will catch that subtle tradeoff between “addressing inefficiencies” and “proactively mitigating risks.” Risk management often requires that you leave extra slack in the system, such as higher staff headcount, or extra machines that mostly sit idle. This slack serves as a cushion in the event of a surge in customer activity but it also costs money.  There’s no easy answer here. It’s a blend of art and science to spot the difference between slack and waste.

Most of all, as a marketplace, you want to mature with your customers and the field overall. The term “innovate” gets some much-deserved flak, but it’s not complete hogwash. Be prepared to invest in research so you can see what changes are on the horizon, and then adapt accordingly. Also, keep an eye on the new features your customers are asking for, or the complaints they raise about your service. You’ll otherwise fall into the very trap described in The Innovator’s Dilemma. Don’t become the slow-moving, inattentive behemoth that some nimble upstart will work to unseat.

Use technology as a force multiplier

Bad middlemen squeeze the parties they connect; good middlemen squeeze technology.

Done well, technology is a source of asymmetric advantage. Putting code in the right places allows you to accomplish more work, more consistently, with fewer people, and in less time. All of the efficiencies you get through code will leave more money to split between yourself and your customers.  That is a solid retention strategy.

To start, you can use software to relieve the real and artificial scarcity that constrains other middlemen. A greenfield operation can start with lower headcount, less (or zero!) office space, and so on.

Tech staffing, for example, is a matching problem at its core. A smart staffing firm would start with self-service search tools so a company could easily find people to match their open roles. No need to interact with a human recruiter. It could also standardize contract language to reduce legal overhead (no one wants a thousand slightly-different contracts lying around, anyway) and use electronic signatures to make it easier to store paperwork for future reference.

You don’t even have to do anything fancy. Sometimes, the very act of putting something online is a huge step up from the incumbent solution. Craigslist, simply by running classified ads on a website, gave people a much-improved experience over the print-newspaper version. People had more space to write (goodbye, obscure acronyms), had search functionality (why skim all the listings to find what you’re after?), and could pull their ad when it had been resolved (no more getting phone calls for an extra week just because the print ad is still visible).

Technology also makes it easier to manage resources. Love or loathe them, rideshare companies like Lyft and Uber can scale to a greater number of drivers and riders than the old-school taxi companies that rely on radio dispatch and flag-pulls. And they can do it with less friction. Why call a company and tell them your pickup location, when an app can use your phone’s GPS? And why should that dispatcher have to radio around in search of a driver? To arrange a ride, you need to match three elements–pickup location, dropoff location, and number of passengers–to an available driver. This is a trivial effort for a computer. Throw in mobile apps for drivers and passengers, and you have a system that can scale very well.
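To make the matching step concrete, here is a minimal sketch in Python. It is not how Lyft or Uber actually dispatch rides; the Driver and RideRequest types, the flat x/y coordinates, and the "closest driver with enough seats" rule are all assumptions chosen for illustration.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Driver:
    # Hypothetical driver record; positions are plain x/y coordinates.
    name: str
    x: float
    y: float
    seats: int
    available: bool = True


@dataclass
class RideRequest:
    # The three elements named in the text.
    pickup: Tuple[float, float]
    dropoff: Tuple[float, float]  # kept for routing or pricing; unused in matching here
    passengers: int


def match_driver(request: RideRequest, drivers: List[Driver]) -> Optional[Driver]:
    """Return the closest available driver with enough seats, or None."""
    candidates = [
        d for d in drivers if d.available and d.seats >= request.passengers
    ]
    if not candidates:
        return None
    px, py = request.pickup
    # Straight-line distance to the pickup point stands in for real travel time.
    return min(candidates, key=lambda d: math.hypot(d.x - px, d.y - py))


drivers = [
    Driver("ana", 0.0, 0.0, seats=4),
    Driver("ben", 1.0, 1.0, seats=2),
    Driver("cy", 0.5, 0.5, seats=4, available=False),
]
request = RideRequest(pickup=(1.0, 2.0), dropoff=(5.0, 5.0), passengers=3)
best = match_driver(request, drivers)
print(best.name if best else "no driver available")
```

A production system would use road distance, estimated arrival time, and pricing rather than straight-line distance, but the core filter-then-rank operation stays the same.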

(Some may argue that the rideshare companies get extra scale because their drivers are classified as independent contractors, and because they don’t require expensive taxi medallions. I don’t disagree. I just want to point out that the companies’ technology is also a strong enabler.)

Being at the center of the marketplace means you get to see the entire system at once. You can analyze the data around customer activity, and pass on insights to market participants to make their lives easier. Airbnb, for example, has deep insight into how different properties perform. Their research team determined that listings with high-quality photos tend to earn more revenue. They publicized this information to help hosts and, to sweeten the deal, the company then built a service to connect hosts with professional photographers.

What about ML/AI? While I hardly believe that it’s ready to eat every job, I do see opportunities for AI to make a smaller team of people more effective. ML models are well-suited for decisions that are too fuzzy or cumbersome to be expressed as hard rules in software, but not so nuanced that they require human judgment. Putting AI in the seat for those decisions frees up your team for things that genuinely merit a human’s eyes and expertise.

I’ve argued before that a lot of machine learning is high-powered matching. What is “classification,” if not rating one item’s similarity to an archetype?  A marketplace that deals in the long tail of goods can use ML to help with that matching.

Take Reverb, where most pieces of gear are unique but still similar to other items. They’re neither completely fungible, nor completely non-fungible.  They’re sort of semi-fungible. To simplify search, then, Director of Analytics Cyril Nigg says that the company groups related items into ML-based canonical products (where some specific Product X is really part of a wider Canonical Product Y). “[We use] ML to match listings to a product–say, matching on title, price point, or some other attribute. This tells us, with a high degree of confidence, that a seller’s used Fender guitar is actually an American Standard Stratocaster. Now that we know the make and model, a buyer can easily compare all the different listings within that product to help them find the best option. This ML system learns over time, so that a seller can upload a listing and the system can file it under the proper canonical product.”
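The idea of filing a listing under a canonical product can be sketched with a far simpler technique than the ML system Cyril describes. The example below uses plain token overlap (Jaccard similarity) on listing titles as a stand-in; the product names, the threshold, and the match_listing helper are hypothetical and are not Reverb's implementation.

```python
def tokens(text):
    """Lowercase a title and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of tokens two sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical catalog of canonical products.
CANONICAL_PRODUCTS = [
    "Fender American Standard Stratocaster",
    "Gibson Les Paul Standard",
    "Boss DS-1 Distortion Pedal",
]


def match_listing(title, products=CANONICAL_PRODUCTS, threshold=0.3):
    """Return (best_product, score); best_product is None below the threshold."""
    title_tokens = tokens(title)
    scored = [(jaccard(title_tokens, tokens(p)), p) for p in products]
    score, product = max(scored)
    return (product if score >= threshold else None), score


product, score = match_listing("Used Fender Stratocaster American Standard 2012")
print(product, round(score, 2))
```

A trained model would also weigh price point and other attributes, as the quote mentions, and would improve as new listings arrive. The threshold plays the role of the "high degree of confidence": listings that fall below it are left unmatched rather than filed under the wrong product.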

Machine-based matching works for food as well as guitars. Resham Sarkar heads up data science at Slice, which gives local pizzerias the tools, technology and guidance they need to thrive. In a 2021 interview, she told me how her team applies ML to answer the age-old question: will Person X enjoy Pizza Y at Restaurant Z? Slice’s recommendations give eaters the confidence to try a new flavor in a new location, which helps them (maybe they’ll develop a new favorite) and also helps pizzerias (they get new customers). This is especially useful when a pizza lover lands in a new city and doesn’t know where to get their fix.

Any discussion of technology wouldn’t be complete without a nod to emerging tech. Yes, keeping up with the Shiny New Thing of the Moment means having to wade through plenty of hype. But if you look closely, you may also find some real game-changers for your business. This was certainly true of the 1990s internet boom. We’ve seen it in the past decade of what we now call AI, across all of its rebrandings. And yes, I expect that blockchain technologies will prove more useful than the curmudgeons want to let on.  (Even NFTs. Or, especially NFTs.)

Skip past the success stories and vendor pitches, though. Do your own homework on what the new technology really is and what it can do. Then, engage an expert to help you fill in the gaps and sort out what is possible with your business. The way a new technology addresses your challenges may not align with whatever is being hyped in the news, but who cares? All that matters is that it drives improvements for your use cases.

Watch your tech

Technology is a double-edged sword. It’s like using leverage in the stock market: employing software or AI exposes you to higher highs when things go right, but also lower lows when things unravel.

One benefit to employing people to perform a task is that they can notice when something is wrong and then stop working. A piece of code, by comparison, has no idea that it is operating out of its depth. The same tools that let you do so much more, with far fewer people, also expose you to a sizable risk: one bug or environmental disconnect can trigger a series of errors, at machine speeds, cascading into a massive failure.

All it takes is for a few smaller problems to collide. Consider the case of Knight Capital. This experienced, heavyweight market-maker once managed $21BN in daily transaction volume on the NYSE. One day in 2012, an inconsistent software deployment met a branch of old code, which in turn collided with a new order type on the exchange. This led to a meltdown in which Knight Capital lost $440M in under an hour.

The lesson here is that some of the money you save from reduced headcount should be reinvested in the company in the form of people and tools to keep an eye on the larger system. You’ll want to separate responsibilities in order to provide checks and balances, such as assigning someone who is not a developer to manage and review code deployments. Install monitors that provide fine-grained information about the state of your systems. Borrowing a line from a colleague: you can almost never have too many dimensions of data when troubleshooting.
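One common way to put a check on automation is a circuit breaker: a small guard that watches a few numbers and halts the process when they cross a limit. The sketch below is an illustration only, not a description of any real trading system; the CircuitBreaker class, its thresholds, and the sample order results are all assumptions.

```python
class CircuitBreaker:
    """Halts an automated process once losses or errors pass a limit."""

    def __init__(self, max_loss, max_errors):
        self.max_loss = max_loss
        self.max_errors = max_errors
        self.loss = 0.0
        self.errors = 0
        self.tripped = False

    def record(self, pnl=0.0, error=False):
        """Record one result; return False once the breaker has tripped."""
        if pnl < 0:
            self.loss += -pnl  # only losses count toward the limit
        if error:
            self.errors += 1
        if self.loss >= self.max_loss or self.errors >= self.max_errors:
            self.tripped = True
        return not self.tripped


def run_orders(order_results, breaker):
    """Process (pnl, error) pairs until done or until the breaker trips."""
    processed = 0
    for pnl, error in order_results:
        if breaker.tripped:
            break  # stop and hand control to a person
        processed += 1
        breaker.record(pnl, error)
    return processed


breaker = CircuitBreaker(max_loss=1000.0, max_errors=3)
results = [(-400.0, False), (50.0, False), (-700.0, True), (-900.0, False)]
processed = run_orders(results, breaker)
print(processed, breaker.tripped, breaker.loss)
```

The breaker does not diagnose anything. It only stops the machine early enough for a person to look, which is the role the extra people and monitors play in the larger system.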

You’ll also need people to step in when someone gets caught in your web of automation. Have you ever called a company’s customer service line, only to wind up in a phone-tree dead-end? That can be very frustrating. You don’t want that for your customers, so you need to build escape hatches that route them to a person. That holds for your AI-driven chatbot as much as your self-help customer service workflows. And especially for any place where people can report a bug or an emergency situation.

Most of all, this level of automation requires a high-caliber team. Don’t skimp on hiring. Pay a premium for very experienced people to build and manage your technology. If you can, hire someone who has built trading systems on Wall St. That culture is wired to identify and handle risk in complex, automated systems where there is a lot of real money at stake.  And they have seen technology fail in ways that you cannot imagine.

Markets, everywhere

I’ve often said that problems in technology are rarely tech-related; they’re people-related. The same holds for building a marketplace, where the big problem is really human greed.

Don’t fall for the greed trap. You can certainly run the business in a way that brings you revenue, keeps customers happy, and attracts new prospects. Identify inefficiencies in your business operations, and keep thinking of ways to make the platform better for your customers. That’s it.  A proper application of software and AI, risk management, and research into emerging technologies should help you with both. And the money you save, you can split with your user base.

If you’re willing to blur the lines a little, you will probably find markets in not-so-obvious places. An airline sits between passengers and destinations. Grocery stores sit between shoppers and suppliers. Employers sit between employees and clients. And so on. Once you find the right angle, you can borrow ideas from the established, well-run middlemen to improve your business.

(Many thanks to Chris Butler for his thoughtful and insightful feedback on early drafts of this article.)

Categories: Technology

Quantum Computing without the Hype

O'Reilly Radar - Tue, 2022/05/10 - 04:45

Several weeks ago, I had a great conversation with Sebastian Hassinger about the state of quantum computing. It’s exciting–but also, not what a lot of people are expecting.

I’ve seen articles in the trade press telling people to invest in quantum computing now or they’ll be hopelessly behind. That’s silly. There are too many people in the world who think that a quantum computer is just a fast mainframe. It isn’t; quantum programming is completely different, and right now, the number of algorithms we know that will work on quantum computers is very small. You can count them on your fingers and toes. While it’s probably important to prepare for quantum computers that can decrypt current cryptographic codes, those computers won’t be around for 10-20 years. While there is still debate on how many physical qubits will be needed for error correction, and even on the meaning of a “logical” (error-corrected) qubit, the most common estimates are that it will require on the order of 1,000 error-corrected qubits to break current encryption systems, and that it will take 1,000 physical qubits to make one error-corrected qubit. So we’ll need on the order of 1 million qubits, and current quantum computers are all in the area of 100 qubits. Figuring out how to scale our current quantum computers by roughly 4 orders of magnitude may well be the biggest problem facing researchers, and there’s no solution in sight.
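The back-of-the-envelope estimate above can be checked in a few lines of Python. This is only a sketch: the function names are made up for illustration, and the 10-qubit baseline is an assumption added here to show how much the size of the gap depends on what you count as a "current" machine.

```python
import math

def physical_qubits_needed(logical_qubits: int, physical_per_logical: int) -> int:
    """Total physical qubits if every logical qubit needs a fixed overhead."""
    return logical_qubits * physical_per_logical

def orders_of_magnitude(target: float, current: float) -> float:
    """How many powers of ten separate the current size from the target."""
    return math.log10(target / current)

# Figures from the text: ~1,000 logical qubits, ~1,000 physical qubits each.
needed = physical_qubits_needed(1_000, 1_000)

# Gap from a machine with about 100 qubits (the figure used in the text).
gap_100 = orders_of_magnitude(needed, 100)

# Gap from a hypothetical 10-qubit machine (an assumption, for comparison).
gap_10 = orders_of_magnitude(needed, 10)

print(needed, gap_100, gap_10)
```

With these inputs, the gap from a 100-qubit machine is about four orders of magnitude; a smaller baseline pushes it toward five.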

So what can quantum computers do now that’s interesting? First, they are excellent tools for simulating quantum behavior: the behavior of subatomic particles and atoms that make up everything from semiconductors to bridges to proteins. Most, if not all, modeling in these areas is based on numerical methods, and modern digital computers are great at that. But it’s time to think again about non-numerical methods: can a quantum computer simulate directly what happens when two atoms interact? Can it figure out what kind of molecules will be formed, and what their shapes will be? This is the next step forward in quantum computing, and while it’s still research, it’s a significant way forward. We live in a quantum world. We can’t observe quantum behavior directly, but it’s what makes your laptop work and your bridges stay up. If we can model that behavior directly with quantum computers, rather than through numeric analysis, we’ll make a huge step forward towards finding new kinds of materials, new treatments for disease, and more. In a way, it’s like the difference between analog and digital computers. Any engineer knows that digital computers spend a lot of time finding approximate numeric solutions to complicated differential equations. But until digital computers got sufficiently large and fast, the behavior of those systems could be modeled directly on analog computers. Perhaps the earliest known examples of analog computers are Stonehenge and the Antikythera mechanism, both of which were used to predict astronomical positions. Thousands of years before digital computers existed, these analog computers modeled the behavior of the cosmos, solving equations that their makers couldn’t have understood, and that we now solve numerically on digital computers.

Recently, researchers have developed a standardized control plane that should be able to work with all kinds of quantum devices. The design of the control plane, including software, is all open source. This should greatly decrease the cost of experimentation, allowing researchers to focus on the quantum devices themselves, instead of designing the circuitry needed to manage the qubits. It’s not unlike the dashboard of a car: relatively early in automotive history, we developed a fairly standard set of tools for displaying data and controlling the machinery. If we hadn’t, the development of automobiles would have been set back by decades: every automaker would need to design its own controls, and you’d need fairly extensive training on your specific car before you could drive it. Programming languages for quantum devices also need to standardize; fortunately, there has already been a lot of work in that direction. Open source development kits provide libraries that can be called from Python to perform quantum operations (Qiskit, Braket, and Cirq are some examples), and OpenQASM is an open source “quantum assembly language” that lets programmers write (virtual) machine-level code that can be mapped to instructions on a physical machine.
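To give a feel for what "quantum operations called from Python" means, here is a minimal sketch in plain Python. It deliberately does not use the Qiskit, Braket, or Cirq APIs; the function names are invented for illustration, and the code only tracks real-valued amplitudes for two qubits. It applies a Hadamard gate and then a controlled-NOT, the textbook recipe for an entangled pair.

```python
import math

# Amplitudes for the two-qubit basis states |00>, |01>, |10>, |11>.
def initial_state():
    return [1.0, 0.0, 0.0, 0.0]

def hadamard_on_first(state):
    """Apply a Hadamard gate to the first (left) qubit."""
    s = 1 / math.sqrt(2)
    a00, a01, a10, a11 = state
    return [s * (a00 + a10), s * (a01 + a11),
            s * (a00 - a10), s * (a01 - a11)]

def cnot_first_controls_second(state):
    """Flip the second qubit when the first is 1 (swaps |10> and |11>)."""
    a00, a01, a10, a11 = state
    return [a00, a01, a11, a10]

def probabilities(state):
    """Probability of measuring each basis state."""
    return [abs(a) ** 2 for a in state]

bell = cnot_first_controls_second(hadamard_on_first(initial_state()))
probs = dict(zip(["00", "01", "10", "11"], probabilities(bell)))
print(probs)
```

The result puts half the probability on 00 and half on 11: measuring one qubit tells you the other. A real development kit hides this bookkeeping behind circuit objects and sends the circuit to a simulator or to hardware.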

Another approach to simulating quantum behavior won’t help probe quantum behavior, but might help researchers to develop algorithms for numerical computing. P-bits, or probabilistic bits, behave probabilistically but don’t depend on quantum physics: they’re traditional electronics that work at room temperature. P-bits have some of the behavior of qubits, but they’re much easier to build; the developers call them “poor man’s qubits.” Will p-bits make it easier to develop a quantum future?  Possibly.
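A software toy model may make the idea of a probabilistic bit more concrete. This sketch assumes a p-bit can be treated as a bit that reads 1 with a probability set by an input bias, passed through a sigmoid; that choice, and the names below, are assumptions for illustration rather than a description of the actual hardware.

```python
import math
import random

def p_bit(bias: float, rng: random.Random) -> int:
    """Return 1 with probability sigmoid(bias), otherwise 0."""
    p_one = 1.0 / (1.0 + math.exp(-bias))
    return 1 if rng.random() < p_one else 0

def fraction_of_ones(bias: float, samples: int = 10_000, seed: int = 0) -> float:
    """Sample the p-bit repeatedly and report how often it reads 1."""
    rng = random.Random(seed)
    return sum(p_bit(bias, rng) for _ in range(samples)) / samples

unbiased = fraction_of_ones(0.0)      # fluctuates evenly between 0 and 1
pushed_high = fraction_of_ones(3.0)   # a positive bias favors 1
print(unbiased, pushed_high)
```

With no bias the bit reads 1 about half the time; a positive bias pushes it toward 1 without ever pinning it there, which is the probabilistic behavior the text describes.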

It’s important not to get over-excited about quantum computing. The best way to avoid a “trough of disillusionment” is to be realistic about your expectations in the first place. Most of what computers currently do will remain unchanged. There will be some breakthroughs in areas like cryptography, search, and a few other areas where we’ve developed algorithms. Right now, “preparing for quantum computing” means evaluating your cryptographic infrastructure. Given that infrastructure changes are difficult, expensive, and slow, it makes sense to prepare for quantum-safe cryptography now. (Quantum-safe cryptography is cryptography that can’t be broken by quantum computers–it does not require quantum computers.)  Quantum computers may still be 20 years in the future, but infrastructure upgrades could easily take that long.

Practical (numeric) quantum computing at significant scale could be 10 to 20 years away, but a few breakthroughs could shorten that time drastically.  In the meantime, a lot of work still needs to be done on discovering quantum algorithms. And a lot of important work can already be done by using quantum computers as tools for investigating quantum behavior. It is an exciting time; it’s just important to be excited by the right things, and not misled by the hype.

Categories: Technology

Radar trends to watch: May 2022

O'Reilly Radar - Tue, 2022/05/03 - 04:19

April was the month for large language models. There was one announcement after another; most new models were larger than the previous ones, and several claimed to be significantly more energy efficient. The largest (as far as we know) is Google’s GLAM, with 1.2 trillion parameters, but it required significantly less energy to train than GPT-3. Chinchilla has ¼ as many parameters as GPT-3, but claims to outperform it. It’s not clear where the race to bigger and bigger models will end, or where it will lead us. The PaLM model claims to be able to reason about cause and effect (in addition to being more efficient than other large models); we don’t yet have thinking machines (and we may never), but we’re getting closer. It’s also good to see that energy efficiency has become part of the conversation.

  • Google has created GLAM, a 1.2 trillion parameter model (7 times the size of GPT-3). Training GLAM required 456 megawatt-hours, ⅓ the energy of GPT-3. GLAM uses a Mixture-of-Experts (MoE) model, in which different subsets of the neural network are used, depending on the input.
  • Google has released a dataset of 3D-scanned household items.  This will be invaluable for anyone working on AI for virtual reality.
  • FOMO (Faster Objects, More Objects) is a machine learning model for object detection in real time that requires less than 200KB of memory. It’s part of the TinyML movement: machine learning for small embedded systems.
  • LAION (Large Scale Artificial Intelligence Open Network) is a non-profit, free, and open organization that is creating large models and making them available to the public. It’s what OpenAI was supposed to be. The first model is a set of image-text pairs for training models similar to DALL-E.
  • NVidia is using AI to automate the design of their latest GPU chips.
  • Using AI to inspect sewer pipes is one example of an “unseen” AI application. It’s infrastructural, it doesn’t risk incorporating biases or significant ethical problems, and (if it works) it improves the quality of human life.
  • Large language models are generally based on text. Facebook is working on building a language model from spoken language, which is a much more difficult problem.
  • STEGO is a new algorithm for automatically labeling image data. It uses transformers to understand relationships between objects, allowing it to segment and label objects without human input.
  • A researcher has developed a model for predicting first impressions and stereotypes, based on a photograph.  They’re careful to say that this model could easily be used to fine-tune fakes for maximum impact, and that “first impressions” don’t actually say anything about a person.
  • A group building language models for the Maori people shows that AI for indigenous languages requires different ways of thinking about artificial intelligence, data, and data rights.
  • AI21 is a new company offering a large language model “as a service.” They allow customers to train custom versions of their model, and they claim to make humans and machines “thought partners.”
  • Researchers have found a method for reducing toxic text generated by language models. It sounds like a GAN (generative adversarial network), in which a model trained to produce toxic text “plays against” a model being trained to detect and reject toxicity.
  • More bad applications of AI: companies are using AI to monitor your mood during sales calls.  This questionable feature will soon be coming to Zoom.
  • Primer has developed a tool that uses AI to transcribe, translate, and analyze intercepted communications in the war between Russia and Ukraine.
  • DeepMind claims that another new large language model, Chinchilla, outperforms GPT-3 and Gopher with roughly ¼ the number of parameters. It was trained on roughly 4 times as much data, but with fewer parameters, it requires less energy to train and fine-tune.
  • Data Reliability Engineering (DRE) borrows ideas from SRE and DevOps as a framework to provide higher-quality data for machine learning applications while reducing the manual labor required. It’s closely related to data-centric AI.
  • OpenAI’s DALL-E 2 is a new take on their system (DALL-E) for generating images from natural language descriptions. It is also capable of modifying existing artworks based on natural language descriptions of the modifications. OpenAI plans to open DALL-E 2 to the public, on terms similar to GPT-3.
  • Google’s new Pathways Language Model (PaLM) can understand concepts and reason about cause and effect, in addition to being more efficient and relatively energy-efficient. It’s another step forward towards AI that actually appears to think.
  • SandboxAQ is an Alphabet startup that is using AI to build technologies needed for a post-quantum world.  They’re not doing quantum computing as such, but solving problems such as protocols for post-quantum cryptography.
  • IBM has open sourced the Generative Toolkit for Scientific Discovery (GT4SD), which is a generative model designed to produce new ideas for scientific research, both in machine learning and in areas like biology and materials science.
  • Waymo (Alphabet’s self-driving car company) now offers driverless service in San Francisco.  San Francisco is a more challenging environment than Phoenix, where Waymo has offered driverless service since 2020. Participation is limited to members of their Trusted Tester program.
  • Mastodon, a decentralized social network, appears to be benefitting from Elon Musk’s takeover of Twitter.
  • Reputation and identity management for web3 is a significant problem: how do you verify identity and reputation without giving applications more information than they should have?  A startup called Ontology claims to have solved it.
  • A virtual art museum for NFTs is still under construction, but it exists, and you can visit it. It’s probably a better experience in VR.
  • 2022 promises to be an even bigger year for cryptocrime than 2021. Attacks are increasingly focused on decentralized finance (DeFi) platforms.
  • Could a web3 version of Wikipedia evade Russia’s demands that they remove “prohibited information”?  Or will it lead to a Wikipedia that’s distorted by economic incentives (like past attempts to build a blockchain-based encyclopedia)?
  • The Helium Network is a decentralized public wide area network using LoRaWAN that pays access point operators in cryptocurrency. The network has over 700,000 hotspots, and coverage in most of the world’s major metropolitan areas.
  • Do we really need another shell scripting language?  The developers of hush think we do.  Hush is based on Lua, and claims to make shell scripting more robust and maintainable.
  • Web Assembly is making inroads; here’s a list of startups using wasm for everything from client-side media editing to building serverless platforms, smart data pipelines, and other server-side infrastructure.
  • QR codes are awful. Are they less awful when they’re animated? It doesn’t sound like it should work, but playing games with the error correction built into the standard allows the construction of animated QR codes.
  • Build your own quantum computer (in simulation)?  The Qubit Game lets players “build” a quantum computer, starting with a single qubit.
  • One of Docker’s founders is developing a new product, Dagger, that will help developers manage DevOps pipelines.
  • Can applications use “ambient notifications” (like a breeze, a gentle tap, or a shift in shadows) rather than intrusive beeps and gongs? Google has published Little Signals, six experiments with ambient notifications that include code, electronics, and 3D models for hardware.
  • Lambda Function URLs automate the configuration of an API endpoint for single-function microservices on AWS. They make the process of mapping a URL to a serverless function simple.
  • GitHub has added a dependency review feature that inspects the consequences of a pull request and warns of vulnerabilities that were introduced by new dependencies.
  • Google has proposed Supply Chain Levels for Software Artifacts (SLSA) as a framework for ensuring the integrity of the software supply chain. It is a set of security guidelines that can be used to generate metadata; the metadata can be audited and tracked to ensure that software components have not been tampered with and have traceable provenance.
  • Harvard and the Linux Foundation have produced Census II, which lists thousands of the most popular open source libraries and attempts to rank their usage.
  • The REvil ransomware has returned (maybe). Although there’s a lot of speculation, it isn’t yet clear what this means or who is behind it. Nevertheless, they appear to be looking for business partners.
  • Attackers used stolen OAuth tokens to compromise GitHub and download data from a number of organizations, most notably npm.
  • The NSA, Department of Energy, and other federal agencies have discovered a new malware toolkit named “pipedream” that is designed to disable power infrastructure. It’s adaptable to other critical infrastructure systems. It doesn’t appear to have been used yet.
  • A Russian state-sponsored group known as Sandworm failed in an attempt to bring down Ukraine’s power grid. They used new versions of Industroyer (for attacking industrial control systems) and Caddywiper (for cleaning up after the attack).
  • Re-use of IP addresses by a cloud provider can lead to “cloud squatting,” where an organization that is assigned a previously used IP address receives data intended for the previous addressee. Address assignment has become highly dynamic; DNS wasn’t designed for that.
  • Pete Warden wants to build a coalition of researchers that will discuss ways of verifying the privacy of devices that have cameras and microphones (not limited to phones).
  • Cyber warfare on the home front: The FBI remotely accessed devices at some US companies to remove Russian botnet malware. The malware targets WatchGuard firewalls and Asus routers. The Cyclops Blink botnet was developed by the Russia-sponsored Sandworm group.
  • Ransomware attacks have been seen that target Jupyter Notebooks on notebook servers where authentication has been disabled. There doesn’t appear to be a significant vulnerability in Jupyter itself; just don’t disable authentication!
  • By using a version of differential privacy on video feeds, surveillance cameras can provide a limited kind of privacy. Users can ask questions about the image, but can’t identify individuals. (Whether anyone wants a surveillance camera with privacy features is another question.)
Biology and Neuroscience
  • A brain-computer interface has allowed an ALS patient who was completely “locked in” to communicate with the outside world.  Communication is slow, but it goes well beyond simple yes/no requests.
  • CAT scans aren’t just for radiology. Lumafield has produced a table-sized CT-scan machine that can be used in small shops and offices, with the image analysis done in their cloud.
  • Boston Dynamics has a second robot on the market: Stretch, a box-handling robot designed to perform tasks like unloading trucks and shipping containers.
  • A startup claims it has the ability to put thousands of single-molecule biosensors on a silicon chip that can be mass-produced. They intend to have a commercial product by the end of 2022.
Categories: Technology

Building a Better Middleman

O'Reilly Radar - Tue, 2022/04/19 - 05:22

What comes to mind when you hear the term “two-sided market”? Maybe you imagine a Party A who needs something, so they interact with Party B who provides it, and that’s that. Despite the number “two” in the name, there’s actually someone else involved: the middleman. This entity sits between the parties to make it easier for them to interact. (We can generalize that “two” to some arbitrary number and call this an N-sided market or multi-sided marketplace. But we’ll focus on the two-sided form for now.)

Two-sided markets are a fascinating study. They are also quite common in the business world, and therefore, so are middlemen. Record labels, rideshare companies, even dating apps all fall under this umbrella.  The role has plenty of perks, as well as some sizable pitfalls.  “Middleman” often carries a negative connotation because, in all fairness, some of them provide little value compared to what they ask in return.

Still, there’s room for everyone involved—Party A, Party B, and the middleman—to engage in a happy and healthy relationship.  In this first article, I’ll explain more about the middleman’s role and the challenges they face.  In the next article, I’ll explore what it takes to make a better middleman and how technology can play a role.

Paving the Path

When I say that middlemen make interactions easier, I mean that they address a variety of barriers:

  • Discovery: “Where do I find the other side of my need or transaction?” Dating apps like OKCupid, classified ads services such as Craigslist, and directory sites like Angi (formerly Angie’s List) are all a twist on a search engine. Party A posts a description of themself or their service, Party B scrolls and sifts the list while evaluating potential matches for fit.
  • Matching: “Should we interact? Are our needs compatible?” Many middlemen that help with discovery also handle the matching for you, as with ride-share apps.  Instead of you having to scroll through lists of drivers, Uber and Lyft use your phone’s GPS to pair you with someone nearby.  (Compared to the Discovery case, Matching works best when one or both counterparties are easily interchangeable.)
  • Standardization: “The middleman sets the rules of engagement, so we all know what to expect.”  A common example would be when a middleman like eBay sets the accepted methods of payment.  By narrowing the scope of what’s possible—by limiting options—the middleman standardizes how the parties interact.
  • Safety: “I don’t have to know you in order to exchange money with you.” Stock market exchanges and credit card companies build trust with Party A and Party B, individually, so the two parties (indirectly) trust each other through the transitive property.
  • Simplicity: “You two already know each other; I’ll insert myself into the middle, to make the relationship smoother.” Stripe and Squarespace make it easier for companies to sell goods and services by handling payments.  And then there’s Squire, which co-founder Songe Laron describes as the “operating system for the barber shop, [handling] everything from the booking, to the payment, to the point of sales system, to payroll,” and a host of other frictions between barber and customer.  In all cases, each party gets to focus on what it does best (selling goods or cutting hair) while the middleman handles the drudgework.
Nice Work, If You Can Get It

As far as their business model, middlemen usually take a cut of transactions as value moves from Party A to Party B. And this arrangement has its benefits.

For one, you’re first in line to get paid: Party A pays you, you take a cut, then you pass the rest on to Party B.  Record labels and book publishers are a common example.  They pair a creator with an audience.  All of the business deals for that creator’s work run through the middleman, who collects the revenue from sales and takes their share along the way.
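The flow of money described above (Party A pays, the middleman takes a cut, Party B gets the rest) can be sketched as a single function. The 30% take rate and the function name are illustrative assumptions, not figures for any particular label or publisher.

```python
from decimal import Decimal, ROUND_HALF_UP

def split_payment(amount: Decimal, take_rate: Decimal):
    """Return (middleman's cut, payout to Party B), rounded to cents."""
    cut = (amount * take_rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
    return cut, amount - cut

# Party A pays 100.00; the middleman keeps a hypothetical 30%.
cut, payout = split_payment(Decimal("100.00"), Decimal("0.30"))
print(cut, payout)
```

Because the middleman computes the split, it is paid before Party B ever sees the money, which is the "first in line" advantage.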

(The music biz is littered with stories of artists getting a raw deal—making a small percentage of revenue from their albums, while the label takes the lion’s share—but that’s another story.)

Then there’s the opportunity for recurring revenue, if Party A and Party B have an ongoing relationship.  Companies often turn to tech staffing agencies to find staff-augmentation contractors.  Those agencies typically take a cut for the entire duration of the project or engagement, which can run anywhere from a few weeks to more than a decade.  The staffing agency makes one hell of a return on their efforts when placing such a long-term contractor. Nice work, if you can get it.

Staffing agencies may have to refund a customer’s money if a contractor performs poorly.  Some middlemen, however, make money no matter how the deal ultimately turns out.  Did I foolishly believe my friend’s hot stock tip, in his drunken reverie, and pour my savings into a bad investment? Well, NYSE isn’t going to refund my money, which means they aren’t about to lose their cut.

A middleman also gets a bird’s-eye view of the relationships it enables.  It sees who interacts with whom, and how that all happens.  Middlemen that run online platforms have the opportunity to double-dip on their revenue model: first by taking their cut from an interaction, then by collecting and analyzing data around each interaction.  Everything from an end-user’s contact or demographic details, to exploring patterns of how they communicate with other users, can be packaged up and resold.  (This is, admittedly, a little shady. We’ll get to middlemen’s abuse of privilege shortly.)

Saddling Some Burdens, Too

Before you rush out to build your own middleman company, recognize that it isn’t all easy revenue.  You first need to breathe the platform into existence, so the parties can interact.  Depending on the field, this can involve a significant outlay of capital, time, and effort.  Then you need to market the platform so that everyone knows where to go to find the Party B to their Party A.

Once it’s up and running, maintenance costs can be low if you keep things simple.  (Consider the rideshare companies that own the technology platform, but not the vehicles in which passengers ride.) But until you reach that cruising altitude, you’re crossing your fingers that things pan out in your favor.  That can mean a lot of sleepless nights and stressful investor calls.

The middleman’s other big challenge is that they need to keep all of those N sides of the N-sided market happy.  The market only exists because all of the parties want to come together, and your service persists only because they want to come together through you.  If one side gets mad and leaves, the other side(s) will soon follow.  Keeping the peace can be a touchy balancing act.

Consider Airbnb. Early in the pandemic it earned praise from guests by allowing them to cancel certain bookings without penalty. It then passed those “savings” on to hosts, who weren’t too happy about the lost revenue. (Airbnb later created a fund to support hosts, but some say it still fell short.) The action sent a clear (though likely unintentional and incorrect) message that Airbnb valued guests more than hosts. A modern-day version of robbing Peter to pay Paul.

Keeping all sides happy is a tough line for a middleman to walk. Mohanbir Sawhney, from Northwestern University’s McCormick Foundation, summed this up well: “In any two-sided market, you always have to figure out who you’re going to subsidize more, and who you’re going to actually screw more.” It’s easy for outsiders to say that Airbnb should have just eaten the losses (refunded guests’ money while letting hosts keep their take), but that is much easier said than done. In the end, the company still has to subsidize itself, right?

The subsidize versus screw decision calculus gets even more complicated when one side only wants you but doesn’t need you.  In the Airbnb case, the company effectively serves as a marketing arm and payments processor for property owners.  Any sufficiently motivated owner is just one step away from handling that on their own, so even a small negative nudge can send them packing.  (In economics terms, we say that those owners’ switching costs are low.)

The same holds for the tech sector, where independent contractors can bypass staffing firms to hang their own shingle.  Even rideshare drivers have a choice.  While it would be tougher for them to get their own taxi medallion, they can switch from Uber to Lyft.  Or, as many do, they can sign up with both services so that switching costs are effectively zero: “delete Uber app, keep the Lyft app running, done.”

Making Enemies

Even with those challenges, delivering on the middleman’s raison d’être, “keep all parties happy,” should be a straightforward affair. (I don’t say “easy,” just “straightforward.” There’s a difference.) Parties A and B clearly want to be together, you’re helping them be together, so the experience should be a win all around.

Why, then, do middlemen have such a terrible reputation?  It mostly boils down to greed.

Once a middleman becomes a sufficiently large and/or established player, they become the de facto place for the parties to meet.  This is a near-monopoly status. The middleman no longer needs to care about keeping one or even both parties happy, they figure, because those groups either interact through the middleman or they don’t interact at all. (This also holds true for the near-cartel status of a group of equally unpleasant middlemen.)

Maybe the middleman suddenly raises fees, or sets onerous terms of service, or simply mistreats one side of the pairing.  This raises the dollar, effort, and emotional cost to the parties since they don’t have many options to leave.

Consider food-delivery apps, which consumers love but which can take as much as a 30% cut of an order’s revenue. That’s a large bite, but easier to swallow when a restaurant has a modest take-away business alongside a much larger dine-in experience. It’s quite another story when take-away is suddenly your entire business and you’re still paying rent on the empty dining room space. Most restaurants found themselves in just this position early in the COVID-19 pandemic. Some hung signs in their windows, asking customers to call them directly instead of using the delivery apps.

Involving a middleman in a relationship can also lead to weird principal-agent problems.  Tech staffing agencies (even those that paint themselves as “consultancies”) have earned a special place here.  Big companies hand such “preferred vendors” a strong moat by requiring contractors to pass through them in lieu of establishing a direct relationship. Since the middlemen can play this Work Through Us, or Don’t Work at All card, it’s no surprise that they’ve been known to take as much as 50% of the money as it passes from client to contractor.  The client companies don’t always know this, so they are happy that the staffing agency has helped them find software developers and DBAs. The contractors, many of whom are aware of the large cuts, aren’t so keen on the arrangement.

This is on top of limiting a tech contractor’s ability to work through a competing agency.  I’ve seen everything from thinly-veiled threats (“if the client sees your resume from more than one agency, they’ll just throw it out”) to written agreements (“this contract says you won’t go through another agency to work with this client”).   What if you’ve found a different agency that will take a smaller cut, so you get more money?  Or what if Agency 1 has done a poor job of representing you, while you know that Agency 2 will get it right?  In both cases, the answer is: tough luck.

A middleman can also resort to more subtle ways to mistreat the parties. Uber has reportedly used a variety of techniques from behavioral science (such as gamification, and male managers pretending to be women) to encourage drivers to work more. They’ve also been accused of showing drivers and passengers different routes, charging the passenger for the longer way and paying the driver for the shorter way.

It’s Not All Easy Money

To be fair, middlemen do earn some of their cut. They provide value in that they reduce friction for both the buy and sell sides of an interaction.

This goes above and beyond building the technology for a platform.  Part of how the Deliveroos and Doordashes of the world connect diners to restaurants is by coordinating fleets of delivery drivers.  It would be expensive for a restaurant to do this on their own: hiring multiple drivers, managing the schedule, accounting for demand … and hoping business stays hot so that the drivers aren’t paid to sit idle. Similarly, tech staffing firms don’t just introduce you to contract talent. They also handle time-tracking, invoicing, and legal agreements. The client company cuts one large check to the staffing firm, which cuts lots of smaller checks to the individual contractors.

Don’t forget that handling contracts and processing payments come with extra regulatory requirements. Rules often vary by locale, and the middleman has to spend money to keep track of those rules.  So it’s not all profit.

(They can also build tools to avoid rules, such as Uber’s infamous “greyball” system … but that’s another story.)

That said, a middleman’s benefit varies by the industry vertical and even by the client. Some argue that their revenue cut far exceeds the value they provide. In the case of tech staffing firms, I’ve heard plenty of complaints that recruiters take far too much money for just “having a phone number” (having a client relationship) and cutting a check, when it’s the contractor who does the actual work of building software or managing systems for the client.

A Win-Win-Win Triangle

Running a middleman has its challenges and risks.  It can also be tempting to misuse the role’s power.  Still, I say that there’s a way to build an N-sided marketplace where everyone can be happy.  I’ll explore that in the next article in this series.

(Many thanks to Chris Butler for his thoughtful and insightful feedback on early drafts of this article.  I’d also like to thank Mike Loukides for shepherding this piece into its final form.)

Categories: Technology

Virtual Presentation for April 14th

PLUG - Thu, 2022/04/14 - 13:33

This is a remote meeting. Please join by going to at 7pm on Thursday April 14th

der.hans: Jekyll static site generator

Jekyll is a simple-to-use static site generator.

For an easy, low-maintenance, low-resource site like a blog, a static site generator (SSG) avoids complexity.
An SSG allows low-effort content creation and easy deployment.
A site built with an SSG can be hosted by a simple web server.
There is no need to set up programming languages or lots of modules.

About der.hans:
der.hans is a Free Software, technology and entrepreneurial veteran.

Hans is chairman of the Phoenix Linux User Group (PLUG); BoF organizer, jobs board maintainer, and jobs night organizer for the Southern California Linux Expo (SCaLE); and founder of the Free Software Stammtisch along with the Stammtisch Job Nights.

He currently leads a database support engineering team at ObjectRocket; most likely, anything Hans says publicly was not approved by $dayjob.

The General Purpose Pendulum

O'Reilly Radar - Tue, 2022/04/12 - 04:59

Pendulums do what they do: they swing one way, then they swing back the other way.  Some oscillate quickly; some slowly; and some so slowly you can watch the earth rotate underneath them. It’s a cliche to talk about any technical trend as a “pendulum,” though it’s accurate often enough.

We may be watching one of computing’s longest-term trends turn around, becoming the technological equivalent of Foucault’s very long, slow pendulum: the trend towards generalization. That trend has been swinging in the same direction for some 70 years–since the invention of computers, really.  The first computers were just calculating engines designed for specific purposes: breaking codes (in the case of Britain’s Bombe) or calculating missile trajectories. But those primitive computers soon got the ability to store programs, making them much more flexible; eventually, they became “general purpose” (i.e., business) computers. If you’ve ever seen a manual for the IBM 360’s machine language, you’ll see many instructions that only make sense in a business context–for example, instructions for arithmetic in binary coded decimal.

That was just the beginning. In the 70s, word processors started replacing typewriters. Word processors were essentially early personal computers designed for typing–and they were quickly replaced by personal computers themselves. With the invention of email, computers became communications devices. With file sharing software like Napster and MP3 players like WinAmp, computers started replacing radios–then, when Netflix started streaming, televisions. CD and DVD players are inflexible, task-specific computers, much like word processors or the Bombe, and their functions have been subsumed by general-purpose machines.

The trend towards generalization also took place within software. Sometime around the turn of the millennium, many of us realized that web browsers (yes, even the early Mosaic, Netscape, and Internet Explorer) could be used as a general user interface for software; all a program had to do was express its user interface in HTML (using forms for user input), and provide a web server so the browser could display the page. It’s not an accident that Java was perhaps the last programming language to have a graphical user interface (GUI) library; other languages that appeared at roughly the same time (Python and Ruby, for example) never needed one.

If we look at hardware, machines have gotten faster and faster–and more flexible in the process. I’ve already mentioned the appearance of instructions specifically for “business” in the IBM 360. GPUs are specialized hardware for high-speed computation and graphics; however, they’re much less specialized than their ancestors, dedicated vector processors.  Smartphones and tablets are essentially personal computers in a different form factor, and they have performance specs that beat supercomputers from the 1990s. And they’re also cameras, radios, televisions, game consoles, and even credit cards.

So, why do I think this pendulum might start swinging the other way?  A recent article in the Financial Times, Big Tech Raises its Bets on Chips, notes that Google and Amazon have both developed custom chips for use in their clouds. It hypothesizes that the next generation of hardware will be one in which chip development is integrated more closely into a wider strategy.  More specifically, “the best hope of producing new leaps forward in speed and performance lies in the co-design of hardware, software and neural networks.” Co-design sounds like designing hardware that is highly optimized for running neural networks, designing neural networks that are a good match for that specific hardware, and designing programming languages and tools for that specific combination of hardware and neural network. Rather than taking place sequentially (hardware first, then programming tools, then application software), all of these activities take place concurrently, informing each other. That sounds like a turn away from general-purpose hardware, at least superficially: the resulting chips will be good at doing one thing extremely well. It’s also worth noting that, while there is a lot of interest in quantum computing, quantum computers will inevitably be specialized processors attached to conventional computers. There is no reason to believe that a quantum computer can (or should) run general purpose software such as software that renders video streams, or software that calculates spreadsheets. Quantum computers will be a big part of our future–but not in a general-purpose way. Both co-design and quantum computing step away from general-purpose computing hardware. We’ve come to the end of Moore’s Law, and can’t expect further speedups from hardware itself.  We can expect improved performance by optimizing our hardware for a specific task.

Co-design of hardware, software, and neural networks will inevitably bring a new generation of tools to software development. What will those tools be? Our current development environments don’t require programmers to know much (if anything) about the hardware. Assembly language programming is a specialty that’s really only important for embedded systems (and not all of them) and a few applications that require the utmost in performance. In the world of co-design, will programmers need to know more about hardware? Or will a new generation of tools abstract the hardware away, even as they weave the hardware and the software together even more intimately? I can certainly imagine tools with modules for different kinds of neural network architectures; they might know about the kind of data the processor is expected to deal with; they might even allow a kind of “pre-training”–something that could ultimately give you GPT-3 on a chip. (Well, maybe not on a chip. Maybe a few thousand chips designed for some distributed computing architecture.) Will it be possible for a programmer to say “This is the kind of neural network I want, and this is how I want to program it,” and let the tool do the rest? If that sounds like a pipe-dream, realize that tools like GitHub Copilot are already automating programming.

Chip design is the poster child for “the first unit costs 10 billion dollars; the rest are all a penny apiece.”  That has limited chip design to well-financed companies that are either in the business of selling chips (like Intel and AMD) or that have specialized needs and can buy in very large quantities themselves (like Amazon and Google). Is that where it will stop–increasing the imbalance of power between a few wealthy companies and everyone else–or will co-design eventually enable smaller companies (and maybe even individuals) to build custom processors? To me, co-design doesn’t make sense if it’s limited to the world’s Amazons and Googles. They can already design custom chips.  It’s expensive, but that expense is itself a moat that competitors will find hard to cross. Co-design is about improved performance, yes; but as I’ve said, it’s also inevitably about improved tools.  Will those tools result in better access to semiconductor fabrication facilities?

We’ve seen that kind of transition before. Designing and making printed circuit boards used to be hard. I tried it once in high school; it requires acids and chemicals you don’t want to deal with, and a hobbyist definitely can’t do it in volume. But now, it’s easy: you design a circuit with a free tool like Kicad or Fritzing, have the tool generate a board layout, send the layout to a vendor through a web interface, and a few days later, a package arrives with your circuit boards. If you want, you can have the vendor source the board’s components and solder them in place for you. It costs a few tens of dollars, not thousands. Can the same thing happen at the chip level? It hasn’t yet. We’ve thought that field-programmable gate arrays might eventually democratize chip design, and to a limited extent, they have. FPGAs aren’t hard for small- or mid-sized businesses that can afford a few hardware engineers, but they’re far from universal, and they definitely haven’t made it to hobbyists or individuals.  Furthermore, FPGAs are still standardized (generalized) components; they don’t democratize the semiconductor fabrication plant.

What would “cloud computing” look like in a co-designed world? Let’s say that a mid-sized company designs a chip that implements a specialized language model, perhaps something like O’Reilly Answers. Would they have to run this chip on their own hardware, in their own datacenter?  Or would they be able to ship these chips to Amazon or Google for installation in their AWS and GCP data centers?  That would require a lot of work standardizing the interface to the chip, but it’s not inconceivable.  As part of this evolution, the co-design software will probably end up running in someone’s cloud (much as AWS Sagemaker does today), and it will “know” how to build devices that run on the cloud provider’s infrastructure. The future of cloud computing might be running custom hardware.

We inevitably have to ask what this will mean for users: for those who will use the online services and physical devices that these technologies enable. We may be seeing that pendulum swing back towards specialized devices. A product like Sonos speakers is essentially a re-specialization of the device that was formerly a stereo system, then became a computer. And while I (once) lamented the idea that we’d eventually all wear jackets with innumerable pockets filled with different gadgets (iPods, i-Android-phones, Fitbits, Yubikeys, a collection of dongles and earpods, you name it), some of those products make sense:  I lament the loss of the iPod, as distinct from the general purpose phone. A tiny device that could carry a large library of music, and do nothing else, was (and would still be) a wonder.

But those re-specialized devices will also change. A Sonos speaker is more specialized than a laptop plugged into an amp via the headphone jack and playing an MP3; but don’t mistake it for a 1980s stereo, either. If inexpensive, high-performance AI becomes commonplace, we can expect a new generation of exceedingly smart devices. That means voice control that really works (maybe even for those who speak with an accent), locks that can identify people accurately regardless of skin color, and appliances that can diagnose themselves and call a repairman when they need to be fixed. (I’ve always wanted a furnace that could notify my service contractor when it breaks at 2AM.) Putting intelligence on a local device could improve privacy–the device wouldn’t need to send as much data back to the mothership for processing. (We’re already seeing this on Android phones.) We might get autonomous vehicles that communicate with each other to optimize traffic patterns. We might go beyond voice controlled devices to non-invasive brain control. (Elon Musk’s Neuralink has the right idea, but few people will want sensors surgically embedded in their brains.)

And finally, as I write this, I realize that I’m writing on a laptop–but I don’t want a better laptop. With enough intelligence, would it be possible to build environments that are aware of what I want to do? And offer me the right tools when I want them (possibly something like Bret Victor’s Dynamicland)? After all, we don’t really want computers.  We want “bicycles for the mind”–but in the end, Steve Jobs only gave us computers.

That’s a big vision that will require embedded AI throughout. It will require lots of very specialized AI processors that have been optimized for performance and power consumption. Creating those specialized processors will require re-thinking how we design chips. Will that be co-design, designing the neural network, the processor, and the software together, as a single piece? Possibly. It will require a new way of thinking about tools for programming–but if we can build the right kind of tooling, “possibly” will become a certainty.

Categories: Technology

Radar trends to watch: April 2022

O'Reilly Radar - Tue, 2022/04/05 - 04:32

March was a busy month, especially for developers working with GPT-3. After surprising everybody with its ability to write code, it’s not surprising that GPT-3 is appearing in other phases of software development. One group has written a tool that creates regular expressions from verbal descriptions; another tool generates Kubernetes configurations from verbal descriptions. In his newsletter, Andrew Ng talks about the future of low-code AI: it’s not about eliminating coding, but eliminating the need to write all the boilerplate. The latest developments with large language models like GPT-3 suggest that the future isn’t that distant.

On the other hand, the US copyright office has determined that works created by machines are not copyrightable. If software is increasingly written by tools like Copilot, what will this say about software licensing and copyright?

Artificial Intelligence
  • An unusual form of matter known as spin glass can potentially allow the implementation of neural network algorithms in hardware. One particular kind of network allows pattern matching based on partial patterns (for example, face recognition based on a partial face), something that is difficult or impossible with current techniques.
  • OpenAI has extended GPT-3 to do research on the web when it needs information that it doesn’t already have.
  • Data-centric AI is gaining steam, in part because Andrew Ng has been pushing it consistently. Data-centric AI claims that the best way to improve the AI is to improve the data, rather than the algorithms. It includes ideas like machine-generated training data and automatic tagging. Christopher Ré, at one of the last Strata conferences, noted that data collection was the part of AI that was most resistant to improvement.
  • We’ve seen that GPT-3 can generate code from English language comments. Can it generate Kubernetes configurations from natural language descriptions?  Take a look at AI Kube Bot.
  • The US Copyright Office has determined that works created by an artificial intelligence aren’t copyrightable; copyright requires human authorship. This is almost certainly not the final word on the topic.
  • A neural network with a single neuron that is used many times may be as effective as large neural networks, while using much less energy.
  • Training AI models on synthetic data created by a generative model can be more effective than using real-world data. Although there are pitfalls, there’s more control over bias, and the data can be made to include unexpected cases.
  • For the past 70 years, computing has been dominated by general-purpose hardware: machines designed to run any code. Even vector processors and their descendants (GPUs) are fundamentally general purpose. The next steps forward in AI may involve software, hardware, and neural networks that are designed for each other.
  • Ithaca is a DeepMind project that uses deep learning to recover missing texts in ancient Greek documents and inscriptions.  It’s particularly interesting as an example of human-machine collaboration. Humans can do this work with 25% accuracy, Ithaca is 62% accurate on its own, but Ithaca and humans combined reach 72% accuracy.
  • Michigan is starting to build the infrastructure needed to support autonomous vehicles: dedicated lanes, communications, digital signage, and more.
  • Polycoder is an open source code generator (like Copilot) that uses GPT-2, which is also open sourced. Developers claim that Polycoder is better than Copilot for many tasks, including programming in C. Because it is open-source, it enables researchers to investigate how these tools work, including testing for security vulnerabilities.
  • New approaches to molecule design using self-supervised learning on unlabeled data promise to make drug discovery faster and more efficient.
  • The title says it all. Converting English to Regular Expressions with GPT-3, implemented as a Google sheet. Given Copilot, it’s not surprising that this can be done.
  • Researchers at MIT have developed a technique for injecting fairness into a model itself, even after it has been trained on biased data.
  • Low code programming for Python: Some new libraries designed for use in Jupyter Notebooks (Bamboo, Lux, and Mito) allow a graphical (forms-based) approach to working with data using Python’s Pandas library.
  • Will the Linkerd service mesh displace Istio?  Linkerd seems to be simpler and more attractive to small and medium-sized organizations.
  • The biggest problem with Stack Overflow is the number of answers that are out of date.  There’s now a paper studying the frequency of out-of-date answers.
  • Silkworm-based encryption: Generating good random numbers is a difficult problem. One surprising new source of randomness is silk.  While silk appears smooth, it is (not surprisingly) very irregular at a microscopic scale.  Because of this irregularity, passing light through silk generates random diffraction patterns, which can be converted into random numbers.
  • The Hub for Biotechnology in the Built Environment (HBBE) is a research center that is rethinking buildings. They intend to create “living buildings” (and I do not think that is a metaphor) capable of processing waste and producing energy.
  • A change to the protein used in CRISPR to edit DNA reduces errors by a factor of 4000, without making the process slower.
  • Researchers have observed the process by which brains store sequences of memories.  In addition to therapies for memory disorders, this discovery could lead to advances in artificial intelligence systems, which don’t really have the ability to create and process timelines or narratives.
  • Object detection in 3D is a critical technology for augmented reality (to say nothing of autonomous vehicles), and it’s significantly more complex than in 2D. Facebook/Meta’s 3DETR uses transformers to build models from 3D data.
  • Some ideas about what Apple’s AR glasses, Apple Glass, might be. Take what you want… Omitting a camera is a good idea, though it’s unclear how you’d make AR work. This article suggests LIDAR, but that doesn’t sound feasible.
  • According to the creator of Pokemon Go, the metaverse should be about helping people to appreciate the physical world, not about isolating them in a virtual world.
  • Jeff Carr has been publishing (and writing about) dumps of Russian data obtained by hackers from GURMO, Ukraine’s cyber operations team.
  • Sigstore is a new kind of certificate authority (trusted root) that is addressing open source software supply chain security problems.  The goal is to make software signing “ubiquitous, free, easy, and transparent.”
  • Russia has created its own certificate authority to mitigate international sanctions. However, users of Chrome, Firefox, Safari, and other browsers originating outside of Russia would have to install the Russian root certificate manually to access Russian sites without warnings.
  • Corporate contact forms are replacing email as a vector for transmitting malware. BazarBackdoor [sic] is now believed to be under development by the Conti ransomware group.
  • Dirty Pipe is a newly discovered high-severity bug in the Linux kernel that allows any user to overwrite any file or obtain root privileges. Android phones are also vulnerable.
  • Twitter has created an onion service that is accessible through the Tor network. (Facebook has a similar service.)  This service makes Twitter accessible within Russia, despite government censorship.
  • The attackers attacked: A security researcher has acquired and leaked chat server logs from the Conti ransomware group. These logs include discussions of victims, Bitcoin addresses, and discussions of the group’s support of Russia.
  • Attackers can force Amazon Echo devices to hack themselves. Get the device to speak a command, and its microphone will hear the command and execute it. This misfeature includes controlling other devices (like smart locks) via the Echo.
  • The Anonymous hacktivist collective is organizing (to use that word very loosely) attacks against Russian digital assets. Among other things, they have leaked emails between the Russian defense ministry and their suppliers, and hacked the front pages of several Russian news agencies.
  • The Data Detox Kit is a quick guide to the bot world and the spread of misinformation.  Is it a bot or not?  This site has other good articles about how to recognize misinformation.
  • Sensor networks that are deployed like dandelion seeds! An extremely light, solar-powered framework for scattering RF-connected sensors and letting breezes do the work lets researchers build networks with thousands of sensors easily. I’m concerned about cleanup afterwards, but this is a breakthrough, both in biomimicry and low-power hardware.
  • Semiconductor-based LIDAR could be the key to autonomous vehicles that are reasonably priced and safe. LIDAR systems with mechanically rotating lasers have been the basis for Google’s autonomous vehicles; they are effective, but expensive.
  • The open source instruction set architecture RISC-V is gaining momentum because it is enabling innovation at the lowest levels of hardware.
Quantum Computing
  • Microsoft claims to have made a breakthrough in creating topological qubits, which should be more stable and scalable than other approaches to quantum computing.
  • IBM’s quantum computer was used to simulate a time crystal, showing that current quantum computers can be used to investigate quantum processes, even if they can’t yet support useful computation.
  • Mozilla has published their vision for the future evolution of the web. The executive summary highlights safety, privacy, and performance. They also want to see a web on which it’s easier for individuals to publish content.
  • Twitter is expanding its crowdsourced fact-checking program (called Birdwatch). It’s not yet clear whether this has helped stop the spread of misinformation.
  • The Gender Pay Gap Bot (@PayGapApp) retweets corporate tweets about International Women’s Day with a comment about the company’s gender pay gap (derived from a database in the UK).
  • Alex Russell writes about a unified theory of web performance.  The core principle is that the web is for humans. He emphasizes the importance of latency at the tail of the performance distribution; improvements there tend to help everyone.
  • WebGPU is a new API that gives web applications the ability to do rendering and computation on GPUs.
Blockchains and NFTs Business
Categories: Technology

AI Adoption in the Enterprise 2022

O'Reilly Radar - Thu, 2022/03/31 - 04:35

In December 2021 and January 2022, we asked recipients of our Data and AI Newsletters to participate in our annual survey on AI adoption. We were particularly interested in what, if anything, has changed since last year. Are companies farther along in AI adoption? Do they have working applications in production? Are they using tools like AutoML to generate models, and other tools to streamline AI deployment? We also wanted to get a sense of where AI is headed. The hype has clearly moved on to blockchains and NFTs. AI is in the news often enough, but the steady drumbeat of new advances and techniques has gotten a lot quieter.

Compared to last year, significantly fewer people responded. That’s probably a result of timing. This year’s survey ran during the holiday season (December 8, 2021, to January 19, 2022, though we received very few responses in the new year); last year’s ran from January 27, 2021, to February 12, 2021. Pandemic or not, holiday schedules no doubt limited the number of respondents.

Our results held a bigger surprise, though. The smaller number of respondents notwithstanding, the results were surprisingly similar to 2021. Furthermore, if you go back another year, the 2021 results were themselves surprisingly similar to 2020. Has that little changed in the application of AI to enterprise problems? Perhaps. We considered the possibility that the same individuals responded in both 2021 and 2022. That wouldn’t be surprising, since both surveys were publicized through our mailing lists—and some people like responding to surveys. But that wasn’t the case. At the end of the survey, we asked respondents for their email address. Among those who provided an address, there was only a 10% overlap between the two years.
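The overlap check described above can be sketched in a few lines. This is only an illustration, not the method actually used for the survey: the addresses are made up, the function names are hypothetical, and the choice of denominator (this year's respondents who gave an address) is an assumption, since the text doesn't say how the 10% was computed.

```python
def normalize(address: str) -> str:
    """Treat addresses that differ only in case or padding as the same."""
    return address.strip().lower()


def overlap_share(this_year: list[str], last_year: list[str]) -> float:
    """Fraction of this year's addresses that also appeared last year.

    Blank entries (respondents who gave no address) are ignored.
    Assumption: the denominator is this year's set of addresses.
    """
    current = {normalize(a) for a in this_year if a.strip()}
    previous = {normalize(a) for a in last_year if a.strip()}
    if not current:
        return 0.0
    return len(current & previous) / len(current)


# Made-up addresses, for illustration only (not survey data).
emails_last_year = ["ada@example.com", "Grace@Example.com ", "linus@example.com"]
emails_this_year = ["grace@example.com", "alan@example.com", "",
                    "edsger@example.com", "barbara@example.com"]

share = overlap_share(emails_this_year, emails_last_year)
print(f"{share:.0%} of this year's addresses also appeared last year")
```

Normalizing before comparing matters here: without it, "Grace@Example.com " and "grace@example.com" would count as two different people and the overlap would be understated.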

When nothing changes, there’s room for concern: we certainly aren’t in an “up and to the right” space. But is that just an artifact of the hype cycle? After all, regardless of any technology’s long-term value or importance, it can only receive outsized media attention for a limited time. Or are there deeper issues gnawing at the foundations of AI adoption?

AI Adoption

We asked participants about the level of AI adoption in their organization. We structured the responses to that question differently from prior years, in which we offered four responses: not using AI, considering AI, evaluating AI, and having AI projects in production (which we called “mature”). This year we combined “evaluating AI” and “considering AI”; we thought that the difference between “evaluating” and “considering” was poorly defined at best, and if we didn’t know what it meant, our respondents didn’t either. We kept the question about projects in production, and we’ll use the words “in production” rather than “mature practice” to talk about this year’s results.

Despite the change in the question, the responses were surprisingly similar to last year’s. The same percentage of respondents said that their organizations had AI projects in production (26%). Significantly more said that they weren’t using AI: that went from 13% in 2021 to 31% in this year’s survey. It’s not clear what that shift means. It’s possible that it’s just a reaction to the change in the answers; perhaps respondents who were “considering” AI thought “considering really means that we’re not using it.” It’s also possible that AI is just becoming part of the toolkit, something developers use without thinking twice. Marketers use the term AI; software developers tend to say machine learning. To the customer, what’s important isn’t how the product works but what it does. There’s already a lot of AI embedded into products that we never think about.

From that standpoint, many companies with AI in production don’t have a single AI specialist or developer. Anyone using Google, Facebook, or Amazon (and, I presume, most of their competitors) for advertising is using AI. AI as a service includes AI packaged in ways that may not look at all like neural networks or deep learning. If you install a smart customer service product that uses GPT-3, you’ll never see a hyperparameter to tune—but you have deployed an AI application. We don’t expect respondents to say that they have “AI applications deployed” if their company has an advertising relationship with Google, but AI is there, and it’s real, even if it’s invisible.

Are those invisible applications the reason for the shift? Is AI disappearing into the walls, like our plumbing (and, for that matter, our computer networks)? We’ll have reason to think about that throughout this report.

Regardless, at least in some quarters, attitudes seem to be solidifying against AI, and that could be a sign that we’re approaching another “AI winter.” We don’t think so, given that the number of respondents who report AI in production is steady and up slightly. However, it is a sign that AI has passed to the next stage of the hype cycle. When expectations about what AI can deliver are at their peak, everyone says they’re doing it, whether or not they really are. And once you hit the trough, no one says they’re using it, even though they now are.

Figure 1. AI adoption and maturity

The trailing edge of the hype cycle has important consequences for the practice of AI. When it was in the news every day, AI didn’t really have to prove its value; it was enough to be interesting. But once the hype has died down, AI has to show its value in production, in real applications: it’s time for it to prove that it can deliver real business value, whether that’s cost savings, increased productivity, or more customers. That will no doubt require better tools for collaboration between AI systems and consumers, better methods for training AI models, and better governance for data and AI systems.

Adoption by Continent

When we looked at responses by geography, we didn’t see much change since last year. The greatest increase in the percentage of respondents with AI in production was in Oceania (from 18% to 31%), but that was a relatively small segment of the total number of respondents (only 3.5%)—and when there are few respondents, a small change in the numbers can produce a large change in the apparent percentages. For the other continents, the percentage of respondents with AI in production agreed within 2%.
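To see why a small segment produces large apparent swings, compare the same change of three respondents in a small group and in a large one. The counts below are hypothetical, chosen only to illustrate the effect; they are not the survey's actual numbers.

```python
def share_in_production(in_production: int, total: int) -> float:
    """Percentage of a group's respondents reporting AI in production."""
    return 100.0 * in_production / total


# Hypothetical counts, for illustration only (not survey data).
# Small segment: 28 respondents; three more report AI in production.
small_before = share_in_production(5, 28)
small_after = share_in_production(8, 28)

# Large segment: 1,000 respondents; the same three-person change.
large_before = share_in_production(250, 1000)
large_after = share_in_production(253, 1000)

small_swing = small_after - small_before
large_swing = large_after - large_before

print(f"small segment: {small_before:.1f}% -> {small_after:.1f}% "
      f"({small_swing:+.1f} points)")
print(f"large segment: {large_before:.1f}% -> {large_after:.1f}% "
      f"({large_swing:+.1f} points)")
```

Three respondents move the small segment by more than ten percentage points, while the same three barely register in the large one. That is the reason for treating the Oceania figures with caution.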

Figure 2. AI adoption by continent

After Oceania, North America and Europe had the greatest percentages of respondents with AI in production (both 27%), followed by Asia and South America (24% and 22%, respectively). Africa had the smallest percentage of respondents with AI in production (13%) and the largest percentage of nonusers (42%). However, as with Oceania, the number of respondents from Africa was small, so it’s hard to put too much credence in these percentages. We continue to hear exciting stories about AI in Africa, many of which demonstrate creative thinking that is sadly lacking in the VC-frenzied markets of North America, Europe, and Asia.

Adoption by Industry

The distribution of respondents by industry was almost the same as last year. The largest percentages of respondents were from the computer hardware and financial services industries (both about 15%, though computer hardware had a slight edge), education (11%), and healthcare (9%). Many respondents reported their industry as “Other,” which was the third most common answer. Unfortunately, this vague category isn’t very helpful, since it featured industries ranging from academia to wholesale, and included some exotica like drones and surveillance—intriguing but hard to draw conclusions from based on one or two responses. (Besides, if you’re working on surveillance, are you really going to tell people?) There were well over 100 unique responses, many of which overlapped with the industry sectors that we listed.

We see a more interesting story when we look at the maturity of AI practices in these industries. The retail and financial services industries had the greatest percentages of respondents reporting AI applications in production (37% and 35%, respectively). These sectors also had the fewest respondents reporting that they weren’t using AI (26% and 22%). That makes a lot of intuitive sense: just about all retailers have established an online presence, and part of that presence is making product recommendations, a classic AI application. Most retailers using online advertising services rely heavily on AI, even if they don’t consider using a service like Google “AI in production.” AI is certainly there, and it’s driving revenue, whether or not they’re aware of it. Similarly, financial services companies were early adopters of AI: automated check reading was one of the first enterprise AI applications, dating to well before the current surge in AI interest.

Education and government were the two sectors with the fewest respondents reporting AI projects in production (9% for both). Both sectors had many respondents reporting that they were evaluating the use of AI (46% and 50%). These two sectors also had the largest percentage of respondents reporting that they weren’t using AI. These are industries where appropriate use of AI could be very important, but they’re also areas in which a lot of damage could be done by inappropriate AI systems. And, frankly, they’re both areas that are plagued by outdated IT infrastructure. Therefore, it’s not surprising that we see a lot of people evaluating AI—but also not surprising that relatively few projects have made it into production.

Figure 3. AI adoption by industry

As you’d expect, respondents from companies with AI in production reported that a larger portion of their IT budget was spent on AI than did respondents from companies that were evaluating or not using AI. 32% of respondents with AI in production reported that their companies spent over 21% of their IT budget on AI (18% reported that 11%–20% of the IT budget went to AI; 20% reported 6%–10%). Only 12% of respondents who were evaluating AI reported that their companies were spending over 21% of the IT budget on AI projects. Most of the respondents who were evaluating AI came from organizations that were spending under 5% of their IT budget on AI (31%); in most cases, “evaluating” means a relatively small commitment. (And remember that roughly half of all respondents were in the “evaluating” group.)

The big surprise was among respondents who reported that their companies weren’t using AI. You’d expect their AI spending to be zero, and indeed, over half of the respondents (53%) selected 0%–5%; we’ll assume that means 0. Another 28% checked “Not applicable,” also a reasonable response for a company that isn’t investing in AI. But a measurable number had other answers, including 2% (10 respondents) who indicated that their organizations were spending over 21% of their IT budgets on AI projects. 13% of the respondents not using AI indicated that their companies were spending 6%–10% on AI, and 4% of that group estimated AI expenses in the 11%–20% range. So even when our respondents report that their organizations aren’t using AI, we find that they’re doing something: experimenting, considering, or otherwise “kicking the tires.” Will these organizations move toward adoption in the coming years? That’s anyone’s guess, but AI may be penetrating organizations that are on the back side of the adoption curve (the so-called “late majority”).

Figure 4. Share of IT budgets allocated to AI

Now look at the graph showing the percentage of IT budget spent on AI by industry. Just eyeballing this graph shows that most companies are in the 0%–5% range. But it’s more interesting to look at what industries are, and aren’t, investing in AI. Computers and healthcare have the most respondents saying that over 21% of the budget is spent on AI. Government, telecommunications, manufacturing, and retail are the sectors where respondents report the smallest (0%–5%) expense on AI. We’re surprised at the number of respondents from retail who report low IT spending on AI, given that the retail sector also had a high percentage of practices with AI in production. We don’t have an explanation for this, aside from saying that any study is bound to expose some anomalies.

Figure 5. Share of IT budget allocated to AI, by industry

Bottlenecks

We asked respondents what the biggest bottlenecks were to AI adoption. The answers were strikingly similar to last year’s. Taken together, respondents with AI in production and respondents who were evaluating AI say the biggest bottlenecks were lack of skilled people and lack of data or data quality issues (both at 20%), followed by finding appropriate use cases (16%).

Looking at “in production” and “evaluating” practices separately gives a more nuanced picture. Respondents whose organizations were evaluating AI were much more likely to say that company culture is a bottleneck, a challenge that Andrew Ng addressed in a recent issue of his newsletter. They were also more likely to see problems in identifying appropriate use cases. That’s not surprising: if you have AI in production, you’ve at least partially overcome problems with company culture, and you’ve found at least some use cases for which AI is appropriate.

Respondents with AI in production were significantly more likely to point to lack of data or data quality as an issue. We suspect this is the result of hard-won experience. Data always looks much better before you’ve tried to work with it. When you get your hands dirty, you see where the problems are. Finding those problems, and learning how to deal with them, is an important step toward developing a truly mature AI practice. These respondents were somewhat more likely to see problems with technical infrastructure—and again, understanding the problem of building the infrastructure needed to put AI into production comes with experience.

Respondents who are using AI (the “evaluating” and “in production” groups—that is, everyone who didn’t identify themselves as a “non-user”) were in agreement on the lack of skilled people. A shortage of trained data scientists has been predicted for years. In last year’s survey of AI adoption, we noted that we’ve finally seen this shortage come to pass, and we expect it to become more acute. This group of respondents were also in agreement about legal concerns. Only 7% of the respondents in each group listed this as the most important bottleneck, but it’s on respondents’ minds.

And nobody’s worrying very much about hyperparameter tuning.

Figure 6. Bottlenecks to AI adoption

Looking a bit further into the difficulty of hiring for AI, we found that respondents with AI in production saw the most significant skills gaps in these areas: ML modeling and data science (45%), data engineering (43%), and maintaining a set of business use cases (40%). We can rephrase these skills as core AI development, building data pipelines, and product management. Product management for AI, in particular, is an important and still relatively new specialization that requires understanding the specific requirements of AI systems.

AI Governance

Among respondents with AI products in production, the number of those whose organizations had a governance plan in place to oversee how projects are created, measured, and observed was roughly the same as those that didn’t (49% yes, 51% no). Among respondents who were evaluating AI, relatively few (only 22%) had a governance plan.

The large number of organizations lacking AI governance is disturbing. While it’s easy to assume that AI governance isn’t necessary if you’re only doing some experiments and proof-of-concept projects, that’s dangerous. At some point, your proof-of-concept is likely to turn into an actual product, and then your governance efforts will be playing catch-up. It’s even more dangerous when you’re relying on AI applications in production. Without formalizing some kind of AI governance, you’re less likely to know when models are becoming stale, when results are biased, or when data has been collected improperly.

Figure 7. Organizations with an AI governance plan in place

While we didn’t ask about AI governance in last year’s survey, and consequently can’t do year-over-year comparisons, we did ask respondents who had AI in production what risks they checked for. We saw almost no change. Some risks were up a percentage point or two and some were down, but the ordering remained the same. Unexpected outcomes remained the biggest risk (68%, down from 71%), followed closely by model interpretability and model degradation (both 61%). It’s worth noting that unexpected outcomes and model degradation are business issues. Interpretability, privacy (54%), fairness (51%), and safety (46%) are all human issues that may have a direct impact on individuals. While there may be AI applications where privacy and fairness aren’t issues (for example, an embedded system that decides whether the dishes in your dishwasher are clean), companies with AI practices clearly need to place a higher priority on the human impact of AI.

We’re also surprised to see that security remains close to the bottom of the list (42%, unchanged from last year). Security is finally being taken seriously by many businesses, just not for AI. Yet AI has many unique risks, among them data poisoning, malicious inputs that generate false predictions, and reverse engineering of models to expose private information. After last year’s many costly attacks against businesses and their data, there’s no excuse for being lax about cybersecurity. Unfortunately, it looks like AI practices are slow in catching up.

Figure 8. Risks checked by respondents with AI in production

Governance and risk-awareness are certainly issues we’ll watch in the future. If companies developing AI systems don’t put some kind of governance in place, they are risking their businesses. AI will be controlling you, with unpredictable results—results that increasingly include damage to your reputation and large legal judgments. The least of these risks is that governance will be imposed by legislation, and those who haven’t been practicing AI governance will need to catch up.


Tools

When we looked at the tools used by respondents working at companies with AI in production, our results were very similar to last year’s. TensorFlow and scikit-learn are the most widely used (both 63%), followed by PyTorch, Keras, and AWS SageMaker (50%, 40%, and 26%, respectively). All of these are within a few percentage points of last year’s numbers, typically a couple of percentage points lower. Respondents were allowed to select multiple entries; this year the average number of entries per respondent appeared to be lower, accounting for the drop in the percentages (though we’re unsure why respondents checked fewer entries).

There appears to be some consolidation in the tools marketplace. Although it’s great to root for the underdogs, the tools at the bottom of the list were also slightly down: AllenNLP (2.4%), BigDL (1.3%), and RISELab’s Ray (1.8%). Again, the shifts are small, but dropping by one percent when you’re only at 2% or 3% to start with could be significant—much more significant than scikit-learn’s drop from 65% to 63%. Or perhaps not; when you only have a 3% share of the respondents, small, random fluctuations can seem large.
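The difference between an absolute shift and a relative one can be sketched in a few lines. The scikit-learn figures (65% to 63%) come from the text; the 3% to 2% case is an illustrative stand-in for a tool near the bottom of the list, not a reported number for any specific tool.

```python
def relative_change(before: float, after: float) -> float:
    """Change expressed as a fraction of the starting value."""
    return (after - before) / before


shifts = {
    "scikit-learn": (65.0, 63.0),             # percentages from the text
    "small tool (illustrative)": (3.0, 2.0),  # hypothetical low-share tool
}

for name, (before, after) in shifts.items():
    points = after - before
    rel = relative_change(before, after)
    print(f"{name}: {points:+.1f} points, {rel:+.1%} relative")
```

A two-point drop from 65% is a loss of about 3% of the starting share, while a one-point drop from 3% is a loss of a third of it. Whether either shift is real or just noise is a separate question, as the paragraph above notes.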

Figure 9. Tools used by respondents with AI in production

Automating ML

We took an additional look at tools for automatically generating models. These tools are commonly called “AutoML” (though that’s also a product name used by Google and Microsoft). They’ve been around for a few years; the company developing DataRobot, one of the oldest tools for automating machine learning, was founded in 2012. Although building models and programming aren’t the same thing, these tools are part of the “low code” movement. AutoML tools fill similar needs: allowing more people to work effectively with AI and eliminating the drudgery of doing hundreds (if not thousands) of experiments to tune a model.

Until now, the use of AutoML has been a relatively small part of the picture. This is one of the few areas where we see a significant difference between this year and last year. Last year 51% of the respondents with AI in production said they weren’t using AutoML tools. This year only 33% responded “None of the above” (and didn’t write in an alternate answer).

Respondents who were “evaluating” the use of AI appear to be less inclined to use AutoML tools (45% responded “None of the above”). However, there were some important exceptions. Respondents evaluating ML were more likely to use Azure AutoML than respondents with ML in production. This fits anecdotal reports that Microsoft Azure is the most popular cloud service for organizations that are just moving to the cloud. It’s also worth noting that the usage of Google Cloud AutoML and IBM AutoAI was similar for respondents who were evaluating AI and for those who had AI in production.

Figure 10. Use of AutoML tools

Deploying and Monitoring AI

There also appeared to be an increase in the use of automated tools for deployment and monitoring among respondents with AI in production. “None of the above” was still the answer chosen by the largest percentage of respondents (35%), but it was down from 46% a year ago. The tools they were using were similar to last year’s: MLflow (26%), Kubeflow (21%), and TensorFlow Extended (TFX, 15%). Usage of MLflow and Kubeflow increased since 2021; TFX was down slightly. Amazon SageMaker (22%) and TorchServe (6%) were two new products with significant usage; SageMaker in particular is poised to become a market leader. We didn’t see meaningful year-over-year changes for Domino, Seldon, or Cortex, none of which had a significant market share among our respondents. (BentoML is new to our list.)

Figure 11. Tools used for deploying and monitoring AI

We saw similar results when we looked at automated tools for data versioning, model tuning, and experiment tracking. Again, we saw a significant reduction in the percentage of respondents who selected “None of the above,” though it was still the most common answer (40%, down from 51%). A significant number said they were using homegrown tools (24%, up from 21%). MLflow was the only tool we asked about that appeared to be winning the hearts and minds of our respondents, with 30% reporting that they used it. Everything else was under 10%. A healthy, competitive marketplace? Perhaps. There’s certainly a lot of room to grow, and we don’t believe that the problem of data and model versioning has been solved yet.

AI at a Crossroads

Now that we’ve looked at all the data, where is AI at the start of 2022, and where will it be a year from now? You could make a good argument that AI adoption has stalled. We don’t think that’s the case. Neither do venture capitalists; a study by the OECD, Venture Capital Investments in Artificial Intelligence, says that in 2020, 20% of all VC funds went to AI companies. We would bet that number is also unchanged in 2021. But what are we missing? Is enterprise AI stagnating?

Andrew Ng, in his newsletter The Batch, paints an optimistic picture. He points to Stanford’s AI Index Report for 2022, which says that private investment almost doubled between 2020 and 2021. He also points to the rise in regulation as evidence that AI is unavoidable: it’s an inevitable part of 21st century life. We agree that AI is everywhere, and in many places, it’s not even seen. As we’ve mentioned, businesses that are using third-party advertising services are almost certainly using AI, even if they never write a line of code. It’s embedded in the advertising application. Invisible AI—AI that has become part of the infrastructure—isn’t going away. In turn, that may mean that we’re thinking about AI deployment the wrong way. What’s important isn’t whether organizations have deployed AI on their own servers or on someone else’s. What we should really measure is whether organizations are using infrastructural AI that’s embedded in other systems that are provided as a service. AI as a service (including AI as part of another service) is an inevitable part of the future.

But not all AI is invisible; some is very visible. AI is being adopted in some ways that, until the past year, we’d have considered unimaginable. We’re all familiar with chatbots, and the idea that AI can give us better chatbots wasn’t a stretch. But GitHub’s Copilot was a shock: we didn’t expect AI to write software. We saw (and wrote about) the research leading up to Copilot but didn’t believe it would become a product so soon. What’s more shocking? We’ve heard that, for some programming languages, as much as 30% of new code is being suggested by the company’s AI programming tool Copilot. At first, many programmers thought that Copilot was no more than AI’s clever party trick. That’s clearly not the case. Copilot has become a useful tool in surprisingly little time, and with time, it will only get better.

Other applications of large language models—automated customer service, for example—are rolling out (our survey didn’t pay enough attention to them). It remains to be seen whether humans will feel any better about interacting with AI-driven customer service than they do with humans (or horrendously scripted bots). There’s an intriguing hint that AI systems are better at delivering bad news to humans. If we need to be told something we don’t want to hear, we’d prefer it come from a faceless machine.

We’re starting to see more adoption of automated tools for deployment, along with tools for data and model versioning. That’s a necessity; if AI is going to be deployed into production, you have to be able to deploy it effectively, and modern IT shops don’t look kindly on handcrafted artisanal processes.

There are many more places we expect to see AI deployed, both visible and invisible. Some of these applications are quite simple and low-tech. My four-year-old car displays the speed limit on the dashboard. There are any number of ways this could be done, but after some observation, it became clear that this was a simple computer vision application. (It would report incorrect speeds if a speed limit sign was defaced, and so on.) It’s probably not the fanciest neural network, but there’s no question we would have called this AI a few years ago. Where else? Thermostats, dishwashers, refrigerators, and other appliances? Smart refrigerators were a joke not long ago; now you can buy them.

We also see AI finding its way onto smaller and more limited devices. Cars and refrigerators have seemingly unlimited power and space to work with. But what about small devices like phones? Companies like Google have put a lot of effort into running AI directly on the phone, both doing work like voice recognition and text prediction and actually training models using techniques like federated learning—all without sending private data back to the mothership. Are companies that can’t afford to do AI research on Google’s scale benefiting from these developments? We don’t yet know. Probably not, but that could change in the next few years and would represent a big step forward in AI adoption.

On the other hand, while Ng is certainly right that demands to regulate AI are increasing, and those demands are probably a sign of AI’s ubiquity, they’re also a sign that the AI we’re getting is not the AI we want. We’re disappointed not to see more concern about ethics, fairness, transparency, and mitigating bias. If anything, interest in these areas has slipped slightly. When the biggest concern of AI developers is that their applications might give “unexpected results,” we’re not in a good place. If you only want expected results, you don’t need AI. (Yes, I’m being catty.) We’re concerned that only half of the respondents with AI in production report that AI governance is in place. And we’re horrified, frankly, not to see more concern about security. At least there hasn’t been a year-over-year decrease—but that’s a small consolation, given the events of last year.

AI is at a crossroads. We believe that AI will be a big part of our future. But will that be the future we want or the future we get because we didn’t pay attention to ethics, fairness, transparency, and mitigating bias? And will that future arrive in 5, 10, or 20 years? At the start of this report, we said that when AI was the darling of the technology press, it was enough to be interesting. Now it’s time for AI to get real, for AI practitioners to develop better ways to collaborate between AI and humans, to find ways to make work more rewarding and productive, to build tools that can get around the biases, stereotypes, and mythologies that plague human decision-making. Can AI succeed at that? If there’s another AI winter, it will be because people—real people, not virtual ones—don’t see AI generating real value that improves their lives. It will be because the world is rife with AI applications that they don’t trust. And if the AI community doesn’t take the steps needed to build trust and real human value, the temperature could get rather cold.

Categories: Technology

D-Day in Kyiv

O'Reilly Radar - Tue, 2022/03/22 - 11:02
My experience working with Ukraine’s Offensive Cyber Team

By Jeffrey Carr
March 22, 2022

When Russia invaded Ukraine on February 24th,  I had been working with two offensive cyber operators from GURMO—Main Intelligence Directorate of the Ministry of Defense of Ukraine—for several months trying to help them raise funds to expand development on an OSINT (Open Source Intelligence) platform they had invented and were using to identify and track Russian terrorists in the region. Since the technology was sensitive, we used Signal for voice and text calls. There was a lot of tension during the first few weeks of February due to Russia’s military buildup on Ukraine’s borders and the uncertainty of what Putin would do.

Then on February 24th at 6am in Kyiv (February 23, 8pm in Seattle where I live), it happened.

SIGNAL log 23 FEB 2022 20:00 (Seattle) / 24 FEB 2022 06:00 (Kyiv)

Missed audio call - 8:00PM
It started 8:01PM
War? 9:36PM
Incoming audio call - 9:37PM
Call dropped. 9:41PM
Are you there? 9:42PM

I didn’t hear from my GURMO friend again for 10 hours. When he pinged me on Signal, it was from a bunker. They were expecting another missile attack at any moment.

“Read this”, he said, and sent me this link. “Use Google Translate.”

It linked to an article that described Russia’s operations plan for its attack on Ukraine, obtained by sources of Ukrainian news website ZN.UA. It said that the Russian military had sabotage groups already placed in Ukraine whose job was to knock out power and communications in the first 24 hours in order to cause panic. Acts of arson and looting would follow, with the goal of distracting law enforcement from chasing down the saboteurs. Then, massive cyber attacks would take down government websites, including the Office of the President, the General Staff, the Cabinet, and the Parliament (the Verkhovna Rada). The Russian military expected little resistance when it moved against Kyiv and believed that it could capture the capital in a matter of days.

The desired result is to seize the leadership of the state (it is not specified who exactly) and force a peace agreement to be signed on Russian terms under blackmail and the possibility of the death of a large number of civilians.

Even if part of the country’s leadership is evacuated, some pro-Russian politicians will be able to “take responsibility” and sign documents, citing the “escape” of the political leadership from Kyiv.

As a result, Ukraine can be divided into two parts—on the principle of West and East Germany, or North and South Korea.

At the same time, the Russian Federation recognizes the legitimate part of Ukraine that will sign these agreements and will be loyal to the Russian Federation. Guided by the principle: “he who controls the capital—he controls the state.”

The first significant Russian cyber attack of the war is suspected to be the one that took down satellite provider ViaSat at precisely 06:00 Kyiv time (04:00 UTC), the exact time that Russia started its invasion.

The cause is believed to be a malicious firmware update sent to ViaSat customers that “bricked” the satellite modems. Since ViaSat is a defense contractor, the NSA, France’s ANSSI, and Ukrainian Intelligence are investigating. ViaSat hired Mandiant to handle digital forensics and incident response (DFIR).

“Is Ukraine planning to retaliate?”, I asked.

“We’re engaging in six hours. I’ll keep you informed.”

That last exchange happened about 22 hours after the start of the war.

FRIDAY, FEB 25, 2022 07:51

I received a Signal alert.

“Download ready” and a link.

The GURMO cyber team had gained access to the accounting and document management system at Russian Military Unit 6762, part of the Ministry of Internal Affairs that deals with riot control, terrorists, and the territorial defense of Russia. They downloaded all of their personnel data, including passports, military IDs, credit cards, and payment records. I was sent a sampling of documents to do further research and post via my channels.

The credit cards were all issued by Sberbank. “What are you going to do with these?” I asked. He sent me a wink and a grin icon on Signal and said:

Buy weapons and ammo for our troops!
We start again at 6:30am tomorrow. When you wake up, join us.
Will do!

Over the next few days, GURMO’s offensive cyber team hacked a dizzying array of Russian targets and stole thousands of files from:

  • Black Sea Fleet’s communications servers
  • FSB Special Operations unit 607
  • Sergey G. Buev, the Chief Missile Officer of the Ministry of Defense
  • Federal Air Transport Agency

Everything was in Russian, so the translation process was very time-consuming. There were literally hundreds of documents in all different file types, and to make the translation process even harder, many of the files were images of documents. You can’t just upload those into Google Translate. You have to download the Google Translate app onto your mobile phone, then point it at the document on your screen and read it that way.

Once I had read enough, I could write a post at my Inside Cyber Warfare Substack that provided information and context for the breach. Between the translation, research, writing, and communication with GURMO, who were 10 hours ahead (9 hours after the U.S. time change), I was getting about 4½ hours of sleep each night.

We Need Media Support

TUESDAY, MARCH 1, 2022 09:46 (Seattle)

On Signal

We need media support from USA. All the attacks you mentioned during these 6 days. We have to make headlines to demoralize Russians.
I know the team at a young British PR firm. I’ll check with them now.

Nara Communications immediately stepped up to the challenge. They agreed to waive their fee and help place news stories about the GURMO cyber team’s successes. The Ukrainians did their part and gave them some amazing breaches, starting with the Beloyarsk Nuclear Power Plant, home to the world’s only commercial fast breeder reactors. Other countries were spending billions of dollars trying to achieve what Russia had already mastered, so a breach of their design documents and processes was a big deal.

The problem was that journalists wanted to speak to GURMO and that was off the table for three important reasons:

  1. They were too busy fighting a war to give interviews.
  2. The Russian government knew who they were, and their names and faces were on the playing cards given to Kadyrov’s Chechen guerrillas for assassination.
  3. They didn’t want to expose themselves to facial recognition or voice capture technologies because…see #2.

Journalists had only a few options if they didn’t want to run with a single-source story.

They could speak with me because I was the only person who the GURMO team would directly speak to. Plus, I had possession of the documents and understood what they were.

They could contact the CIA Legat in Warsaw, Poland where the U.S. embassy had evacuated to prior to the start of the war. GURMO worked closely with and gave frequent briefings to its allied partners, and they would know about these breaches. Of course, the CIA most likely wouldn’t speak with a journalist.

They could speak with other experts to vet the documents, which would effectively be their second source after speaking with me.

Most reporters at major outlets didn’t bother reporting these breaches under those conditions. To make matters worse, there were no obvious victims. The GURMO hackers weren’t breaking things; they were stealing things, and they liked to keep a persistent presence in the network so they could keep coming back for more. Plus, Russia often implemented a communications strategy known as Ихтамнет (Ihtamnet), which roughly translated means “nothing happened” or, to put it into context, “What hacks? There were no hacks.”

In spite of all those obstacles, Nara Communications was successful in getting an article placed with SC magazine, a radio interview with Britain’s The Times, and a podcast with the Evening Standard.

By mid-March, Putin showed no signs of wanting peace, even after President Zelensky had conceded that NATO membership was probably off the table for Ukraine, and GURMO was popping bigger targets than ever.

The Russians’ plan to establish a fully automated lunar base called Luna-Glob was breached. Russia’s ExoMars project was breached. The new launch complex being built at Vostochny for the Angara rocket was breached. In every instance, a trove of files was downloaded for study by Ukraine’s government and shared with its allies. A small amount was always carved out for me to review, post at the Inside Cyber Warfare Substack, and share with journalists. Journalist Joe Uchill referred to this strategy as Hack and Leak.

Hack and Leak

By hacking some of Russia’s proudest accomplishments (its space program) and most successful technologies (its nuclear research program), the Ukrainian government is sending Putin a message that your cybersecurity systems cannot keep us out, that even your most valuable technological secrets aren’t safe from us, and that if you push us too far, we can do whatever we want to your networks.

Apart from the attack on ViaSat, there hasn’t been evidence of any destructive cyber attacks against Ukrainian infrastructure. Part of that was strategic planning on the part of Ukraine (that’s all that I can say about that), part was Ukraine’s cyber defense at work, and part of that may be that GURMO’s strategy is working. However, there’s no sign that these leaks are having any effect on impeding Russia’s military escalation, probably because that’s driven out of desperation in the face of its enormous military losses so far. Should that escalation continue, GURMO has contingency plans that will bring the war home to Russia.

Jeffrey Carr has been an internationally-known cybersecurity adviser, author, and researcher since 2006. He has worked as a Russia SME for the CIA’s Open Source Center Eurasia Desk. He invented REDACT, the world’s first global R&D database and search engine to assist companies in identifying which intellectual property is of value to foreign governments. He is the founder and organizer of Suits & Spooks, a “collision” event to discuss hard challenges in the national security space, and is the author of Inside Cyber Warfare: Mapping the Cyber Underworld (O’Reilly Media, 2009, 2011). 

Categories: Technology

The Future of Security

O'Reilly Radar - Tue, 2022/03/15 - 07:02

The future of cybersecurity is being shaped by the need for companies to secure their networks, data, devices, and identities. This includes adopting security frameworks like zero trust, which will help companies secure internal information systems and data in the cloud. With the sheer volume of new threats, today’s security landscape has become more complex than ever. With the rise of ransomware, firms have become more aware of their ability to recover from an attack if they are targeted, but security needs also continue to evolve as new technologies, apps, and devices are developed faster than ever before. This means that organizations must be focused on solutions that allow them to stay on the cutting edge of technology and business.

What does the future have in store for cybersecurity? What are some of today’s trends, and what might be future trends in this area? Several significant cybersecurity trends have already emerged or will continue to gain momentum this coming year and beyond. This report covers four of the most important trends:

  • Zero trust (ZT) security (also known as context-aware security, policy-based enforcement), which is becoming more widespread and dominates many enterprise and vendor conversations.
  • Ransomware threats and attacks, which will continue to rise and wreak havoc.
  • Mobile device security, which is becoming more urgent with an increase in remote work and mobile devices.
  • Cloud security and automation, as a means for addressing cloud security issues and the workforce skills gap/shortage of professionals. Related to this is cybersecurity as a service (CaaS or CSaaS), which will also gain momentum as companies turn to vendors who can provide extensive security infrastructure and support services at a fraction of the cost of building self-managed infrastructure.

We’ll start with zero trust, a critical element for any security program in this age of sophisticated and targeted cyberattacks.

Zero Trust Security

For decades, security architects have focused on perimeter protection, such as firewalls and other safety measures. However, as cloud computing increased, experts recognized that traditional strategies and solutions would not work in a mobile-first/hybrid world. User identities could no longer be confined to a company’s internal perimeter, and with employees needing access to business data and numerous SaaS applications while working remotely or on business travel, it became impossible to control access centrally.

The technology landscape is witnessing an emergence of security vendors rethinking the efficacy of their current security measures and offerings without businesses needing to rebuild entire architectures. One such approach is zero trust, which challenges perimeter network access controls by trusting no resources by default. Instead, zero trust redefines the network perimeter, treating all users and devices as inherently untrusted and likely compromised, regardless of their location within the network. Microsoft’s approach to zero trust security focuses on the contextual management of identities, devices, and applications—granting access based on the continual verification of identities, devices, and access to services.1


Zero trust security is a paradigm that leverages identity for access control and combines it with contextual data, continuous analysis, and automated response to ensure that the only network resources accessible to users and devices are those explicitly authorized for consumption.2

In Zero Trust Networks (O’Reilly, 2017), Evan Gilman and Doug Barth split a ZT network into five fundamental assertions:

  • The network is always assumed to be hostile.
  • External and internal threats exist on the network at all times.
  • Network locality is not sufficient for deciding trust in a network.
  • Every device, user, and network flow is authenticated and authorized.
  • Policies must be dynamic and calculated from as many data sources as possible.3

Therefore, a zero trust architecture shifts from the traditional perimeter security model to a distributed, context-aware, and continuous policy enforcement model. In this model, requests for access to protected resources are first made through the control plane, where both the device and user must be continuously authenticated and authorized.

An identity-first security approach with contextual, continual enforcement will be especially critical for companies interested in implementing cloud services. Businesses will continue to focus on securing their identities, including device identities, to ensure that access control depends on context (user, device, location, and behavior) and policy-based rules to manage the expanding ecosystem of users and devices seeking access to corporate resources.
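A context-dependent access decision of the kind described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `AccessRequest` fields, the `POLICY` table, the resource names, and the risk thresholds are all hypothetical and do not come from any particular zero trust product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    """Context gathered for one request (all field names are illustrative)."""
    user_authenticated: bool  # e.g. a valid, unexpired session
    device_compliant: bool    # e.g. a managed, patched device
    location: str             # coarse location reported for the request
    risk_score: float         # 0.0 (normal behavior) to 1.0 (highly unusual)
    resource: str             # resource being requested


# Hypothetical policy: per-resource allowed locations and risk ceilings.
POLICY = {
    "payroll-db": {"locations": {"US"}, "max_risk": 0.2},
    "wiki": {"locations": {"US", "CA", "UK"}, "max_risk": 0.6},
}


def authorize(request: AccessRequest) -> bool:
    """Deny by default; grant only when every contextual check passes."""
    rule = POLICY.get(request.resource)
    if rule is None:
        return False  # unknown resources are never trusted implicitly
    return (
        request.user_authenticated
        and request.device_compliant
        and request.location in rule["locations"]
        and request.risk_score <= rule["max_risk"]
    )


print(authorize(AccessRequest(True, True, "US", 0.1, "payroll-db")))   # True
print(authorize(AccessRequest(True, False, "US", 0.1, "payroll-db")))  # False
```

Because the function is evaluated on every request rather than once at login, a device that falls out of compliance loses access on its next request, which is the "continual" part of the model.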

Enterprises that adopt a zero trust security model will more confidently allow access to their resources, minimize risks, and better mitigate cybersecurity attacks. IAM (identity and access management) is and will continue to be a critical component of a zero trust strategy.

The rise of cryptocurrency, the blockchain, and web3 technologies4 has also introduced conversations around decentralized identity and verifiable credentials.5 The decentralized identity model suggests that individuals own and control their data wherever and whenever it is used. This model will require identifiers such as usernames to be replaced with self-owned and independent IDs that enable data exchange using blockchain and distributed ledger technology to secure transactions. In this model, the thinking is that user data will no longer be centralized and will therefore be less vulnerable to attack.

By contrast, in the traditional identity model, where user identities are verified and managed by a third-party authority/identity provider (IdP), if an attacker gains access to the authority/IdP, they now have the keys to the kingdom, allowing full access to all identities.

Ransomware, an Emerging and Rapidly Evolving Threat

One of the most pressing security issues that businesses face today is ransomware. Ransomware is a type of malware that takes over systems and encrypts valuable company data, requiring a ransom to be paid before the data is unlocked. The “decrypting and returning” that you pay for is, of course, not guaranteed; as such, ransomware costs are typically more than the costs of preparing for these attacks.

These types of attacks can be very costly for businesses, both in terms of the money they lose through ransomware and the potential damage to a company’s reputation. In addition, ransomware is a widespread method of attack because it works. As a result, the cybersecurity landscape will experience an increasing number of ransomware-related cybersecurity attacks estimated to cost businesses billions in damages.

So, how does it work? Cybercriminals utilize savvy social engineering tactics such as phishing, vishing, and smishing to gain access to a computer or device and launch a cryptovirus. The cryptovirus encrypts all files on the system, or multiple systems, accessible by that user. Then, the target (recipient) receives a message demanding payment for the decryption key needed to unlock their files. If the target refuses to comply or fails to pay on time, the price of the decryption key increases exponentially, or the data is released and sold on the dark web. That is the simple case. With a growing criminal ecosystem, and subscription models like ransomware as a service (RaaS), we will continue to see compromised credentials swapped, sold, and exploited, and therefore, continued attacks across the globe.

Terms to Know

Phishing: a technique of fraudulently obtaining private information. Typically, the phisher sends an email that appears to come from a legitimate business—a bank or credit card company—requesting “verification” of information and warning of some dire consequence if it is not provided. The email usually contains a link to a fraudulent web page that seems legitimate—with company logos and content—and has a form requesting everything from a home address to an ATM card’s PIN or a credit card number.6

Smishing: the act of using SMS text messaging to lure victims into executing a specific action. For example, a text message claims to be from your bank or credit card company but includes a malicious link.

Vishing (voice phishing): similar to smishing, except that it is done via phone calls.

Cryptojacking: a type of cybercrime that involves unauthorized use of a device’s (computer, smartphone, tablet, server) computing power to mine or generate cryptocurrency.

Because people will trust an email from a person or organization that appears to be a trustworthy sender (e.g., you are more likely to trust an email that seems to be from a recognizable name/brand), these kinds of attacks are often successful.
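On the defensive side, one simple heuristic is to flag messages whose display name claims a recognizable brand while the sending domain does not belong to that brand. The sketch below uses Python's standard `email.utils.parseaddr`; the brand names and domains in `KNOWN_BRANDS` are made up for illustration, and a real mail filter would rely on many more signals than this one check.

```python
from email.utils import parseaddr

# Hypothetical mapping of brand names to the domains they legitimately send from.
KNOWN_BRANDS = {
    "examplebank": {"examplebank.com"},
    "examplepay": {"examplepay.com", "mail.examplepay.com"},
}


def looks_like_brand_spoof(from_header: str) -> bool:
    """Return True if the display name claims a known brand but the
    sender's domain is not one of that brand's expected domains."""
    display_name, address = parseaddr(from_header)
    domain = address.rpartition("@")[2].lower()
    name = display_name.lower().replace(" ", "")
    for brand, domains in KNOWN_BRANDS.items():
        if brand in name and domain not in domains:
            return True
    return False


suspicious = '"Example Bank Support" <alerts@examp1ebank-secure.net>'
legitimate = "Example Bank <alerts@examplebank.com>"
print(looks_like_brand_spoof(suspicious))  # True
print(looks_like_brand_spoof(legitimate))  # False
```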

As these incidents continue to be a daily occurrence, we’ve seen companies like Netflix and Amazon invest in cyber insurance and increase their cybersecurity budgets. However, on a more positive note, mitigating the risk of ransomware attacks has led companies to reassess their approach to protecting their organizations by shoring up defenses with more robust security protocols and advanced technologies. With companies storing exponentially more data than ever before, securing it has become critical.

The future of ransomware is expected to be one that will continue to grow in numbers and sophistication. These attacks are expected to impact even more companies, including targeted attacks focused on supply chains, industrial control systems, hospitals, and schools. As a result, we can expect that it will continue to be a significant threat to businesses.

Mobile Device Security

One of the most prominent areas of vulnerability for businesses today is the use of mobile devices. According to Verizon’s Mobile Security Index 2020 Report,7 39% of businesses had a mobile-related breach in 2020. The report identified user threats, app threats, device threats, and network threats as the top mobile security threats in 2020. One example of a mobile application security threat is an individual downloading apps that look legitimate but are actually spyware or malware aimed at stealing personal and business information.

Another potential problem involves employees accessing and storing sensitive data or emails on their mobile devices while traveling from one domain to another (for example, airport WiFi, coffee shop WiFi).

Security experts believe that mobile device security is still in its early stages, and many of the same guidelines used to secure traditional computers may not apply to modern mobile devices. While mobile device management (MDM) solutions are a great start, organizations will need to rethink how they handle mobile device security in enterprise environments. The future of mobile device management will also be dependent on contextual data and continuous policy enforcement.

With mobile technology and cloud computing becoming increasingly important to both business and consumer life, smart devices like Apple AirTags, smart locks, video doorbells, and so on are gaining more weight in the cybersecurity debate.

Security concerns range from compromised accounts to stolen devices, and as such, cybersecurity companies are offering new products to help consumers protect their smart homes.

A key issue involving the future of mobile device management is how enterprises can stay ahead of new security issues as they relate to bring your own device (BYOD) and consumer IoT (Internet of Things) devices. Security professionals may also need to reevaluate how to connect a growing number of smart devices in a business environment. Security has never been more important, and new trends will continue to emerge as we move through the future of BYOD and IoT.

Cloud Security and Automation

We have seen an increase in businesses moving their operations to the cloud to take advantage of its benefits, such as increased efficiency and scalability. As a result, the cloud is becoming an integral part of how organizations secure their data, with many companies shifting to a hybrid cloud model to address scale, security, legacy technologies, and architectural inefficiencies. However, staffing issues and the complexities of moving from on-premises to cloud/hybrid cloud introduce a new set of security concerns.

Cloud services are also often outsourced, and as such, it can be challenging to determine who is responsible for the security of the data. In addition, many businesses are unaware of the vulnerabilities that exist in their cloud infrastructure and, in many cases, do not have the needed staff to address these vulnerabilities. As a result, security will remain one of the biggest challenges for organizations adopting cloud computing.

One of the most significant benefits cloud computing can provide to security is automation. The need for security automation is rising as manual processes and limited information-sharing capabilities slow the evolution of secure implementations across many organizations. It is estimated that nearly half of all cybersecurity incidents are caused by human error, which can be mitigated through automated security tools rather than manual processes.

However, there can be a downside to automation. The industry has not yet perfected the ability to sift signals from large amounts of noise. An excellent example is what happens around incident response and vulnerability management—both still rely on human intervention or an experienced automation/tooling expert. Industry tooling will need to improve in this area. While automation can also help reduce the impact of attacks, any automated solution runs the risk of being ineffective against unknown threats if human eyes do not assess it before it is put into practice.

In a DevOps environment, automation takes the place of human labor. The key for security will be code-based configuration, and the ability to be far more confident about the current state of existing security and infrastructure appliances. Organizations that have adopted configuration by code will also have higher confidence during audits. For example, in a traditional audit, an auditor checks each process for changing firewall rules, which already go through change control, then spot-checks one out of thousands of rules. With configuration by code, the auditor can instead validate the CI/CD pipeline and then run checks on your configuration to confirm it meets policy.
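To make the audit example concrete, here is a minimal sketch of a policy check that a CI/CD pipeline or an auditor could run against firewall rules kept as code. The `FirewallRule` structure, the rule names, and the policy itself (no sensitive port open to the whole internet) are assumptions chosen for illustration, not the format of any specific firewall or tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FirewallRule:
    """One rule as it might appear in a version-controlled config file."""
    name: str
    source_cidr: str
    port: int
    action: str  # "allow" or "deny"


# Hypothetical policy: these ports must never be open to the whole internet.
SENSITIVE_PORTS = {22, 3389}
ANYWHERE = "0.0.0.0/0"


def policy_violations(rules: list[FirewallRule]) -> list[str]:
    """Return the names of rules that break the policy; empty means compliant."""
    return [
        rule.name
        for rule in rules
        if rule.action == "allow"
        and rule.source_cidr == ANYWHERE
        and rule.port in SENSITIVE_PORTS
    ]


rules = [
    FirewallRule("web-https", ANYWHERE, 443, "allow"),
    FirewallRule("ssh-from-vpn", "10.8.0.0/16", 22, "allow"),
    FirewallRule("ssh-open", ANYWHERE, 22, "allow"),
]
print(policy_violations(rules))  # ['ssh-open']
```

Because the check runs over every rule on every change, it replaces the spot check of one rule out of thousands with full coverage.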

The evolution of SOAR (security orchestration, automation, and response) tools and automation of security policy by code will open up a huge potential benefit for well-audited businesses in the future.

Automation May Help with the Security Workforce Shortage

The shortage of cyber workers will persist because there aren’t enough cybersecurity professionals in the workforce, and cyber education isn’t keeping pace with demand. As a result, cybersecurity teams are understaffed and burned out, which lowers their effectiveness and increases risk.

Automation may help organizations fill the cybersecurity talent gap and address many of the same activities that human employees perform, such as detection, response, and policy configuration.

While automation cannot completely replace the need for human cybersecurity experts, it can assist in decreasing the burden on these professionals and make them more successful in their work. In addition to more professionals joining the field with varying backgrounds, automated technologies will play a significant role in mitigating the impact of cyberattacks and assisting in solving the cybersecurity workforce shortage problem.

(Cyber)Security as a Service

Cybersecurity as a service (CaaS or CSaaS) is growing more popular as companies turn to managed service vendors that can provide extensive security infrastructure and support services at a fraction of the cost of building self-managed infrastructure. As a result, organizations can use their resources more effectively by outsourcing security needs to a specialized vendor rather than building in-house infrastructure.

With CaaS, a third-party vendor provides managed security services, intrusion detection and prevention, and firewalls. By outsourcing cybersecurity functions to a specialist vendor, companies can access the security infrastructure support they need without investing in extensive on-site infrastructure, such as firewalls and intrusion detection systems (IDS).

There are additional benefits:

  • Access to the latest threat protection technologies.
  • Reduced costs: outsourced cybersecurity solutions can be less expensive than an in-house security team.
  • Improved internal resources: companies can focus on their core business functions by outsourcing security to a third party.
  • Flexibility: companies can scale their security services as needed.

The ransomware attack on Hollywood Presbyterian Medical Center8 is an excellent example of why CaaS will continue to be sought after by organizations of all sizes. Cybercriminals locked the hospital’s computer systems and demanded a ransom payment to unlock them. As a result, the hospital was forced to turn to a cybersecurity vendor for help in restoring its computer systems.

Of course, this approach has disadvantages:

  • Loss of control over how data is stored and who has access to your data/infrastructure. Security tooling often needs to run at the highest levels of privilege, which can enable attackers to attack enterprises at scale, use the managed service provider network to bypass security safeguards, or exploit software vulnerabilities like those in SolarWinds and Log4j.
  • In addition, CaaS providers may or may not support existing legacy software or critical business infrastructure specific to each organization.

CaaS is expected to continue on a solid growth path as more enterprises rely on cloud-based systems and the IoT for their business operations.


Cyberattacks continue to be successful because they are effective. Thanks to cutting-edge technology, services, and techniques available to every attacker, organizations can no longer afford to make security an afterthought. To defend against present and future cyberattacks, businesses must develop a comprehensive security plan that incorporates automation, analytics, and context-aware capabilities. Now more than ever, companies must be more diligent about protecting their data, networks, and employees.

Whether businesses embrace identity-first and context-aware strategies like zero trust, or technologies like cloud computing, mobile devices, or cybersecurity as a service (CaaS), the growth of ransomware and other cyberattacks is forcing many companies to rethink their overall cybersecurity strategies. As a result, organizations will need to approach security holistically by including all aspects of their business operation and implementing defense-in-depth strategies from the outset.

The future is bright for the cybersecurity industry, as companies will continue to develop new technologies to guard against the ever-evolving threat landscape. Government rules, regulations, and security procedures will also continue to evolve to keep up with emerging technologies and the rapidly growing number of threats across both private and public sectors.


1. “Transitioning to Modern Access Architecture with Zero Trust”.

2. Scott Rose et al., NIST Special Publication 800-207.

3. Evan Gilman and Doug Barth, Zero Trust Networks (O’Reilly, 2017).

4. See “Decentralized Identity for Crypto Finance”.

5. See “Verifiable Credentials Data Model”.

6. See this social engineering article for more information.

7. “The State of Mobile Security”.

8. “Hollywood Hospital Pays $17,000 in Bitcoin to Hackers; FBI Investigating”.

Categories: Technology

Identity problems get bigger in the metaverse

O'Reilly Radar - Tue, 2022/03/15 - 07:01

If the hype surrounding the metaverse results in something real, it could improve the way you live, work, and play. Or it could create a hellworld where you don’t get to be who you are or want to be.  Whatever people think they’ve read, the metaverse originally imagined in Snow Crash is not a vision for an ideal future. In the novel, it’s a world that replaced the “real world” so that people would feel less bad about the reality they actually had. In the end, the story is about the destabilization of the individual’s identity and implosion of traditional identities, rather than the securing of a new one.

Even in the real world (a.k.a. meatspace), identity can be hard to pin down. You are who you are, but there are many ways you may define yourself depending on the context. In the latest metaverse discourse there has been lots of talk of virtual avatars putting on NFT-based clothing, skins, weapons, and other collectable assets, and then moving those assets around to different worlds and games without issue. Presentation is just a facet of identity, as the real-world fashion industry well knows.

The latest dreams of web3 include decentralized and self-sovereign identity. But this is just re-hashing years of identity work that focuses on the how (internet standards) and rarely the why (what people need to feel comfortable with identity online). Second Life has been grappling with how people construct a new identity and present their avatars since 2003.

There are many ways that the web today and the metaverse tomorrow will continue to integrate further with our reality:

Experiences | Examples
Online through a laptop like the web today | Posting to Facebook, discussing work on Slack or joining a DAO on Discord.
Mobile devices while walking around in the real world | Seeing the comments about a restaurant while standing in front of it, getting directions to a beach or getting access to a private club via an NFT.
Mixed and augmented reality (MR/AR) experiences where the digital is overlaid on reality | Chatting with someone who looks like they are sitting next to you or seeing the last message you sent to someone you are talking to.
Fully immersive virtual reality (VR) experiences | Going to a chat room in AltspaceVR or playing a game with friends in Beatsaber.

Before we can figure out what identity means to people in “the metaverse,” we need to talk about what identity is, how we use identity in the metaverse, and how we might create systems that better realize the way people want their identities to work online.

I login therefore I am

When I mention identity, am I starting a philosophical discussion that answers the question “who am I?” Am I trying to figure out my place within an in-person social event? Or do you want to confirm that I meet some standard, such as being over 21?

All of these questions have a meaning in the digital world; most often, those questions are answered by logging in with an email address and password to get into a particular website. Over the last decade, some services like Facebook, Google, and others have started to allow you to use the identity you have with them to log into other websites.

Is the goal of online identity to have one overarching identity that ties everything together? Our identities are constantly renegotiated and unsaid. I don’t believe we can encode all of the information about our identities into a single digital record, even if some groups are trying. Facebook’s real-name policy requires you to use your legal name and makes you collapse all of your possible pseudo-identities into your legal one. If they think you aren’t using a legal name, they require you to upload a government-issued document. I’d argue that because people create multiple identities even when faced with account deactivation, it is not their goal to have one single compiled identity.

All of me(s)

As we consider identities in the metaverse as extensions of the identities we have in the real world, we need to understand that we build pseudo-identities for different interactions. My pseudo-identities for family, work, my neighborhood, the PTA, school friends, etc. all overlap to some extent. These are situations, contexts, realms, or worlds that I am part of, and that extend to the web and metaverse.

In most pseudo-identities there are shared parts that are the “real me,” like my name or my real likeness. Some may be closer to a “core” pseudo-identity that represents more of what I consider to be me; others may just be smaller facets. Each identity is associated with a different reputation, a different level of trust from the community, and different data (profile pictures, posts, etc.).

The most likely places to find our identities are:

  • Lists of email and password pairs stored in our browsers
  • Number of groups we are part of on Facebook
  • Gamer tags we have on Oculus, Steam, or PSN
  • Discords we chat on
  • …and the list goes on

Huge numbers of these identities are being created and managed by hand today. On average, a person has 1.75 email addresses and manages 90 online accounts. It will only get more complex and stranger with the addition of the metaverse.

There are times that I don’t want my pseudo-identities’ reputations or information to interact with a particular context; for these cases, I’ll create a pseudo-anonymous identity. There is a lot of prior work on anonymity as a benefit:

  • Balaji Srinivasan has discussed the value of an economy based on pseudonymous identities as a way to “air gap” against repercussions of social problems.
  • Jeff Kosseff, professor and author, has recently written a book about the benefits of anonymity, “The United States of Anonymous.” In a great discussion on the TechDirt podcast, he talks about how the ability to question powers is an important aspect of the ability to be anonymous.
  • Christopher “moot” Poole, the creator of 4chan, has often talked about the benefits of anonymous online identities including the ability to be more creative without the risk of failure. Given the large amount of harmful abuse that comes out of communities like 4chan, this argument for anonymity is questionable.

My many identities and overlapping zones of attributes, information, and privacy.

If you link one of my pseudo-identities to another pseudo-identity in a way I didn’t expect, it can feel like a violation. I expect to control the flow of information about me (see Helen Nissenbaum’s work on contextual integrity for insight into a beneficial privacy framework). I don’t want my online poker group’s standing to be shown to the PTA, with which I discuss school programs. Teachers who have OnlyFans accounts have been fired when the accounts are discovered. Journalists reporting on cartel activities have been killed. Twitter personalities that use their real names can be doxed by someone who links their Twitter profile to a street address and mobile phone number. This can have horrible consequences.

In the real world, we have many of these pseudo-identities and pseudo-anonymous identities. We even have an expectation of anonymity in groups like Alcoholics Anonymous and private clubs. If we look to Second Life, some people would adopt core pseudo-identities and others pseudo-anonymous identities.

In the online world and, eventually, the metaverse, we will have more control over the use of our identities and pseudo-identities, but possibly less ability to understand how these identities are being handled by each system we are part of. Our identities can already collide in personal devices (for example, my mobile phone) and communal devices (for example, the voice assistant in my kitchen around my family).

How do you recognize someone in the metaverse?

In the real world we recognize people by their face, and identify them by a name in our heads (if you are good at that sort of thing). We may remember the faces of some people we pass on the street, but in a city, we don’t really know most of the people who we are around.

A few of the author’s identities online and in the metaverse.

The person you’re communicating with may show up with a real name, a nickname, or even a pseudo-anonymous name. Their picture might be a professional photo, a candid picture, or an anime avatar, or some immersive presentation. All of these identifiers are protected by login, multi-factor authentication, or other mechanisms–yet people are hacked all the time. A site like Facebook tries to give you assurances that you are interacting with the person you think you’re interacting with; this is one justification for their real-name policy. Still, there is a difference between the logical “this is this person because Facebook says so” and the emotional “this feels like the person because my senses say so.” With improvements in immersion and building “social presence” (a theory of “sense of being with another”), we may be tricked more easily into providing better engagement metrics for a social media site. I may even feel that AI-generated faces based on people I know are more trustworthy than actual images of the people themselves.

What if you could give your online avatar your voice, and even make it use idioms you use? This type of personal spoofing may not always be nefarious. You might just want a bot that could handle low value conversations, say with a telemarketer or bill collector.

We can do better than “who can see this post”

To help people grapple with the increased complexity of identity in the metaverse, we need to rethink the way we create, manage, and eventually retire our identities. It goes way beyond just choosing what clothing to wear on a virtual body.

When you start to add technologies that tie everything you do to a public, immutable record, you may find that something you wish could be forgotten is remembered. What should be “on the chain” and how should you decide? Codifying aspects of our reputation is a dream of web3. The creation of digitally legible reputation can cause ephemeral and unsaid aspects of our identities to be stored forever. And an immutable public record of reputation data will no doubt conflict with legislation such as GDPR or CCPA.

The solutions to these problems are neither simple nor available today. To move in the right direction we should consider the following key principles when reconsidering how identities work in the metaverse so that we don’t end up with a dystopia:

  1. I want to control the flow of information rather than simply mark it as public or private: Contextual Integrity argues that the difference between “public” and “private” information hides the real issue, which is how information flows and where it is used.
  2. I want to take time to make sure my profile is right: Many development teams worry about adding friction to the signup process; they want to get new users hooked as soon as possible. But it’s also important to make sure that new users get their profile right. It’s not an inherently bad idea to slow down the creation and curation of a profile, especially if it is one the user will be associated with for a long time. Teams that worry about friction have never seen someone spend an hour tweaking their character’s appearance in a video game.
  3. I want to experiment with new identities rather than commit up front: When someone starts out with a new service, they don’t know how they want to represent themselves. They might want to start with a blank avatar. On the other hand, the metaverse is so visually immersive that people who have been there for a while will have impressive avatars, and new people will stick out.
  4. I’m in control of the way my profiles interact: When I don’t want profiles to overlap, there is usually a good reason. Services that assume we want everything to go through the same identity are making a mistake. We should trust that the user is making a good choice.
  5. I can use language I understand to control my identities: Creating names is creating meanings. If I want to use something simple like “my school friends,” rather than a specific school name, I should be able to do so. That freedom of choice allows the user to supply the name’s meaning, rather than having it imposed from the outside.
  6. I don’t want shadow profiles created about me: A service violates my expectations of privacy when it links together various identities. Advertising platforms are already doing this through browser fingerprinting. It gets even worse when you start to use biometric and behavioral data, as Kent Bye from the Voices of VR podcast has warned. Unfortunately, users may never have control over these linkages; it may require regulation to correct.
  7. I should be warned when there are effects I might not understand due to multiple layers interacting: I should get real examples from my context to help me understand these interactions. It is the service developer’s job to help users avoid mistakes.
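The first and seventh principles can be illustrated with a small sketch that models information flows between contexts instead of a single public/private switch. The context names and the `ALLOWED_FLOWS` set are hypothetical examples; how a real service would let users define and visualize these flows is exactly the open design question discussed here.

```python
# Hypothetical contexts (pseudo-identities) and the flows their owner permits.
# A flow (source, destination) means information shared in `source`
# may also appear in `destination`.
ALLOWED_FLOWS = {
    ("family", "pta"),
    ("pta", "family"),
}


def flow_allowed(source: str, destination: str) -> bool:
    """Information may always stay in its own context; any other flow
    must be explicitly allowed by the user."""
    return source == destination or (source, destination) in ALLOWED_FLOWS


def explain(source: str, destination: str) -> str:
    """Produce the kind of plain-language warning principle 7 asks for."""
    if flow_allowed(source, destination):
        return f"OK: '{source}' information may be shown in '{destination}'."
    return f"Blocked: '{source}' information would leak into '{destination}'."


print(explain("family", "pta"))
print(explain("poker", "pta"))
```

The default here is that nothing crosses contexts unless the user says so, which is the opposite of a service that assumes everything flows through one identity.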

Social media sites like Facebook have tried to address some of these principles. For example, Facebook’s access controls for posts allow for “public,” “friends,” “friends except…,” “specific friends,” “only me,” and “custom.” These settings are further modified by the Facebook profile privacy control settings. It often (perhaps usually) isn’t clear what is actually happening and why, nor is it clear who will or won’t be able to see a post. This confusion is a recipe for violating social norms and privacy expectations.

Next, how do we allow for interaction? This isn’t as simple as creating circles of friends (an approach that Google+ tried). How do we visualize the various identities we currently have? More user research needs to go into how people would understand these constructions of identity on a web or virtual experience. My hunch is that they need to align some identities together (like family and PTA), and to separate out others (like gamertags). I don’t think requiring users to maintain a large set of access control lists (ACLs) is the right way to control interaction between identities.

The life of my identity

Finally, identities have life cycles. Some exist for a long time once established, like my family, but others may be short lived. I might try out participation in a community, and then find it isn’t for me. There are five key steps in the lifecycle of an identity:

  1. Create a new identity – this happens when I log into a new service or world. The new identity will need to be aligned with or separated from other identities.
  2. Share some piece of information with an identity – every meaningful identity is attached to data: common profile photos, purchased clothing, facial characteristics, voices, etc.
  3. Recover after being compromised – “oops I was hacked” will happen. What do people need to do to clean this up?
  4. Lose and regain access – if I lose the key to access this identity, is there a way I can get it back?
  5. Delete or close an identity, for now – people walk away from groups all the time. Usually they will just drift off or ghost; there should be a better way.

All services that plan on operating in the metaverse will need to consider these different stages. If you don’t, you will create systems that fail in ways that expose people to harm.

Allow for the multiplicity of a person in the metaverse

If you don’t think about the requirements of people, their identities, and the lifecycle of new identities, you will build services that don’t match your users’ expectations, in particular, their expectations of privacy.

Identity in the metaverse is more than a costume that you put on. It will consist of all the identities, pseudo-identities, and pseudo-anonymous identities we take on today, but displayed in a way that can fool us. We can’t forget that we are humans experiencing a reality that speaks to the many facets we have inside ourselves.

If all of us don’t take action, a real dystopia will be created that keeps people from being who they really are. As you grow and change, you will be weighed down by who you might have been at one point or who some corporation assumed you were. You can do better by building metaverse systems that embrace the multiple identities people have in real life.

If you lose your identity in your metaverse, you lose yourself for real.

Categories: Technology

Recommendations for all of us

O'Reilly Radar - Thu, 2022/03/10 - 07:07

If you live in a household with a communal device like an Amazon Echo or Google Home Hub, you probably use it to play music. If you live with other people, you may find that over time, the Spotify or Pandora algorithm seems not to know you as well. You’ll find songs creeping into your playlists that you would never have chosen for yourself. The cause is often obvious: I’ll see a whole playlist devoted to Disney musicals or Minecraft fan songs. I don’t listen to this music, but my children do, using the shared device in the kitchen. And that shared device only knows about a single user, and that user happens to be me.

More recently, many people, myself included, found that the end-of-year wrap-up playlists Spotify created for them didn’t quite fit.

This kind of a mismatch and narrowing to one person is an identity issue that I’ve identified in previous articles about communal computing.  Most home computing devices don’t understand all of the identities (and pseudo-identities) of the people who are using the devices. The services then extend the behavior collected through these shared experiences to recommend music for personal use. In short, these devices are communal devices: they’re designed to be used by groups of people, and aren’t dedicated to an individual. But they are still based on a single-user model, in which the device is associated with (and collects data about) a single identity.

These services should be able to do a better job of recommending content for groups of people. Platforms like Netflix and Spotify have tried to deal with this problem, but it is difficult. I’d like to take you through some of the basics for group recommendation services, what is being tried today, and where we should go in the future.

Common group recommendation methods

After seeing these problems with communal identities, I became curious about how other people have solved group recommendation services so far. Recommendation services for individuals succeed if they lead to further engagement. Engagement may take different forms, based on the service type:

  • Video recommendations – watching an entire show or movie, subscribing to the channel, watching the next episode
  • Commerce recommendations – buying the item, rating it
  • Music recommendations – listening to a song fully, adding to a playlist, liking

Collaborative filtering (deep dive in Programming Collective Intelligence) is the most common approach for doing individual recommendations. It looks at who I overlap with in taste and then recommends items that I might not have tried from other people’s lists. This won’t work for group recommendations because in a group, you can’t tell which behavior (e.g., listening or liking a song) should be attributed to which person. Collaborative filtering only works when the behaviors can all be attributed to a single person.
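To make the idea of "who I overlap with in taste" concrete, here is a minimal sketch of user-based collaborative filtering. The users, songs, and ratings are invented for illustration, and cosine similarity over co-rated items is just one common choice of overlap measure; real services use far richer signals.

```python
from math import sqrt

# Hypothetical listening data: user -> {song: rating from 1 to 5}.
ratings = {
    "me":    {"song_a": 5, "song_b": 4, "song_c": 1},
    "alex":  {"song_a": 5, "song_b": 5, "song_d": 4},
    "blake": {"song_a": 1, "song_c": 5, "song_e": 5},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the songs both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[s] * b[s] for s in shared)
    norm_a = sqrt(sum(a[s] ** 2 for s in shared))
    norm_b = sqrt(sum(b[s] ** 2 for s in shared))
    return dot / (norm_a * norm_b)

def recommend(user, ratings):
    """Rank songs the user hasn't rated by a similarity-weighted sum of ratings."""
    scores = {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        for song, rating in other_ratings.items():
            if song not in ratings[user]:
                scores[song] = scores.get(song, 0.0) + sim * rating
    return sorted(scores, key=scores.get, reverse=True)

recommendations = recommend("me", ratings)
print(recommendations)
```

Every rating here is keyed by a single user. On a shared device, the "me" row would silently absorb everyone's listening, which is exactly where the attribution assumption breaks down.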

Group recommendation services build on top of these individualized concepts. The most common approach is to look at each individual’s preferences and combine them in some way for the group. Two key papers discussing how to combine individual preferences describe PolyLens, a movie recommendation service for groups, and CATS, an approach to collaborative filtering for group recommendations. A paper on ResearchGate summarized research on group recommendations back in 2007.

According to the PolyLens paper, group recommendation services should “create a ‘pseudo-user’ that represents the group’s tastes, and to produce recommendations for the pseudo-user.” There could be issues about imbalances of data if some members of the group provide more behavior or preference information than others. You don’t want the group’s preferences to be dominated by a very active minority.

An alternative to this, again from the PolyLens paper, is to “generate recommendation lists for each group member and merge the lists.” It’s easier for these services to explain why any item is on the list, because it’s possible to show how many members of the group liked a particular item that was recommended. Creating a single pseudo-user for the group might obscure the preferences of individual members.
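The two strategies quoted from the PolyLens paper can be sketched side by side. This is an illustration under simplifying assumptions, not the PolyLens implementation: the member names, movies, and predicted ratings are hypothetical, the pseudo-user is a plain per-item average, and the merge simply records which members had an item in their top-N list.

```python
from collections import defaultdict

# Hypothetical predicted ratings (1-5) per group member for candidate movies.
predicted = {
    "parent_1": {"heist": 5, "cartoon": 2, "nature_doc": 4},
    "parent_2": {"heist": 4, "cartoon": 3, "nature_doc": 4},
    "kid":      {"heist": 1, "cartoon": 5, "nature_doc": 4},
}

def pseudo_user(predicted):
    """Strategy 1: collapse the group into one profile by averaging per item.

    Each member contributes one vote per item, so a very active member
    cannot dominate simply by having more history.
    """
    sums, counts = defaultdict(float), defaultdict(int)
    for member_scores in predicted.values():
        for item, score in member_scores.items():
            sums[item] += score
            counts[item] += 1
    return {item: sums[item] / counts[item] for item in sums}

def merge_lists(predicted, top_n=2):
    """Strategy 2: build each member's top-N list, then merge the lists.

    Returns item -> members whose list contained it, which doubles as an
    explanation of why the item was recommended.
    """
    supporters = defaultdict(list)
    for member, scores in predicted.items():
        top = sorted(scores, key=scores.get, reverse=True)[:top_n]
        for item in top:
            supporters[item].append(member)
    return dict(supporters)

group_profile = pseudo_user(predicted)
merged = merge_lists(predicted)
best_for_pseudo_user = max(group_profile, key=group_profile.get)
best_for_merged = max(merged, key=lambda item: len(merged[item]))
print(best_for_pseudo_user, best_for_merged, merged)
```

The merged result keeps the list of supporting members for each item, which is the explainability advantage described above. The averaged profile only exposes a single number per item.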

The criteria for the success of a group recommendation service are similar to the criteria for the success of individual recommendation services: are songs and movies played in their entirety? Are they added to playlists? However, group recommendations must also take into account group dynamics. Is the algorithm fair to all members of the group, or do a few members dominate its recommendations? Do its recommendations cause “misery” to some group members (i.e., are there some recommendations that most members always listen to and like, but that some always skip and strongly dislike)?

There are some important questions left for implementers:

  1. How do people join a group?
  2. Should each individual’s history be private?
  3. How do issues like privacy impact explainability?
  4. Is the current use to discover something new or to revisit something that people have liked previously (e.g. find out about a new movie that no one has watched or rewatch a movie the whole family has seen together since it is easy)?

So far, there is a lot left to understand about group recommendation services. Let’s talk about a few key cases for Netflix, Spotify, and Amazon first.

Netflix avoiding the issue with profiles, or is it?

Back when Netflix was primarily a DVD service (2004), they launched profiles to allow different people in the same household to have different queues of DVDs in the same account. Netflix eventually extended this practice to online streaming. In 2014, they launched profiles on their streaming service, which asked the question “who’s watching?” on the launch screen. While multiple queues for DVDs and streaming profiles try to address similar problems, they don’t end up solving group recommendations. In particular, streaming profiles per person lead to two key problems:

  • When a group wants to watch a movie together, one of the group’s profiles needs to be selected. If there are children present, a kids’ profile will probably be selected.  However, that profile doesn’t take into account the preferences of adults who are present.
  • When someone is visiting the house, say a guest or a babysitter, they will most likely end up choosing a random profile. This means that the visitor’s behavioral data will be added to some household member’s profile, which could skew their recommendations.

How could Netflix provide better selection and recommendation streams when there are multiple people watching together? Netflix talked about this question in a blog post from 2012, but it isn’t clear to customers what they are doing:

That is why when you see your Top10, you are likely to discover items for dad, mom, the kids, or the whole family. Even for a single person household we want to appeal to your range of interests and moods. To achieve this, in many parts of our system we are not only optimizing for accuracy, but also for diversity.

Netflix was early to consider the various people using their services in a household, but they have to go further before meeting the requirements of communal use. If diversity is rewarded, how do they know it is working for everyone “in the room” even though they don’t collect that data? As you expand who might be watching, how would they know when a show or movie is inappropriate for the audience?

Amazon merges everyone into the main account

When people live together in a household, it is common for one person to arrange most of the repairs or purchases. When using Amazon, that person will effectively get recommendations for the entire household. Amazon focuses on increasing the number of purchases made by that person, without understanding anything about the larger group. They will offer subscriptions to items that might be consumed by a whole household, but mistake those for the purchases of an individual.

The result is that the person who wanted the item will never see additional recommendations they may have liked if they aren’t the main account holder–and the main account holder might ignore those recommendations because they don’t care. I wonder if Amazon changes recommendations to individual accounts that are part of the same Prime membership; this might address some of this mismatch.

The way that Amazon ties these accounts together is still subject to key questions that will help create the right recommendations for a household. How might Amazon understand that purchases such as food and other perishables are for the household, rather than an individual? What about purchases that are gifts for others in the household?

Spotify is leading the charge with group playlists

Spotify has created group subscription packages called Duo (for couples) and Premium Family (for more than two people). These packages not only simplify the billing relationship with Spotify; they also provide playlists that consider everyone in the subscription.

The shared playlist is the union of the accounts on the same subscription. This creates a playlist of up to 50 songs that all accounts can see and play. There are some controls that allow account owners to flag songs that might not be appropriate for everyone on the subscription. Spotify provides a lot of information about how they construct the Blend playlist in a recent blog post. In particular, they weighed whether they should try to reduce misery or maximize joy:

“Minimize the misery” is valuing democratic and coherent attributes over relevance. “Maximize the joy” values relevance over democratic and coherent attributes. Our solution is more about maximizing the joy, where we try to select the songs that are most personally relevant to a user. This decision was made based on feedback from employees and our data curation team.

Reducing misery would most likely provide better background music (music that is not unpleasant to everyone in the group), but is less likely to help people discover new music from each other.

Spotify was also concerned about explainability: they thought people would want to know why a song was included in a blended playlist. They solved this problem, at least partly, by showing the picture of the person from whose playlists the song came.

These multi-person subscriptions and group playlists solve some problems, but they still struggle to answer certain questions we should ask about group recommendation services. What happens if two people have very little overlapping interest? How do we detect when someone hates certain music but is just OK with others? How do they discover new music together?

Reconsidering the communal experience based on norms

Most of the research into group recommendation services has been tweaking how people implicitly and explicitly rate items to be combined into a shared feed. These methods haven’t considered how people might self-select into a household or join a community that wants to have group recommendations.

For example, deciding what to watch on a TV may take a few steps:

  1. Who is in the room? Only adults or kids too? If there are kids present, there should be restrictions based on age.
  2. What time of day is it? Are we taking a midday break or relaxing after a hard day? We may opt for educational shows for kids during the day and comedy for adults at night.
  3. Did we just watch something from which an algorithm can infer what we want to watch next? This will lead to the next episode in a series.
  4. Who hasn’t gotten a turn to watch something yet? Is there anyone in the household whose highest-rated songs haven’t been played? This will lead to turn taking.
  5. And more…

As you can see, contexts, norms, and history are all tied up in the way people decide what to watch next as a group. PolyLens discussed this in their paper, but didn’t act on it:

The social value functions for group recommendations can vary substantially. Group happiness may be the average happiness of the members, the happiness of the most happy member, or the happiness of the least happy member (i.e., we’re all miserable if one of us is unhappy). Other factors can be included. A social value function could weigh the opinion of expert members more highly, or could strive for long-term fairness by giving greater weight to people who “lost out” in previous recommendations.
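The social value functions named in that passage are easy to compare in code. The sketch below uses hypothetical people, shows, and happiness scores, and the fairness weights are an assumption standing in for "people who lost out in previous recommendations." It is meant only to show that the choice of function changes the outcome.

```python
from statistics import mean

# Hypothetical predicted happiness (1-5) of each person for each show.
happiness = {
    "thriller": {"ana": 5, "ben": 5, "cy": 1},
    "sitcom":   {"ana": 3, "ben": 3, "cy": 3},
    "musical":  {"ana": 2, "ben": 4, "cy": 4},
}

def average_happiness(scores):
    """Group happiness is the average happiness of the members."""
    return mean(scores.values())

def most_happy(scores):
    """Group happiness is the happiness of the most happy member."""
    return max(scores.values())

def least_happy(scores):
    """Group happiness is the happiness of the least happy member."""
    return min(scores.values())

def fairness_weighted(weights):
    """Weighted average that favors members who lost out previously."""
    def value(scores):
        total = sum(weights[person] for person in scores)
        return sum(weights[p] * s for p, s in scores.items()) / total
    return value

def pick(happiness, value_fn):
    """Choose the show that maximizes the given social value function."""
    return max(happiness, key=lambda show: value_fn(happiness[show]))

# Assumed weights: cy was overruled in earlier sessions, so cy counts triple.
catch_up = fairness_weighted({"ana": 1, "ben": 1, "cy": 3})

choices = {
    "average": pick(happiness, average_happiness),
    "most_happy": pick(happiness, most_happy),
    "least_happy": pick(happiness, least_happy),
    "fairness": pick(happiness, catch_up),
}
print(choices)
```

With these made-up numbers, the average and most-happy functions agree with each other, while the least-happy and fairness-weighted functions each select a different show. That is why the choice of function is a question about household norms and not only about algorithms.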

Getting this highly contextual information is very hard. It may not be possible to collect much more than “who is watching,” as Netflix does today. If that is the case, we may want to reduce all of the context to the location and time. The TV room at night will have a different behavioral history than the kitchen on a Sunday morning.
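One way to picture falling back on location and time is to key behavioral history by a coarse context instead of by account. The sketch below is an assumption-laden illustration: the room names, the weekday/weekend split, the daypart boundaries, and the genres are all invented, and a real service would need to decide these with user research.

```python
from collections import Counter, defaultdict
from datetime import datetime

def context_key(room, when):
    """Reduce a play event to a coarse (room, day type, daypart) key."""
    day_type = "weekend" if when.weekday() >= 5 else "weekday"
    if 5 <= when.hour < 12:
        daypart = "morning"
    elif 12 <= when.hour < 18:
        daypart = "afternoon"
    else:
        daypart = "night"
    return (room, day_type, daypart)

# History is stored per context, not per account.
history = defaultdict(Counter)

def record_play(room, when, genre):
    """Attribute a play to the context it happened in."""
    history[context_key(room, when)][genre] += 1

def top_genre(room, when):
    """Return the most played genre for this context, or None if unseen."""
    plays = history.get(context_key(room, when))
    return plays.most_common(1)[0][0] if plays else None

record_play("tv_room", datetime(2022, 3, 4, 21, 0), "comedy")      # Friday night
record_play("tv_room", datetime(2022, 3, 7, 20, 30), "comedy")     # Monday night
record_play("kitchen", datetime(2022, 3, 6, 9, 0), "kids_music")   # Sunday morning

tv_room_pick = top_genre("tv_room", datetime(2022, 3, 8, 22, 0))   # Tuesday night
kitchen_pick = top_genre("kitchen", datetime(2022, 3, 13, 10, 0))  # Sunday morning
unseen_pick = top_genre("kitchen", datetime(2022, 3, 8, 22, 0))    # no history yet
print(tv_room_pick, kitchen_pick, unseen_pick)
```

Because nothing in this structure is tied to a person, plays in the kitchen never leak into anyone's personal profile.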

One way to consider the success of a group recommendation service is to ask how much browsing is required before a decision is made. If we can get someone watching or listening to something with less negotiation, that could mean the group recommendation service is doing its job.

With the proliferation of personal devices, people can be present to “watch” with everyone else but not be actively viewing. They could be playing a game, messaging with someone else, or simply watching something else on their device. This flexibility raises the question of what “watching together” means, but also lowers the concern that we need to get group recommendations right all the time.  It’s easy enough for someone to do something else. However, the reverse isn’t true.  The biggest mistake we can make is to take highly contextual behavior gathered from a shared environment and apply it to my personal recommendations.

Contextual integrity and privacy of my behavior

When we start mixing information from multiple people in a group, it’s possible that some will feel that their privacy has been violated. Using some of the framework of Contextual Integrity, we need to look at the norms that people expect. Some people might be embarrassed if the music they enjoy privately was suddenly shown to everyone in a group or household. Is it OK to share explicit music with the household even if everyone is OK with explicit music in general?

People already build very complex mental models about how services like Spotify work and sometimes personify them as “folk theories.” The expectations will most likely change if group recommendation services are brought front and center. Services like Spotify will appear to be more like a social network if they don’t bury who is currently logged in behind a small profile picture in the corner; they should show everyone who is being considered for the group recommendations at that moment.

Privacy laws and regulations are becoming more patchwork not only worldwide (China has recently created regulation of content recommendation services) but even within states of the US. Collecting any data without appropriate disclosure and permission may be problematic. The fuel of recommendation services, including group recommendation services, is behavioral data about people that will fall under these laws and regulations. You should be considering what is best for the household over what is best for your organization.

The dream of the whole family

Today there are various efforts for improving recommendations to people living in households.  These efforts miss the mark by not considering all of the people who could be watching, listening, or consuming the goods. This means that people do not get what they really want, and that companies get less engagement or sales than they would like.

The key to fixing these issues is to do a better job of understanding who is in the room, rather than making assumptions that reduce all the group members down to a single account. To do so will require user experience changes that bring the household community front and center.

If you are considering how you build these services, start with the expectations of the people in the environment, rather than forcing the single user model on people. When you do, you will provide something great for everyone who is in the room: a way to enjoy something together.

Categories: Technology

Epstein Barr and the Cause of Cause

O'Reilly Radar - Tue, 2022/03/08 - 05:17

One of the most intriguing news stories of the new year claimed that the Epstein-Barr virus (EBV) is the “cause” of Multiple Sclerosis (MS), and suggested that antiviral medications or vaccinations for Epstein-Barr could eliminate MS.

I am not an MD or an epidemiologist. But I do think this article forces us to think about the meaning of “cause.” Although Epstein-Barr isn’t a familiar name, it’s extremely common; a good estimate is that 95% of the population is infected with it. It’s a variant of Herpes; if you’ve ever had mononucleosis, you’ve had it; most infections are asymptomatic. We hear much more about MS; I’ve had friends who have died from it. But MS is much less common: about 0.036% of the population has it (35.9 per 100,000).

We know that causation isn’t a one-size-fits-all thing: if X happens, then Y always happens. Lots of people smoke; we know that smoking causes lung cancer; but many people who smoke don’t get lung cancer. We’re fine with that; the causal connection has been painstakingly documented in great detail, in part because the tobacco industry went to such great lengths to spread misinformation.

But what does it mean to say that a virus that infects almost everyone causes a disease that affects very few people? The researchers appear to have done their job well. They studied 10 million people in the US military. Five percent of those were negative for Epstein-Barr at the start of their service. Of that group, 955 were eventually diagnosed with MS, and had been infected with EBV prior to their MS diagnosis, indicating a risk factor 32 times higher than for those without EBV.

It is certainly fair to say that Epstein-Barr is implicated in MS, or that it contributes to MS, or some other phrase (that could not unreasonably be called “weasel words”). Is there another trigger that only has an effect when EBV is already present? Or is EBV the sole cause of MS, a cause that just doesn’t take effect in the vast majority of people?

This is where we have to think very carefully about causality, because as important as this research is, it seems like something is missing. An omitted variable, perhaps a genetic predisposition? Some other triggering condition, perhaps environmental? Cigarettes were clearly a “smoking gun”:  10 to 20 percent of smokers develop lung cancer (to say nothing of other diseases). EBV may also be a smoking gun, but one that only goes off rarely.

If there are no other factors, we’re justified in using the word “causes.” But it’s hardly satisfying—and that’s where the more precise language of causal inference runs afoul of human language. Mathematical language is more useful: Perhaps EBV is “necessary” for MS (i.e., EBV is required; you can’t get MS without it), but clearly not “sufficient” (EBV doesn’t necessarily lead to MS). Although once again, the precision of mathematics may be too much.
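The gap between "necessary" and "sufficient" can be put in rough numbers using the figures quoted earlier (95% EBV prevalence, 35.9 MS cases per 100,000 people). The sketch below assumes, purely for the sake of the arithmetic, that EBV is strictly necessary, so that every MS case occurs in someone who has had EBV. It also treats the two prevalence figures as if they described the same population, which is a simplification.

```python
# Figures quoted in the text.
ebv_prevalence = 0.95            # share of the population infected with EBV
ms_prevalence = 35.9 / 100_000   # share of the population with MS

# Assumption: EBV is necessary for MS, so P(MS and EBV) equals P(MS).
# Then P(MS | EBV) = P(MS) / P(EBV).
p_ms_given_ebv = ms_prevalence / ebv_prevalence
p_no_ms_given_ebv = 1 - p_ms_given_ebv

# Compare with the low end of the lung cancer figure for smokers (10%).
smoker_risk_low = 0.10
ratio = smoker_risk_low / p_ms_given_ebv

print(f"P(MS | EBV)     = {p_ms_given_ebv:.4%}")
print(f"P(no MS | EBV)  = {p_no_ms_given_ebv:.4%}")
print(f"Smoking-to-cancer risk is about {ratio:.0f} times larger")
```

Under these assumptions, fewer than 4 in 10,000 people with EBV would have MS, compared with at least 1 in 10 smokers who develop lung cancer.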

Biological systems aren’t necessarily mathematical, and it is possible that there is no “sufficient” condition; EBV just leads to MS in an extraordinarily small number of instances. In turn, we have to take this into account in decision-making. Does it make sense to develop a vaccine against a rare (albeit tragic, disabling, and inevitably fatal) disease? If EBV is implicated in other diseases, possibly. However, vaccines aren’t without risk (or expense), and even though the risk is very small (as it is for all the vaccines we use today), it’s not clear that it makes sense to take that risk for a disease that very few people get. How do you trade off a small risk against a very small reward? Given the anti-vax hysteria around COVID, requiring children to be vaccinated for a rare disease might not be poor public health policy; it might be the end of public health policy.

More generally: how do you build software systems that predict rare events? This is another version of the same problem—and unfortunately, the policy decision we are least likely to make is not to create such software. The abuse of such systems is a clear and present danger: for example, AI systems that pretend to predict “criminal behavior” on the basis of everything from crime data to facial images, are already being developed. Many are already in use, and in high demand from law enforcement agencies. They will certainly generate far more false positives than true positives, stigmatizing thousands (if not millions) of people in the process. Even with carefully collected, unbiased data (which doesn’t exist), and assuming some kind of causal connection between past history, physical appearance, and future criminal behavior (as in the discredited 19th century pseudoscience of physiognomy), it is very difficult, if not impossible, to reason from a relatively common cause to a very rare effect. Most people don’t become criminals, regardless of their physical appearance. Deciding a priori who will can only become an exercise in applied racism and bias.
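The claim about false positives follows from base rates, and a short calculation shows why. All of the numbers below are hypothetical and chosen to be generous to the predictor; they do not describe any real system.

```python
# Hypothetical scenario, generous to the predictor.
population = 1_000_000
base_rate = 0.001            # 0.1% of people will actually do the rare thing
sensitivity = 0.99           # the system flags 99% of true cases
false_positive_rate = 0.01   # it wrongly flags only 1% of everyone else

actual_positives = population * base_rate
actual_negatives = population - actual_positives

true_positives = actual_positives * sensitivity
false_positives = actual_negatives * false_positive_rate

# Precision: of everyone flagged, what share was flagged correctly?
precision = true_positives / (true_positives + false_positives)

print(f"Flagged correctly: {true_positives:,.0f}")
print(f"Flagged wrongly:   {false_positives:,.0f}")
print(f"Precision:         {precision:.1%}")
```

Even with a predictor this accurate, wrongly flagged people outnumber correctly flagged people roughly ten to one, because the people who will never do the rare thing vastly outnumber those who will.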

Virology aside, the Epstein-Barr virus has one thing to teach us. How do we think about a cause that rarely causes anything? That is a question we need to answer.

Categories: Technology