Feed aggregator

Topic for meeting on 9/9

PLUG - Wed, 2021/09/08 - 11:01

This month we'll have "Thank You" from der.hans.
Attend the meeting by visiting: https://lufthans.bigbluemeeting.com/b/plu-yuk-7xx on the 9th of September at 7pm MST

Description
Thank you to all the developers, translators, documenters, artists and maintainers who make Free Software.
Thank you to all the project managers, mailing list admins, SREs and toolsmiths who enable the making.
Thanks to the people who use Free Software and do amazing things with it.
This talk will use stories and anecdotes to explore the thanks we can have for the Free Software community.

About der.hans:
der.hans works remotely in the US.
In the FLOSS community he is active with conferences and local groups.
He's chairman of PLUG, chair for SeaGL Finance committee, founder of SeaGL Career Expo, RaiseMe career counselor, BoF organizer for the Southern California Linux Expo (SCaLE) and founder of the Free Software Stammtisch.
He speaks regularly at community-led FLOSS conferences such as SeaGL, Penguicon, SCaLE, LCA, FOSSASIA, Tübix, CLT, Kieler Linux Tage, LFNW, OLF, SELF and GeekBeacon Fest.
Hans is available to speak remotely for local groups.
Currently a Customer Data Engineer at Object Rocket. Public statements are not representative of $dayjob.
Fediverse/Mastodon - https://floss.social/@FLOX_advocate

A Way Forward with Communal Computing

O'Reilly Radar - Tue, 2021/08/17 - 05:45

Communal devices in our homes and offices aren’t quite right. In previous articles, we discussed the history of communal computing and the origin of the single user model. Then we reviewed the problems that arise due to identity, privacy, security, experience, and ownership issues. They aren’t solvable by just making a quick fix. They require a huge reorientation in how these devices are framed and designed.

This article focuses on modeling the communal device you want to build and understanding how it fits into the larger context. This includes how it interoperates with connected services, and how it communicates across boundaries with other devices in people's homes. Ignore these warnings at your own peril: the people who live with the device can always unplug it and recycle it.

Let’s first talk about how we gain an understanding of the environment inside homes and offices.

Mapping the communal space

We have seen a long list of problems that keep communal computing from aligning with people’s needs. This misalignment arises from the assumption that there is a single relationship between a person and a device, rather than between all the people involved and their devices.

Dr. S.A. Applin has referred to this assumption as “design individualism”; it is a common misframing used by technology organizations. She uses this term most recently in the paper “Facebook’s Project Aria indicates problems for responsible innovation when broadly deploying AR and other pervasive technology in the Commons:”

“Unfortunately, this is not an uncommon assumption in technology companies, but is a flaw in conceptual modelling that can cause great problems when products based on this ‘design individualism’ are deployed into the Commons (Applin, 2016b). In short, Facebook acknowledges the plural of ‘people’, but sees them as individuals collectively, not as a collective that is enmeshed, intertwined and exists based on multiple, multiplex, social, technological, and socio-technological relationships as described through [PolySocial Reality].”

PolySocial Reality (PoSR) is a theory described in a series of papers by Applin and Fisher (2010-ongoing) on the following:

“[PoSR] models the outcomes when all entities in networks send both synchronous and asynchronous messages to maintain social relationships. These messages can be human-to-human, human-to-machine, and machine-to-machine. PoSR contains the entirety of all messages at all times between all entities, and we can use this idea to understand how various factors in the outcomes from the way that messages are sent and received, can impact our ability to communicate, collaborate, and most importantly, cooperate with each other.”

In the case of PoSR, we need to consider how agents make decisions about the messages between entities. The designers of these non-human entities will make decisions that impact all entities in a system.

The reality is that the “self” only exists as part of a larger network. It is the connections between us and the rest of the network that is meaningful. We pull all of the pseudo identities for those various connections together to create our “one” self.

The model that I’ve found most helpful to address this problem attempts to describe the complete environment of the communal space. It culminates in a map of the connections between nodes, or relationships between entities. This web of interactions includes all the individuals, the devices they use, and the services that intermediate them. The key is to understand how non-human entities intermediate the humans, and how those messages eventually make it to human actors.



The home is a network, like an ecosystem, of people, devices, and services all interacting to create an experience. It is also connected with services, people, and devices outside the home, such as my mom, my mom's picture frame, and the Google services that enable it.

To see why this map is helpful, consider an ecosystem (or food web). When we only consider interactions between individual animals, like a wolf eating a sheep, we ignore how the changes in population of each animal impacts other actors in the web: too many wolves mean the sheep population dies off. In turn, this change has an impact on other elements of the ecosystem like how much the grass grows. Likewise, when we only consider a single person interacting with one device, we find that most interactions are simple: some input from the user is followed by a response from the device. We often don’t consider other people interacting with the device, nor do we consider how other personal devices exist within that space. We start to see these interactions when we consider other people in the communal space, the new communal device, and all other personal devices. In a communal map, these all interact.

These ecosystems already exist within a home or office. They are made up of items ranging from refrigerator magnets for displaying physical pictures to a connected TV, and they include personal smartphones. The ecosystem extends to the services that the devices connect to outside the home, and to the other people whom they intermediate. We get an incomplete picture if we don’t consider the entire graph. Adding a new device isn’t about filling a specific gap in the ecosystem. The ecosystem may have many problems or challenges, but the ecosystem isn’t actively seeking to solve them. The new device needs to adapt and find its own niche. This includes making the ecosystem more beneficial to the device, something that evolutionary biologists call ‘niche expansion’. Technologists would think about this as building a need for their services.

Thinking about how a device creates a space within an already complex ecosystem is key to understanding what kinds of experiences the team building the device should create. It will help us do things like building for everyone and evolving with the space. It will also help us to avoid the things we should not do, like assuming that every device has to do everything.

Do’s and don’ts of building communal devices

With so much to consider when building communal devices, where do you start? Here are a few do’s and don’ts:

Do user research in the users’ own environment

Studying and understanding expectations and social norms is the key discovery task for building communal devices. Expectations and norms dictate the rules of the environment into which your device needs to fit, including people’s pseudo-identities, their expectations around privacy, and how willing they are to deal with the friction of added security. Just doing a survey isn’t enough.  Find people who are willing to let you see how they use these devices in their homes, and ask lots of questions about how they feel about the devices.

“If you are going to deal with social, people, communal, community, and general sociability, I would suggest hiring applied anthropologists and/or other social scientists on product teams. These experts will save you time and money, by providing you with more context and understanding of what you are making and its impact on others. This translates into more accurate and useful results.”

– Dr. S.A. Applin

Observing where the devices are placed and how the location’s use changes over time will give you fascinating insights about the context in which the device is used. A living room may be a children’s play area in the morning, a home office in the middle of the day, and a guest bedroom at night. People in these contexts have different sets of norms and privacy expectations.

As part of the user research, you should be building an ecosystem graph of all people present and the devices that they use. What people not present are intermediated by technology? Are there stories where this intermediation went wrong? Are there frictions that are created between people that your device should address? Are there frictions that the device should get out of the way of?
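For illustration only, here is a minimal sketch of what such an ecosystem graph might look like in code; the entities, relationship labels, and helper methods are hypothetical and not drawn from any particular research method:

from dataclasses import dataclass, field

@dataclass(frozen=True)
class Entity:
    name: str
    kind: str  # "person", "device", or "service"

@dataclass
class EcosystemGraph:
    # Each edge records which entity delivers something to which, and how.
    edges: list = field(default_factory=list)

    def connect(self, source: Entity, target: Entity, relation: str):
        self.edges.append((source, target, relation))

    def intermediaries(self, person: Entity):
        """Non-human entities that sit between this person and everyone else."""
        return {source for source, target, _ in self.edges
                if target == person and source.kind != "person"}

# A hypothetical household: a resident, a remote relative, a smart picture
# frame, and the cloud photo service that connects them.
resident = Entity("resident", "person")
relative = Entity("remote relative", "person")
frame = Entity("picture frame", "device")
photos = Entity("photo service", "service")

home = EcosystemGraph()
home.connect(resident, photos, "uploads photos to")
home.connect(photos, frame, "pushes photos to")
home.connect(frame, relative, "shows photos to")

print(home.intermediaries(relative))  # the picture frame intermediates the relative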

Do build for everyone who might have access

Don’t focus on the identity of the person who buys and sets up the device. You need to consider the identity (or lack) of everyone who could have access. Consider whether they feel that information collected about them violates their desire to control the information (as in Contextual Integrity). This could mean you need to put up walls to prevent users from doing something sensitive without authorization. Using the Zero Trust framework’s “trust engine” concept, you should ask for the appropriate level of authentication before proceeding.

Most of today’s user experience design is focused on making frictionless or seamless experiences. This goal doesn’t make sense when considering a risk tradeoff. In some cases, adding friction increases the chance that a user won’t move forward with a risky action, which could be a good thing. If the potential risk of showing a private picture is high, you should make it harder to show that picture.

Realize you may not always understand the right context. Having good and safe default states for those cases is important. It is your job to adjust or simplify the model so that people can understand and interpret why the device does something.

Do consider pseudo-identities for individuals and groups

Avoid singular identities and focus on group pseudo-identities. If users don’t consider these devices their own, why not have the setup experience mirror those expectations? Build device setup, usage, and management around everyone who should have a say in the device’s operation.

Pseudo-identities become very interesting when you start to learn what certain behaviors mean for subgroups. Is this music being played for an individual with particular tastes? Or does the choice reflect a compromise between multiple people in the room? Should it avoid explicit language since there are children present?

Group norms and relationships need to be made more understandable. It will take technology advances to make these norms more visible. These advances include using machine learning to help the device understand what kind of content it is showing, and who (or what) is depicted in that content. Text, image, and video analysis needs to take place to answer the question: what type of content is this and who is currently in that context? It also means using contextual prediction to consider who may be in the room, their relationship to the people in the content, and how they may feel about the content. When in doubt, restrict what you do.

Do evolve with the space

As time goes on, life events will change the environment in which the device operates. Try to detect those changes and adapt accordingly. New pseudo-identities could be present, or the identity representing the group may shift. It is like moving into a new home. You may set things up in one way only to find months later there is a better configuration. Be aware of these changes and adapt.

If behavior that would be considered anomalous becomes the norm, something may have changed about the use of that space. Changes in use are usually led by a change in life–for example, someone moving in or out could trigger a change in how a device is used. Unplugging the device and moving it to a different part of the room or to a different shelf symbolizes a new need for contextual understanding. If you detect a change in the environment but don’t know why the change was made, ask.

Do use behavioral data carefully, or don’t use it at all

All communal devices end up collecting data. For example, Spotify uses what you are listening to when building recommendation systems. When dealing with behavioral information, the group’s identity is important, not the individual’s. If you don’t know who is in front of the device, you should consider whether you can use that behavioral data at all. Rather than using an individual identity, you may want to default to the group pseudo-identity’s recommendations. What does the whole house usually like to listen to?



When the whole family is watching, how do we find common ground based on all of our preferences, rather than the owner's? Spotify has a Premium Family package where each person gets a recommended playlist, called a Family Mix, based on everyone's listening behavior, whereas Netflix requires users to choose between individual profiles.

Spotify has family and couple accounts that allow multiple people to have an account under one bill. Each person gets their own login and recommendations. Spotify gives all sub-accounts on the subscription access to a shared playlist (like a Family Mix) that makes recommendations based on the group’s preferences.

Spotify, and services like it, should go a step further and reduce the weight of a song in their recommendation algorithms when it is played on a shared device in a communal place–a kitchen, for example. It's impossible to know everyone who is in a communal space, and there's a strong chance that a song played in a kitchen isn't preferred by anyone who lives there. Giving that song a lot of weight will start to change recommendations on the group members' personal devices.
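One way to express that down-weighting is sketched below; the weights are made up, and this is not a description of how Spotify actually computes recommendations:

from collections import defaultdict

# Hypothetical device-type weights: plays on a shared kitchen speaker count
# far less toward an individual's taste profile than plays on their phone.
DEVICE_WEIGHTS = {"personal_phone": 1.0, "shared_speaker": 0.2}

def update_taste_profile(profile: defaultdict, song: str, device_type: str):
    profile[song] += DEVICE_WEIGHTS.get(device_type, 0.5)
    return profile

profile = defaultdict(float)
update_taste_profile(profile, "kids_song", "shared_speaker")
update_taste_profile(profile, "jazz_track", "personal_phone")
print(dict(profile))  # {'kids_song': 0.2, 'jazz_track': 1.0}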

If you can’t use behavioral data appropriately, don’t bring it into a user’s profile on your services. You should probably not collect it at all until you can handle the many people who could be using the device. Edge processing can allow a device to build context that respects the many people and their pseudo-identities that are at play in a communal environment. Sometimes it is just safer to not track.

Don’t assume that automation will work in all contexts

Prediction technology helps communal devices by finding behavior patterns. These patterns allow the device to calculate what content should be displayed and the potential trust. If a student always listens to music after school while doing homework, the device can assume that contextual integrity holds if the student is the only person there. These assumptions get problematic when part of the context is no longer understood, like when the student has other classmates over. That’s when violations of norms or of privacy expectations are likely to occur. If other people are around, different content is being requested, or if it is a different time of day, the device may not know enough to predict the correct information to display.

Amazon’s Alexa has started wading into these waters with their Hunches feature. If you say “good night” to Alexa, it can decide to turn off the lights. What happens if someone is quietly reading in the living room when the lights go out?  We’ve all accidentally turned the lights out on a friend or partner, but such mistakes quickly become more serious when they’re made by algorithm.

When the prediction algorithm’s confidence is low, it should disengage and try to learn the new behavior. Worst case, just ask the user what is appropriate and gauge the trust vs risk tradeoff accordingly. The more unexpected the context, the less likely it is that the system should presume anything. It should progressively restrict features until it is at its core: for home assistants, that may just mean displaying the current time.
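A sketch of that disengage-or-restrict logic is shown below; the confidence thresholds and fallback behaviors are illustrative assumptions, not any real home assistant's policy:

def act_on_hunch(action: str, confidence: float,
                 context_is_familiar: bool) -> str:
    """Decide whether to automate, ask, or fall back to a minimal feature set."""
    if not context_is_familiar:
        return "restrict: show core features only (e.g., the clock)"
    if confidence >= 0.9:
        return f"do: {action}"
    if confidence >= 0.6:
        return f"ask: 'Should I {action}?'"
    return "observe: log the situation and learn the new pattern"

print(act_on_hunch("turn off the lights", 0.95, context_is_familiar=True))
print(act_on_hunch("turn off the lights", 0.7, context_is_familiar=True))
print(act_on_hunch("turn off the lights", 0.7, context_is_familiar=False))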

Don’t include all service functionality on the device

All product teams consider what they should add next to make a device “fully functional” and reflect all of the service possibilities. For a communal device, you can’t just think about what you could put there; you also have to consider what you will never put there. An example could be allowing access to Gmail messages from a Google Home Hub. If it doesn’t make sense for most people to have access to some feature, it shouldn’t be there in the first place. It just creates clutter and makes the device harder to use. It is entirely appropriate to allow people to change personal preferences and deal with highly personal information on their own, private devices. There is a time and place for the appropriate content.

Amazon has considered whether Echo users should be allowed to complete a purchase or be limited to just adding items to a shopping list. They have had to add four-digit codes and voice profiles. The resulting interface is complex enough to warrant a top-level help article on why people can't make purchases.

If you have already built too much, think about how to sunset certain features so that the value and differentiator of your device is clearer. Full access to personal data doesn’t work in the communal experience. It is a chance for some unknown privacy violation to occur.

Don’t assume your devices will be the only ones

Never assume that your company’s devices will be the only ones in the space. Even for large companies like Amazon, there is no future in which the refrigerator, oven, and TV will all be Amazon devices (even if they are trying really hard). The communal space is built up over a long time, and devices like refrigerators have lifetimes that can span decades.

Think about how your device might work alongside other devices, including personal devices. To do this, you need to integrate with network services (e.g. Google Calendar) or local device services (e.g. Amazon Ring video feed). This is the case for services within a communal space as well. People have different preferences for the services they use to communicate and entertain themselves. For example, Snapchat’s adoption by 13-24 year olds (~90% in the US market) accounts for 70% of its usage. This means that people over 24 years old are using very different services to interact with their family and peers.

Apple’s iOS has started to realize that apps need to ask for permission before collecting information from other devices on a local network. It verifies that an app is allowed to access other devices on the network. Local network access is not a foregone conclusion either: different routers and wifi access points are increasingly managed by network providers.

Communal device manufacturers must build for interoperability between devices whether they like it or not, taking into account industry standards for communicating state, messaging, and more. A device that isn’t networked with the other devices in the home is much more likely to be replaced when the single, non-networked use is no longer valid or current.

Don’t change the terms without an ‘out’ for owners

Bricking a device because someone doesn’t want to pay for a subscription or doesn’t like the new data use policy is bad. Not only will it create distrust in users but it violates the idea that they are purchasing something for their home.

When you need to change terms, allow owners to make a decision about whether they want new functionality or to stop getting updates. Not having an active subscription is no excuse for a device to fail, since devices should be able to work when a home’s WiFi is down or when AWS has a problem that stops a home’s light bulbs from working. Baseline functionality should always be available, even if leading edge features (for example, features using machine learning) require a subscription. “Smart” or not, there should be no such thing as a light bulb that can’t be turned on.

When a company can no longer support a device–either because they’re sunsetting it or, in the worst case, because they are going out of business–they should consider how to allow people to keep using their devices. In some cases, a motivated community can take on the support; this happened with the Jibo community when the device creator shut down.

Don’t require personal mobile apps to use the device

One bad limitation that I’ve seen is requiring an app to be installed on the purchaser’s phone, and requiring the purchaser to be logged in to use the device. Identity and security aren’t always necessary, and being too strict about identity tethers the device to a particular person’s phone.

The Philips Hue smart light bulbs are a way to turn any light fixture into a component in a smart lighting system. However, you need one of their branded apps to control the lightbulbs. If you integrate your lighting system with your Amazon or Google accounts, you still need to know what the bulb or “zone” of your house is called. As a host you end up having to take the action for someone else (say by yelling at your Echo for them) or put a piece of paper in the room with all of the instructions. We are back in the age of overly complicated instructions to turn on a TV and AV system.

In addition to making sure you can integrate with other touch and voice interfaces, you need to consider physical ways to allow anyone to interact. IoT power devices like the VeSync Smart Plug by Etekcity (I have a bunch around the house) have a physical button to allow manual switching, in addition to integrating with your smart home or using their branded apps. If you can’t operate the device manually if you are standing in front of it, is it really being built for everyone in the home?

How do you know you got this right?

Once you have implemented all of the recommendations, how do you know you are on the right track?

A simple way to figure out whether you are building a communal-friendly device is to look for people adding their profiles to the device. This means linking their accounts to other services like Spotify (if you allow that kind of linking). However, not everyone will want to or be able to add their accounts, especially people who are passing through (guests) or who cannot legally consent (children).

Using behavior to detect whether someone else is using the device can be difficult. While people don't change their taste in music or other interests quickly, they slowly drift through the space of possible options. We seek things that are similar to what we like, but just different enough to be novel. In fact, we see that most of our music tastes are set in our teenage years. Therefore, if a communal device is asked to play songs in a language or genre that never appears on the owner's personal device, it's more likely that someone new is listening than that the owner has suddenly learned a new language. Compare what users are doing on your device to their behavior on other platforms (for example, compare a Google Home Hub in the kitchen to a personal iPhone) to determine whether new users are accessing the platform.
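One crude way to make that comparison is to measure how similar the listening profiles on the two devices are; the profiles, the similarity measure, and the threshold in the sketch below are all illustrative assumptions:

import math

def cosine_similarity(a: dict, b: dict) -> float:
    keys = set(a) | set(b)
    dot = sum(a.get(k, 0) * b.get(k, 0) for k in keys)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Genre play counts: the owner's phone vs. the kitchen smart display.
phone_profile = {"indie": 40, "jazz": 25, "podcasts": 35}
kitchen_profile = {"kids": 50, "reggaeton": 30, "indie": 5}

similarity = cosine_similarity(phone_profile, kitchen_profile)
if similarity < 0.5:  # illustrative threshold
    print(f"similarity={similarity:.2f}: likely someone else is using the device")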

Behavioral patterns can also be used to predict demographic information. For example, you may be able to predict that someone is a parent based on their usage patterns. If this confidence is high, and you only see their interests showing up in the behavioral data, that means that other people who are around the device are not using it.

Don’t forget that you can ask the users themselves about who is likely to use the device. This is information that you can collect during initial setup. This can help ensure you are not making incorrect assumptions about the placement and use of the device.

Finally, consider talking with customers about how they use the device, the issues that come up, and how it fits into their lives. Qualitative user research doesn’t end after the initial design phase. You need to be aware of how the device has changed the environment it fits into. Without social scientists you can’t know this.

Is everything a communal experience?

Up until this point we have been talking about devices that are part of the infrastructure of a home, like a smart screen or light switch. Once we realize that technology serves as an intermediary between people, everything is communal.

Inside of a home, roommates generally have to share expenses like utilities with each other. Companies like Klarna and Braid make finances communal. How you pay together is an important aspect to harmony within a home.

You are also part of communities in your neighborhoods. Amazon Sidewalk extends your devices into the neighborhood you live in. This mesh technology starts to map and extend further with each communal space. Where does your home’s communal space end? If you misplaced your keys a block away, a Tile could help you find them. It could also identify people in your neighborhood without considering your neighbors’ privacy expectations.

Communities aren’t just based on proximity. We can extend the household to connect with other households far away. Amazon’s Drop In has started their own calling network between households. Loop, a new startup, is focused on building a device for connecting families in their own social network.

Google/Alphabet’s Sidewalk Labs has taken on projects that aim to make the connected world part of the cityscape. An early project called LinkNYC (owned through a shell corporation) was digital signage that included free calling and USB hubs. This changed how homeless people used the built environment. When walking down the street you could see people’s smartphones dangling from a LinkNYC while they were panhandling nearby. Later, a district-wide project called Sidewalk Toronto was rejected by voters. Every object within the urban environment becomes something that not only collects data but that could be interactive.


The town square and the public park have been built to be welcoming to people and to set expectations about what they do there, unlike online social media. New Public is taking cues from this type of physical shared space to reimagine the online public square.

Taking cues from the real world, groups like New Public are asking what would happen if we built social media the same way we build public spaces. What if social media followed the norms that we have in social spaces like the public parks or squares?

A key aspect to communal computing is the natural limitations of physical and temporal use. Only so many people can fit inside a kitchen or a meeting room. Only so many people can use a device at once, even if it is a subway ticket machine that services millions of people per month. Only so many can fit onto a sidewalk. We need to consider the way that space and time play a part in these experiences.

Adapt or be unplugged

Rethinking how people use devices together inside our homes, offices, and other spaces is key to the future of ubiquitous computing. We have a long way to go in understanding how context changes the expectations and norms of the people in those spaces. Without updating how we design and build these devices, the device you build will just be one more addition to the landfill.

To understand how devices are used in these spaces, we need to expand our thinking beyond the single owner and design for communal use from the start. If we don’t, the devices will never fit properly into our shared and intimate spaces. The mismatch between expectations and what is delivered will grow greater and lead to more dire problems.

This is a call for change in how we consider devices integrated into our lives. We shouldn’t assume that because humans are adaptive, we can adapt to the technologies built. We should design the technologies to fit into our lives, making sure the devices understand the context in which they’re working.

The future of computing that is contextual is communal.

Thanks

Thanks to Adam Thomas, Mark McCoy, Hugo Bowne-Anderson, and Danny Nou for their thoughts and edits on the early draft of this. Also, Dr. S.A. Applin for all of the great work on PoSR. Finally, from O’Reilly, Mike Loukides for being a great editor and Susan Thompson for the art.

Categories: Technology

Defending against ransomware is all about the basics

O'Reilly Radar - Tue, 2021/08/10 - 05:18

The concept behind ransomware is simple. An attacker plants malware on your system that encrypts all the files, making your system useless, then offers to sell you the key you need to decrypt the files. Payment is usually in bitcoin (BTC), and the decryption key is deleted if you don’t pay within a certain period. Payments have typically been relatively small—though that’s obviously no longer true, with Colonial Pipeline’s multimillion-dollar payout.

Recently, ransomware attacks have been coupled with extortion: the malware sends valuable data (for example, a database of credit card numbers) back to the attacker, who then threatens to publish the data online if you don’t comply with the request.  

A survey on O’Reilly’s website [1] showed that 6% of the respondents worked for organizations that were victims of ransomware attacks. How do you avoid joining them? We’ll have more to say about that, but the tl;dr is simple: pay attention to security basics. Strong passwords, two-factor authentication, defense in depth, staying on top of software updates, good backups, and the ability to restore from backups go a long way. Not only do they protect you from becoming a ransomware victim, but those basics can also help protect you from data theft, cryptojacking, and most other forms of cybercrime. The sad truth is that few organizations practice good security hygiene—and those that don’t end up paying the price.

But what about ransomware? Why is it such an issue, and how is it evolving? Historically, ransomware has been a relatively easy way to make money: set up operations in a country that’s not likely to investigate cybercrime, attack targets that are more likely to pay a ransom, keep the ransom small so it’s easier to pay than to restore from backup, and accept payment via some medium that’s perceived as anonymous. Like most things on the internet, ransomware’s advantage is scale: The WannaCry attack infected around 230,000 systems. If even a small percentage paid the US$300 ransom, that’s a lot of money.

Early on, attacks focused on small and midsize businesses, which often have limited IT staff and no professional security specialists. But more recently, hospitals, governments, and other organizations with valuable data have been attacked. A modern hospital can’t operate without patient data, so restoring systems is literally a matter of life and death. Most recently, we’ve seen attacks against large enterprises, like Colonial Pipeline. And this move toward bigger targets, with more valuable data, has been accompanied by larger ransoms.

Attackers have also gotten more sophisticated and specialized. They’ve set up help desks and customer service agents (much like any other company) to help customers make their payments and decrypt their data. Some criminal organizations offer “ransomware as a service,” running attacks for customers. Others develop the software or create the attacks that find victims. Initiating an attack doesn’t require any technical knowledge; it can all be contracted out, and the customer gets a nice dashboard to show the attack’s progress.

While it’s easy to believe (and probably correct) that government actors have gotten into the game, it’s important to keep in mind that attribution of an attack is very difficult—not least because of the number of actors involved. An “as a service” operator really doesn’t care who its clients are, and its clients may be (willingly) unaware of exactly what they’re buying. Plausible deniability is also a service.

How an attack begins

Ransomware attacks frequently start with phishing. An email to a victim entices them to open an attachment or to visit a website that installs malware. So the first thing you can do to prevent ransomware attacks is to make sure everyone is aware of phishing, very skeptical of any attachments they receive, and appropriately cautious about the websites they visit. Unfortunately, teaching people how to avoid being victimized by a phish is a battle you’re not likely to win. Phishes are getting increasingly sophisticated and now do a good job of impersonating people the victim knows. Spear phishing requires extensive research, and ransomware criminals have typically tried to compromise systems in bulk. But recently, we’ve been seeing attacks against more valuable victims. Larger, more valuable targets, with correspondingly bigger payouts, will merit the investment in research.

It’s also possible for an attack to start when a victim visits a legitimate but compromised website. In some cases, an attack can start without any action by the victim. Some ransomware (for example, WannaCry) can spread directly from computer to computer. One recent attack started through a supply chain compromise: attackers planted the ransomware in an enterprise security product, which was then distributed unwittingly to the product’s customers. Almost any vulnerability can be exploited to plant a ransomware payload on a victim’s device. Keeping browsers up-to-date helps to defend against compromised websites.

Most ransomware attacks begin on Windows systems or on mobile phones. This isn’t to imply that macOS, Linux, and other operating systems are less vulnerable; it’s just that other attack vectors are more common. We can guess at some reasons for this. Mobile phones move between different domains, as the owner goes from a coffee shop to home to the office, and are exposed to different networks with different risk factors. Although they are often used in risky territory, they’re rarely subject to the same device management that’s applied to “company” systems—but they’re often accorded the same level of trust. Therefore, it’s relatively easy for a phone to be compromised outside the office and then bring the attacker onto the corporate network when its owner returns to work.

It’s possible that Windows systems are common attack vectors just because there are so many of them, particularly in business environments. Many also believe that Windows users install updates less often than macOS and Linux users. Microsoft does a good job of patching vulnerabilities before they can be exploited, but that doesn’t do any good if updates aren’t installed. For example, Microsoft discovered and patched the vulnerability that WannaCry exploited well before the attacks began, but many individuals, and many companies, never installed the updates.

Preparations and precautions

The best defense against ransomware is to be prepared, starting with basic security hygiene. Frankly, this is true of any attack: get the basics right and you’ll have much less to worry about. If you’ve defended yourself against ransomware, you’ve done a lot to defend yourself against data theft, cryptojacking, and many other forms of cybercrime.

Security hygiene is simple in concept but hard in practice. It starts with passwords: Users must have nontrivial passwords. And they should never give their password to someone else, whether or not “someone else” is on staff (or claims to be).

Two-factor authentication (2FA), which requires something in addition to a password (for example, biometric authentication or a text message sent to a cell phone) is a must. Don’t just recommend 2FA; require it. Too many organizations buy and install the software but never require their staff to use it. (76% of the respondents to our survey said that their company used 2FA; 14% said they weren’t sure.)

Users should be aware of phishing and be extremely skeptical of email attachments that they weren’t expecting and websites that they didn’t plan to visit. It’s always a good practice to type URLs in yourself, rather than clicking on links in email—even those in messages that appear to be from friends or associates.

Backups are absolutely essential. But what’s even more important is the ability to restore from a backup. The easiest solution to ransomware is to reformat the disks and restore from backup. Unfortunately, few companies have good backups or the ability to restore from a backup—one security expert guesses that it’s as low as 10%. Here are a few key points:

  • You actually have to do the backups. (Many companies don’t.) Don’t rely solely on cloud storage; backup on physical drives that are disconnected when a backup isn’t in progress. (70% of our survey respondents said that their company performed backups regularly.)
  • You have to test the backups to ensure that you can restore the system. If you have a backup but can’t restore, you’re only pretending that you have a backup. (Only 48% of the respondents said that their company regularly practiced restoring from backups; 36% said they didn’t know.)
  • The backup device needs to be offline, connected only when a backup is in progress. Otherwise, it’s possible for the ransomware attack to encrypt your backup.

Don’t overlook testing your backups. Your business continuity planning should include ransomware scenarios: how do you continue doing business while systems are being restored? Chaos engineering, an approach developed at Netflix, is a good idea. Make a practice of breaking your storage capability, then restoring it from backup. Do this monthly—if possible, schedule it with the product and project management teams. Testing the ability to restore your production systems isn’t just about proving that everything works; it’s about training staff to react calmly in a crisis and resolve the outage efficiently. When something goes bad, you don’t want to be on Stack Overflow asking how to do a restore. You want that knowledge imprinted in everyone’s brains.
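As a minimal sketch of such a drill, the check below simply verifies that every file in the live data set came back byte for byte from the restore target; the paths and directory layout are hypothetical:

import hashlib
from pathlib import Path

def sha256(path: Path) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_restore(source_dir: str, restored_dir: str) -> bool:
    """Return True only if every source file was restored byte-for-byte."""
    ok = True
    for src in Path(source_dir).rglob("*"):
        if not src.is_file():
            continue
        restored = Path(restored_dir) / src.relative_to(source_dir)
        if not restored.exists() or sha256(src) != sha256(restored):
            print(f"MISSING OR CORRUPT: {restored}")
            ok = False
    return ok

# Hypothetical monthly drill: restore last night's backup to a scratch
# location, then confirm it matches the live data before signing off.
# assert verify_restore("/srv/app-data", "/mnt/restore-test/app-data")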

Keep operating systems and browsers up-to-date. Too many have become victims because of a vulnerability that was patched in a software update that they didn’t install. (79% of our survey respondents said that their company had processes for updating critical software, including browsers.)

An important principle in any kind of security is “least privilege.” No person or system should be authorized to do anything it doesn’t need to do. For example, no one outside of HR should have access to the employee database. “Of course,” you say—but that includes the CEO. No one outside of sales should have access to the customer database. And so on. Least privilege works for software too. Services need access to other services—but services must authenticate to each other and should only be able to make requests appropriate to their role. Any unexpected request should be rejected and treated as a signal that the software has been compromised. And least privilege works for hardware, whether virtual or physical: finance systems and servers shouldn’t be able to access HR systems, for example. Ideally, they should be on separate networks. You should have a “defense in depth” security strategy that focuses not only on keeping “bad guys” out of your network but also on limiting where they can go once they’re inside. You want to stop an attack that originates on HR systems from finding its way to the finance systems or some other part of the company. Particularly when you’re dealing with ransomware, making it difficult for an attack to propagate from one system to another is all-important.

Attribute-based access control (ABAC) can be seen as an extension of least privilege. ABAC is based on defining policies about exactly who and what should be allowed to access every service: What are the criteria on which trust should be based? And how do these criteria change over time? If a device suddenly moves between networks, does that represent a risk? If a system suddenly makes a request that it has never made before, has it been compromised? At what point should access to services be denied? ABAC, done right, is difficult and requires a lot of human involvement: looking at logs, deciding what kinds of access are appropriate, and keeping policies up-to-date as the situation changes. Working from home is an example of a major change that security people will need to take into account. You might have “trusted” an employee’s laptop, but should you trust it when it’s on the same network as their children? Some of this can be automated, but the bottom line is that you can’t automate security.
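As a toy sketch of an ABAC decision: real policy engines express these rules declaratively, and the attributes and conditions below are made up for illustration:

def abac_allow(subject: dict, resource: dict, environment: dict) -> bool:
    """Illustrative ABAC check: every attribute condition must hold."""
    rules = [
        subject.get("department") == resource.get("owning_department"),
        subject.get("device_managed") is True,
        environment.get("network") in ("corp", "vpn"),
        not environment.get("request_rate_anomalous", False),
    ]
    return all(rules)

request = {
    "subject": {"department": "hr", "device_managed": True},
    "resource": {"owning_department": "hr", "name": "employee_db"},
    "environment": {"network": "home_wifi", "request_rate_anomalous": False},
}
# Denied: the device is on an untrusted network, so stronger checks (or a
# flat denial) are warranted even though the user's department matches.
print(abac_allow(**request))  # False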

Finally: detecting a ransomware attack isn’t difficult. If you think about it, this makes a lot of sense: encrypting all your files requires a lot of CPU and filesystem activity, and that’s a red flag. The way files change is also a giveaway. Most unencrypted files have low entropy: they have a high degree of order. (On the simplest level, you can glance at a text file and tell that it’s text. That’s because it has a certain kind of order. Other kinds of files are also ordered, though the order isn’t as apparent to a human.) Encrypted files have high entropy (i.e., they’re very disordered)—they have to be; otherwise, they’d be easy to decrypt. Computing a file’s entropy is simple and for these purposes doesn’t require looking at the entire file. Many security products for desktop and laptop systems are capable of detecting and stopping a ransomware attack. We don’t do product recommendations, but we do recommend that you research the products that are available. (PC Magazine’s 2021 review of ransomware detection products is a good place to start.)
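As an illustration of how simple the entropy check is, here is a minimal sketch; the sample size and the 7.5 bits-per-byte threshold are arbitrary choices, and real products combine this signal with others:

import math
from collections import Counter

def shannon_entropy(data: bytes) -> float:
    """Bits per byte: roughly 4-6 for typical text, close to 8 for encrypted data."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def looks_encrypted(path: str, sample_size: int = 64 * 1024,
                    threshold: float = 7.5) -> bool:
    # Compressed archives also score high, so treat this as one signal of many.
    with open(path, "rb") as f:
        return shannon_entropy(f.read(sample_size)) > threshold

print(shannon_entropy(b"hello hello hello"))  # low: highly ordered text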

In the data center or the cloud

Detecting ransomware once it has escaped into a data center, whether in the cloud or on-premises, isn’t a fundamentally different task, but commercial products aren’t there yet. Again, prevention is the best defense, and the best defense is strong on the fundamentals. Ransomware makes its way from a desktop to a data center via compromised credentials and operating systems that are unpatched and unprotected. We can’t say this too often: make sure secrets are protected, make sure identity and access management are configured correctly, make sure you have a backup strategy (and that the backups work), and make sure operating systems are patched—zero-trust is your friend.

Amazon Web Services, Microsoft Azure, and Google Cloud all have services named “Identity and Access Management” (IAM); the fact that they all converged on the same name tells you something about how important it is. These are the services that configure users, roles, and privileges, and they’re the key to protecting your cloud assets. IAM doesn’t have a reputation for being easy. Nevertheless, it’s something you have to get right; misconfigured IAM is at the root of many cloud vulnerabilities. One report claims that well over 50% of the organizations using Google Cloud were running workloads with administrator privileges. While that report singles out Google, we believe that the same is true at other cloud providers. All of these workloads are at risk; administrator privileges should only be used for essential management tasks. Google Cloud, AWS, Azure, and the other providers give you the tools you need to secure your workloads, but they can’t force you to use them correctly.

It’s worth asking your cloud vendor some hard questions. Specifically, what kind of support can your vendor give you if you are a victim of a security breach? What can your vendor do if you lose control of your applications because IAM has been misconfigured? What can your vendor do to restore your data if you succumb to ransomware? Don’t assume that everything in the cloud is “backed up” just because it’s in the cloud. AWS and Azure offer backup services; Google Cloud offers backup services for SQL databases but doesn’t appear to offer anything comprehensive. Whatever your solution, don’t just assume it works. Make sure that your backups can’t be accessed via the normal paths for accessing your services—that’s the cloud version of “leave your physical backup drives disconnected when not in use.” You don’t want an attacker to find your cloud backups and encrypt them too. And finally, test your backups and practice restoring your data.

Any frameworks your IT group has in place for observability will be a big help: Abnormal file activity is always suspicious. Databases that suddenly change in unexpected ways are suspicious. So are services (whether “micro” or “macroscopic”) that suddenly start to fail. If you have built observability into your systems, you’re at least partway there.

How confident are you that you can defend against a ransomware attack? In our survey, 60% of the respondents said that they were confident; another 28% said “maybe,” and 12% said “no.” We’d give our respondents good, but not great, marks on readiness (2FA, software updates, and backups). And we’d caution that confidence is good but overconfidence can be fatal. Make sure that your defenses are in place and that those defenses work.

If you become a victim

What do you do? Many organizations just pay. (Ransomwhe.re tracks total payments to ransomware sites, currently estimated at $92,120,383.83.) The FBI says that you shouldn’t pay, but if you don’t have the ability to restore your systems from backups, you might not have an alternative. Although the FBI was able to recover the ransom paid by Colonial Pipeline, I don’t think there’s any case in which they’ve been able to recover decryption keys.

Whether paying the ransom is a good option depends on how much you trust the cybercriminals responsible for the attack. The common wisdom is that ransomware attackers are trustworthy, that they’ll give you the key you need to decrypt your data and even help you use it correctly. If the word gets out that they can’t be trusted to restore your systems, they’ll find fewer victims willing to pay up. However, at least one security vendor says that 40% of ransomware victims who pay never get their files restored. That’s a very big “however,” and a very big risk—especially as ransomware demands skyrocket. Criminals are, after all, criminals. It’s all the more reason to have good backups.

There’s another reason not to pay that may be more important. Ransomware is a big business, and like any business, it will continue to exist as long as it’s profitable. Paying your attackers might be an easy solution short-term, but you’re just setting up the next victim. We need to protect each other, and the best way to do that is to make ransomware less profitable.

Another problem that victims face is extortion. If the attackers steal your data in addition to encrypting it, they can demand money not to publish your confidential data online—which may leave you with substantial penalties for exposing private data under laws such as GDPR and CCPA. This secondary attack is becoming increasingly common.

Whether or not they pay, ransomware victims frequently face revictimization because they never fix the vulnerability that allowed the ransomware in the first place. So they pay the ransom, and a few months later, they’re attacked again, using the same vulnerability. The attack may come from the same people or it may come from someone else. Like any other business, an attacker wants to maximize its profits, and that might mean selling the information they used to compromise your systems to other ransomware outfits. If you become a victim, take that as a very serious warning. Don’t think that the story is over when you’ve restored your systems.

Here’s the bottom line, whether or not you pay. If you become a victim of ransomware, figure out how the ransomware got in and plug those holes. We began this article by talking about basic security practices. Keep your software up-to-date. Use two-factor authentication. Implement defense in depth wherever possible. Design zero-trust into your applications. And above all, get serious about backups and practice restoring from backup regularly. You don’t want to become a victim again.

Thanks to John Viega, Dean Bushmiller, Ronald Eddings, and Matthew Kirk for their help. Any errors or misunderstandings are, of course, mine.

Footnote
  1. The survey ran July 21, 2021, through July 23, 2021, and received more than 700 responses.
Categories: Technology

Radar trends to watch: August 2021

O'Reilly Radar - Mon, 2021/08/02 - 07:27

Security continues to be in the news: most notably the Kaseya ransomware attack, which was the first case of a supply chain ransomware attack that we’re aware of. That’s new and very dangerous territory. However, the biggest problem in security remains simple: take care of the basics. Good practices for authentication, backups, and software updates are the best defense against ransomware and many other attacks.

Facebook has said that it is now focusing on building the virtual reality Metaverse, which will be the successor to the web. To succeed, VR will have to get beyond ultra geeky goggles. But Google Glass showed the way, and that path is being followed by Apple and Facebook in their product development.

AI and Data
  • There’s a new technique for protecting natural language systems from attack by misinformation and malware bots: using honeypots to proactively capture attackers’ key phrases and incorporating defenses into the training process.
  • DeepMind’s AlphaFold has made major breakthroughs in protein folding. DeepMind has released the source code for AlphaFold 2.0 on GitHub. DeepMind will also release the structure of every known protein. The database currently includes over 350,000 protein structures, but is expected to grow to over 100,000,000. This is of immense importance to research in biology and medicine.
  • Google searches can now tell you why a given result was included. It’s a minor change, but we’ve long argued that in AI, “why” may give you more information than “what.”
  • Researchers have been able to synthesize speech using the brainwaves of a patient who has been paralyzed and unable to talk. The process combines brain wave detection with models that predict the next word.
  • The National Institute of Standards and Technology (NIST) tests systems for identifying airline passengers for flight boarding. They claim that they have achieved 99.87% accuracy, without significant differences in performance between different demographic groups.
  • An attempt at adding imagination to AI works by combining different attributes of known objects. Humans are good at this: we can imagine a green dog, for example.
  • Phase precession is a recently discovered phenomenon by which neurons encode information in the timing of their firing.  It may relate to humans’ ability to learn on the basis of a small number of examples.
  • Yoshua Bengio, Geoff Hinton, and Yann LeCun give an assessment of the state of Deep Learning, its future, and its ability to solve problems.
  • AI is learning to predict human behavior from videos (e.g., movies). This research attempts to answer the question “What will someone do next?” in situations where there are large uncertainties. One trick is reverting to high-level concepts (e.g., “greet”) when the system can’t predict more specific behaviors (e.g., “shake hands”).
Programming
  • JAX is a new Python library for high-performance mathematics. It includes a just-in-time compiler, support for GPUs and TPUs, automatic differentiation, and automatic vectorization and parallelization.
  • Matrix is an open standard for a decentralized “conversation store” that is used as the background for many other kinds of applications. Germany has announced that it will use Matrix as the standard for digital messaging in its national electronic health records system.
  • Brython is Python 3.9.5 running in the browser, with access to the DOM.  It’s not a replacement for JavaScript, but there are a lot of clever things you can do with it.
  • Using a terminal well has always been a superpower. Warp is a new terminal emulator built in Rust with features that you’d never expect: command sharing, long-term cloud-based history, a true text editor, and a lot more.
  • Is it WebAssembly’s time? Probably not yet, but it’s coming. Krustlets allow you to run WebAssembly workloads under Kubernetes. There is also an alternative to a filesystem written in wasm; JupyterLite is an attempt to build a complete distribution of Jupyter, including JupyterLab, that runs entirely in the browser.
Robotics
  • Google launches Intrinsic, a moonshot project to develop industrial robots.
  • 21st Century Problems: should autonomous delivery robots be allowed in bike lanes? The Austin (Texas) City Council already has to consider this issue.
Materials
  • Veins in materials? Researchers have greatly reduced the time it takes to build vascular systems into materials, which could have an important impact on our ability to build self-healing structures.
  • Researchers have designed fabrics that can cool the body by up to 5 degrees Celsius by absorbing heat and re-emitting it in the near-infrared range.
Hardware
  • A bendable processor from ARM could be the future of wearable computing. It’s far from a state-of-the-art CPU, and probably will never be one, but with further development could be useful in edge applications that require flexibility.
  • Google experiments with error correction for quantum computers.  Developing error correction is a necessary step towards making quantum computers “real.”
Security
  • Attackers have learned to scan repos like GitHub to find private keys and other credentials that have inadvertently been left in code that has been checked in. Checkov, a code analysis tool for detecting vulnerabilities in cloud infrastructure, can now find these credentials in code.
  • Amnesty International has released an open source tool for checking whether a phone has been compromised by Pegasus, the spyware sold by the NSO group to many governments, and used (among other things) to track journalists. Matthew Green’s perspective on “security nihilism” discusses the NSO’s activity; it is a must-read.
  • The REvil ransomware gang (among other things, responsible for the Kaseya attack, which infected over 1,000 businesses) has disappeared; all of its web sites went down at the same time. Nobody knows why; possibilities include pressure from law enforcement, reorganization, and even retirement.
  • DID is a newly proposed form of decentralized digital identity that is currently being tested in the travel passports with COVID data being developed by the International Air Transport Association.
  • A massive ransomware attack by the REvil cybercrime group exploited supply chain vulnerabilities. The payload was implanted in a security product by Kaseya that is used to automate software installation and updates. The attack apparently only affects on-premises infrastructure. Victims are worldwide; the number of victims is in the “low thousands.”
  • Kubernetes is being used by the FancyBear cybercrime group, and other groups associated with the Russian government, to orchestrate a worldwide wave of brute-force attacks aimed at data theft and credential stealing.
Operations
  • Observability is the next step beyond monitoring.  That applies to data and machine learning, too, and is part of incorporating ML into production processes.
  • A new load balancing algorithm does a much better job of managing load at datacenters, and reduces power consumption by allowing servers to be shut down when not in use.
  • MicroK8S is a version of Kubernetes designed for small clusters that claims to be fault tolerant and self-healing, requiring little administration.
  • Calico is a Kubernetes plugin that simplifies network configuration. 
Web and Mobile
  • Scuttlebutt is a protocol for the decentralized web that’s “a way out of the social media rat race.”  It’s (by definition) “sometimes on,” not a constant presence.
  • Storywrangler is a tool for analyzing Twitter at scale.  It picks out the most popular word combinations in a large number of languages.
  • Google is adding support for “COVID vaccination passports” to Android devices.
  • Tim Berners-Lee’s Solid protocol appears to be getting real, with a small ecosystem of pod providers (online data stores) and apps.
  • Why are Apple and Google interested in autonomous vehicles? What’s the business model? They are after the last few minutes of attention. If you aren’t driving, you’ll be in an app.
Virtual Reality
  • Mark Zuckerberg has been talking up the Metaverse as the next stage in the Internet’s evolution: a replacement for the Web as an AR/VR world. But who will want to live in Facebook’s world?
  • Facebook is committing to the OpenXR standard for its Virtual Reality products. In August 2022, all new applications will be required to use OpenXR; its proprietary APIs will be deprecated.
Miscellaneous
  • The Open Voice Network is an industry association organized by the Linux Foundation that is dedicated to ethics in voice-driven applications. Their goal is to close the “trust gap” in voice applications.
Categories: Technology

Communal Computing’s Many Problems

O'Reilly Radar - Tue, 2021/07/20 - 04:37

In the first article of this series, we discussed communal computing devices and the problems they create–or, more precisely, the problems that arise because we don’t really understand what “communal” means. Communal devices are intended to be used by groups of people in homes and offices. Examples include popular home assistants and smart displays like the Amazon Echo, Google Home, Apple HomePod, and many others.  If we don’t create these devices with communities of people in mind, we will continue to build the wrong ones.

Ever since the concept of a “user” was invented (which was probably later than you think), we’ve assumed that devices are “owned” by a single user. Someone buys the device and sets up the account; it’s their device, their account.  When we’re building shared devices with a user model, that model quickly runs into limitations. What happens when you want your home assistant to play music for a dinner party, but your preferences have been skewed by your children’s listening habits? We, as users, have certain expectations for what a device should do. But we, as technologists, have typically ignored our own expectations when designing and building those devices.

This expectation isn’t a new one either. The telephone in the kitchen was for everyone’s use. After the release of the iPad in 2010 Craig Hockenberry discussed the great value of communal computing but also the concerns:

“When you pass it around, you’re giving everyone who touches it the opportunity to mess with your private life, whether intentionally or not. That makes me uneasy.”

Communal computing requires a new mindset that takes into account users’ expectations. If the devices aren’t designed with those expectations in mind, they’re destined for the landfill. Users will eventually experience “weirdness” and “annoyance” that grows into distrust of the device itself. As technologists, we often call these weirdnesses “edge cases.” That’s precisely where we’re wrong: they’re not edge cases; they’re at the core of how people want to use these devices.

In the first article, we listed five core questions we should ask about communal devices:

  1. Identity: Do we know all of the people who are using the device?
  2. Privacy: Are we exposing (or hiding) the right content for all of the people with access?
  3. Security: Are we allowing all of the people using the device to do or see what they should and are we protecting the content from people that shouldn’t?
  4. Experience: What is the contextually appropriate display or next action?
  5. Ownership: Who owns all of the data and services attached to the device that multiple people are using?

In this article, we’ll take a deeper look at these questions, to see how the problems manifest and how to understand them.

Identity

All of the problems we’ve listed start with the idea that there is one registered and known person who should use the device. That model doesn’t fit reality: the identity of a communal device isn’t a single person, but everyone who can interact with it. This could be anyone able to tap the screen, make a voice command, use a remote, or simply be sensed by it. To understand this communal model and the problems it poses, start with the person who buys and sets up the device. It is associated with that individual’s account, like a personal Amazon account with its order history and shopping list. Then it gets difficult. Who doesn’t, can’t, or shouldn’t have full access to an Amazon account? Do you want everyone who comes into your house to be able to add something to your shopping list?

If you think about the spectrum of people who could be in your house, they range from people whom you trust, to people whom you don’t really trust but who should be there, to those whom you shouldn’t trust at all.


There is a spectrum of trust for people who have access to communal devices

In addition to individuals, we need to consider the groups that each person could be part of. These group memberships are called “pseudo-identities”; they are facets of a person’s full identity. They are usually defined by how the person associated themself with a group of other people. My life at work, home, a high school friends group, and as a sports fan show different parts of my identity. When I’m with other people who share the same pseudo-identity, we can share information. When there are people from one group in front of a device I may avoid showing content that is associated with another group (or another personal pseudo-identity). This can sound abstract, but it isn’t; if you’re with friends in a sports bar, you probably want notifications about the teams you follow. You probably don’t want news about work, unless it’s an emergency.

There are important reasons why we show a particular facet of our identity in a particular context. When designing an experience, you need to consider the identity context and where the experience will take place. Most recently this has come up with work from home. Many people talk about ‘bringing your whole self to work,’ but don’t realize that “your whole self” isn’t always appropriate. Remote work changes when and where I should interact with work. For a smart screen in my kitchen, it is appropriate to have content that is related to my home and family. Is it appropriate to have all of my work notifications and meetings there? Could it be a problem for children to have the ability to join my work calls? What does my IT group require as far as security of work devices versus personal home devices?

With these devices we may need to switch to a different pseudo-identity to get something done. I may need to be reminded of a work meeting. When I get a notification from a close friend, I need to decide whether it is appropriate to respond based on the other people around me.

The pandemic has broken down the barriers between home and work. The natural context switch from being at work and worrying about work things and then going home to worry about home things is no longer the case. People need to make a conscious effort to “turn off work” and to change the context. Just because it is the middle of the workday doesn’t always mean I want to be bothered by work. I may want to change contexts to take a break. Such context shifts add nuance to the way the current pseudo-identity should be considered, and to the overarching context you need to detect.

Next, we need to consider identities as groups that I belong to. I’m part of my family, and my family would potentially want to talk with other families. I live in a house that is on my street alongside other neighbors. I’m part of an organization that I identify as my work. These are all pseudo-identities we should consider, based on where the device is placed and in relation to other equally important identities.

The crux of the problem with communal devices is the multiple identities that are or may be using the device. This requires greater understanding of who, where, and why people are using the device. We need to consider the types of groups that are part of the home and office.

Privacy

As we consider the identities of all people with access to the device, and the identity of the place the device is to be part of, we start to consider what privacy expectations people may have given the context in which the device is used.

Privacy is hard to understand. The framework I’ve found most helpful is Contextual Integrity which was introduced by Helen Nissenbaum in the book Privacy in Context. Contextual Integrity describes four key aspects of privacy:

  1. Privacy is provided by appropriate flows of information.
  2. Appropriate information flows are those that conform to contextual information norms.
  3. Contextual informational norms refer to five independent parameters: data subject, sender, recipient, information type, and transmission principle.
  4. Conceptions of privacy are based on ethical concerns that evolve over time.

What is most important about Contextual Integrity is that privacy is not about hiding information away from the public but giving people a way to control the flow of their own information. The context in which information is shared determines what is appropriate.

This flow either feels appropriate, or not, based on key characteristics of the information (from Wikipedia):

  1. The data subject: Who or what is this about?
  2. The sender of the data: Who is sending it?
  3. The recipient of the data: Who will eventually see or get the data?
  4. The information type: What type of information is this (e.g. a photo, text)?
  5. The transmission principle: In what set of norms is this being shared (e.g. school, medical, personal communication)?

We rarely acknowledge how a subtle change in one of these parameters could be a violation of privacy. It may be completely acceptable for my friend to have a weird photo of me, but once it gets posted on a company intranet site it violates how I want information (a photo) to flow. The recipient of the data has changed to something I no longer find acceptable. But I might not care whether a complete stranger (like a burglar) sees the photo, as long as it never gets back to someone I know.
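
To make the five parameters a little more concrete, here is a minimal sketch of how an information flow could be represented in code and checked against contextual norms. Everything here, the class, the norm table, and the function names, is invented for illustration; it is not a prescribed implementation of Contextual Integrity.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class InformationFlow:
        """One flow of information, described by the five Contextual Integrity parameters."""
        data_subject: str            # who or what the information is about
        sender: str                  # who is sending it
        recipient: str               # who will eventually see or get it
        info_type: str               # e.g. "photo", "text", "health record"
        transmission_principle: str  # the norm under which it is shared, e.g. "personal communication"

    # Hypothetical norms: for each transmission principle, which recipients feel acceptable.
    ACCEPTABLE_RECIPIENTS = {
        "personal communication": {"friend", "partner"},
        "medical": {"doctor", "patient"},
    }

    def flow_feels_appropriate(flow: InformationFlow) -> bool:
        """A flow is appropriate when it conforms to the norms of its context."""
        allowed = ACCEPTABLE_RECIPIENTS.get(flow.transmission_principle, set())
        return flow.recipient in allowed

    # The photo example from the text: fine when a friend has it...
    print(flow_feels_appropriate(InformationFlow("me", "me", "friend", "photo", "personal communication")))       # True
    # ...a violation once the recipient changes to a company intranet audience.
    print(flow_feels_appropriate(InformationFlow("me", "friend", "coworkers", "photo", "personal communication")))  # False

The point of the sketch is only that changing a single parameter (here, the recipient) is enough to flip a flow from appropriate to inappropriate, even though nothing else about the information changed.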

For communal use cases, the sender or receiver of information is often a group. There may be  multiple people in the room during a video call, not just the person you are calling. People can walk in and out. I might be happy with some people in my home seeing a particular photo, but find it embarrassing if it is shown to guests at a dinner party.

We must also consider what happens when other people’s content is shown to those who shouldn’t see it. This content could be photos or notifications from people outside the communal space that could be seen by anyone in front of the device. Smartphones can hide message contents when you aren’t near your phone for this exact reason.

The services themselves can expand the ‘receivers’ of information in ways that create uncomfortable situations. In Privacy in Context, Nissenbaum talks about the privacy implications of Google Street View when it places photos of people’s houses on Google Maps. When a house was only visible to people who walked down the street, that was one thing, but when anyone in the world can access a picture of a house, that changes the parameters in a way that causes concern. Most recently, IBM used Flickr photos that were shared under a Creative Commons license to train facial recognition algorithms. While this didn’t require any change to the terms of service, it was a surprise to people and may be in violation of the Creative Commons license. In the end, IBM took the dataset down.

Privacy considerations for communal devices should focus on who is gaining access to information and whether it is appropriate based on people’s expectations. Without using a framework like Contextual Integrity we will be stuck talking about generalized rules for data sharing, and there will always be edge cases that violate someone’s privacy.

A note about children

Children make identity and privacy especially tricky. About 40% of all households have a child. Children shouldn’t be an afterthought. If you aren’t compliant with local laws you can get in a lot of trouble. In 2019, YouTube had to settle with the FTC for a $170 million fine for selling ads targeting children. It gets complicated because the ‘age of consent’ depends on the region as well: COPPA in the US is for people under 13 years old, CCPA in California is for people under 16, and GDPR overall is under 16 years old but each member state can set its own. The moment you acknowledge children are using your platforms, you need to accommodate them.

For communal devices, there are many use cases for children. Once they realize they can play whatever music they want (including tracks of fart sounds) on a shared device they will do it. Children focus on the exploration over the task and will end up discovering way more about the device than parents might. Adjusting your practices after building a device is a recipe for failure. You will find that the paradigms you choose for other parties won’t align with the expectations for children, and modifying your software to accommodate children is difficult or impossible. It’s important to account for children from the beginning.

Security

To get to a home assistant, you usually need to pass through a home’s outer door. There is usually a physical limitation by way of a lock. There may be alarm systems. Finally, there are social norms: you don’t just walk into someone else’s house without knocking or being invited.

Once you are past all of these locks, alarms, and norms, anyone can access the communal device. Few things within a home are restricted–possibly a safe with important documents. When a communal device requires authentication, it is usually subverted in some way for convenience: for example, a password might be taped to it, or a password may never have been set.

The concept of Zero Trust Networks speaks to this problem. It comes down to a key question: is the risk associated with an action greater than the trust we have that the person performing the action is who they say they are?


Source: https://learning.oreilly.com/library/view/zero-trust-networks/9781491962183/

Passwords, passcodes, or mobile device authentication become nuisances; these supposed secrets are frequently shared between everyone who has access to the device. Passwords might be written down for people who can’t remember them, making them visible to less trusted people visiting your household. Have we not learned anything since the movie War Games?

When we consider the risk associated with an action, we need to understand its privacy implications. Would the action expose someone’s information without their knowledge? Would it allow a person to pretend to be someone else? Could another party easily tell that the device was being used by an imposter?

There is a tradeoff between trust and risk. The device needs to calculate whether we know who the person is and whether the person wants the information to be shown. That needs to be weighed against the potential risk or harm if an inappropriate person is in front of the device.


Having someone in your home accidentally share embarrassing photos could have social implications.

A few examples of this tradeoff:

  • Feature: Showing a photo when the device detects someone in the room. Risk and trust calculation: photo content sensitivity; who is in the room. Possible issue: showing an inappropriate photo to a complete stranger.
  • Feature: Starting a video call. Risk and trust calculation: the person whose account is being used for the call versus the actual person starting the call. Possible issue: when the other side picks up, it may not be who they thought it would be.
  • Feature: Playing a personal song playlist. Risk and trust calculation: the personal recommendations being impacted. Possible issue: incorrect future recommendations.
  • Feature: Automatically ordering something based on a voice command. Risk and trust calculation: convenience of ordering; approval of the shopping account’s owner. Possible issue: shipping an item that shouldn’t have been ordered.
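
Returning to the Zero Trust question above, here is a minimal sketch of how a device might weigh trust against risk before acting. The numeric scores, the threshold rule, and the function name are all invented for illustration; a real system would derive them from sensors, account data, and policy rather than hard-coded constants.

    def should_perform(action_risk: float, identity_confidence: float) -> bool:
        """Allow an action only when trust in who is asking is at least as great as the risk.

        action_risk: 0.0 (harmless) to 1.0 (severe harm if the wrong person triggers it)
        identity_confidence: 0.0 (no idea who is present) to 1.0 (strong, recent authentication)
        """
        return identity_confidence >= action_risk

    # Checking the weather: essentially no risk, so anyone in the room can do it.
    print(should_perform(action_risk=0.0, identity_confidence=0.0))  # True
    # Showing a sensitive photo when the device only knows that *someone* is present.
    print(should_perform(action_risk=0.8, identity_confidence=0.3))  # False
    # Placing an order after recognizing the account owner's voice profile.
    print(should_perform(action_risk=0.6, identity_confidence=0.9))  # True

The interesting design work is not in the comparison itself but in deciding, with the people who live with the device, what counts as risk and what counts as sufficient trust.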

This gets even trickier when people no longer in the home can access the devices remotely. There have been cases of harassment, intimidation, and domestic abuse by people whose access should have been revoked: for example, an ex-partner turning off the heating system. When should someone be able to access communal devices remotely? When should their access be controllable from the devices themselves? How should people be reminded to update their access control lists? How does basic security maintenance happen inside a communal space?

See how much work this takes in a recent account of pro bono security work for a harassed mother and her son. Or how a YouTuber was blackmailed, surveilled, and harassed by her smart home. Apple even has a manual for this type of situation.

At home, where there’s no corporate IT group to create policies and automation to keep things secure, it’s next to impossible to manage all of these security issues. Even some corporations have trouble with it. We need to figure out how users will maintain and configure a communal device over time. Configuration for devices in the home and office can be wrought with lots of different types of needs over time.

For example, what happens when someone leaves the home and is no longer part of it? We will need to remove their access and may even find it necessary to block them from certain services. This is highlighted by cases of people being harassed through communal devices that a former partner still controls. Ongoing maintenance of a particular device could also be triggered by a change in the community’s needs. A home device may be used just to play music or check the weather at first. But when a new baby comes home, being able to do video calling with close relatives may become a higher priority.

End users are usually very bad at changing configuration after it is set. They may not even know that they can configure something in the first place. This is why people have made a business out of setting up home stereo and video systems. People just don’t understand the technologies they are putting in their houses. Does that mean we need some type of handy-person who does home device setup and management? When more complicated routines are required, how can someone make changes without writing code, if they are allowed to at all?

Communal devices need new paradigms of security that go beyond the standard login. The world inside a home is protected by a barrier like a locked door; the capabilities of communal devices should respect that. This means both removing friction in some cases and increasing it in others.

A note about biometrics

“Turn your face” to enroll in Google Face Match and personalize your devices.
(Source: Google Face Match video, https://youtu.be/ODy_xJHW6CI?t=26)

Biometric authentication for voice and face recognition can help us get a better understanding of who is using a device. Examples of biometric authentication include FaceID for the iPhone and voice profiles for Amazon Alexa. There is a push for regulation of facial recognition technologies, but opt-in for authentication purposes tends to be carved out.

However, biometrics aren’t without problems. In addition to issues with skin tone, gender bias, and local accents, biometrics assumes that everyone is willing to have a biometric profile on the device–and that they would be legally allowed to (for example, children may not be allowed to consent to a biometric profile). It also assumes this technology is secure. Google FaceMatch makes it very clear it is only a technology for personalization, rather than authentication. I can only guess they have legalese to avoid liability when an unauthorized person spoofs someone’s face, say by taking a photo off the wall and showing it to the device.

What do we mean by “personalization”? When you walk into a room and FaceMatch identifies your face, the Google Home Hub dings, shows your face icon, then shows your calendar (if it is connected), and a feed of personalized cards. Apple’s FaceID uses many levels of presentation attack detection (also known as “anti-spoofing”): it verifies your eyes are open and you are looking at the screen, and it uses a depth sensor to make sure it isn’t “seeing” a photo. The phone can then show hidden notification content or open the phone to the home screen. This measurement of trust and risk benefits from understanding who could be in front of the device. We can’t forget that the machine learning that is doing biometrics is not a deterministic calculation; there is always some degree of uncertainty.

Social and information norms define what we consider acceptable, who we trust, and how much. As trust goes up, we can take more risks in the way we handle information. However, it’s difficult to connect trust with risk without understanding people’s expectations. I have access to my partner’s iPhone and know the passcode. It would be a violation of a norm if I walked over and unlocked it without being asked, and doing so will lead to reduced trust between us.

As we can see, biometrics does offer some benefits but won’t be the panacea for the unique uses of communal devices. Biometrics will allow those willing to opt-in to the collection of their biometric profile to gain personalized access with low friction, but it will never be useable for everyone with physical access.

Experiences

People use a communal device for short experiences (checking the weather), ambient experiences (listening to music or glancing at a photo), and joint experiences (multiple people watching a movie). The device needs to be aware of norms within the space and between the multiple people in the space. Social norms are rules by which people decide how to act in a particular context or space. In the home, there are norms about what people should and should not do. If you are a guest, you try to see if people take their shoes off at the door; you don’t rearrange things on a bookshelf; and so on.

Most software is built to work for as many people as possible; this is called generalization. Norms stand in the way of generalization. Today’s technology isn’t good enough to adapt to every possible situation. One strategy is to simplify the software’s functionality and let the humans enforce norms. For example, when multiple people talk to an Echo at the same time, Alexa will either not understand or it will take action on the last command. Multi-turn conversations between multiple people are still in their infancy. This is fine when there are understood norms–for example, between my partner and me. But it doesn’t work so well when you and a child are both trying to shout commands.


Shared experiences can be challenging, like a parent and child yelling at an Amazon Echo to play what they want.

Norms are interesting because they tend to be learned and negotiated over time, but are invisible. Experiences that are built for communal use need to be aware of these invisible norms through cues that can be detected from peoples’ actions and words. This gets especially tricky because a conversation between two people could include information subject to different expectations (in a Contextual Integrity sense) about how that information is used. With enough data, models can be created to “read between the lines” in both helpful and dangerous ways.

Video games already cater to multiple people’s experiences. With the Nintendo Switch or any other gaming system, several people can play together in a joint experience. However, the rules governing these experiences are never applied to, say, Netflix. The assumption is always that one person holds the remote. How might these experiences be improved if software could accept input from multiple sources (remote controls, voice, etc.) to build a selection of movies that is appropriate for everyone watching?
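
As a thought experiment, here is a minimal sketch of what accepting input from several people at once might look like for a shared watchlist: intersect everyone’s acceptable titles so that only options agreeable to the whole room are offered. The names, data, and the idea of simple set intersection are all hypothetical, chosen only to make the design question tangible.

    from functools import reduce

    def joint_watchlist(preferences):
        """Titles acceptable to everyone currently in front of the screen."""
        return reduce(set.intersection, preferences.values()) if preferences else set()

    # Hypothetical inputs gathered from remotes, phones, and voice commands.
    preferences = {
        "parent_1": {"Nature Documentary", "Space Drama", "Animated Film"},
        "parent_2": {"Space Drama", "Animated Film", "Crime Thriller"},
        "child":    {"Animated Film", "Space Drama"},
    }
    print(joint_watchlist(preferences))  # e.g. {'Animated Film', 'Space Drama'}

A real system would need to handle empty intersections, people joining or leaving the room, and preferences that are inferred rather than stated, which is exactly where the invisible norms discussed above come back in.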

Communal experience problems highlight inequalities in households. With women doing more household coordination than ever, there is a need to rebalance the tasks for households. Most of the time these coordination tasks are relegated to personal devices, generally the wife’s mobile phone, when they involve the entire family (though there is a digital divide outside the US). Without moving these experiences into a place that everyone can participate in, we will continue these inequalities.

So far, technology has been great at intermediating people for coordination through systems like text messaging, social networks, and collaborative documents. We don’t build interaction paradigms that allow multiple people to engage at the same time in their communal spaces. To do this, we need to address the fact that the norms dictating appropriate behavior are invisible and pervasive in the spaces where these technologies are deployed.

Ownership

Many of these devices are not really owned by the people who buy them. As part of the current trend towards subscription-based business models, the device won’t function if you don’t subscribe to a service. Those services have license agreements that specify what you can and cannot do (which you can read if you have a few hours to spare and can understand them).

For example, this has been an issue for fans of Amazon’s Blink camera. The home automation industry is fragmented: there are many vendors, each with its own application to control their particular devices. But most people don’t want to use different apps to control their lighting, their television, their security cameras, and their locks. Therefore, people have started to build controllers that span the different ecosystems. Doing so has caused Blink users to get their accounts suspended.

What’s even worse is that these license agreements can change whenever the company wants. Licenses are frequently modified with nothing more than a notification, after which something that was previously acceptable is now forbidden. In 2020, Wink suddenly applied a monthly service charge; if you didn’t pay, the device would stop working. Also in 2020, Sonos caused a stir by saying they were going to “recycle” (disable) old devices. They eventually changed their policy.

The issue isn’t just what you can do with your devices; it’s also what happens to the data they create. Amazon’s Ring partnership with one in ten US police departments troubles many privacy groups because it creates a vast surveillance program. What if you don’t want to be a part of the police state? Make sure you check the right box and read your terms of service. If you’re designing a device, you need to require users to opt in to data sharing (especially as regions adapt GDPR and CCPA-like regulation).

While techniques like federated learning, which avoid latency issues and mass data collection, are on the horizon, it remains to be seen whether they will be satisfactory for companies that collect data. Is there a benefit to both organizations and their customers to limit or obfuscate the transmission of data away from the device?

Ownership is particularly tricky for communal devices. The expectations of consumers who put something in their home collide directly with the way rent-to-use services are pitched. Until we acknowledge that hardware put in a home is different from a cloud service, we will never get it right.

Lots of problems, now what?

Now that we have dived into the various problems that rear their head with communal devices, what do we do about it? In the next article we discuss a way to consider the map of the communal space. This helps build a better understanding of how the communal device fits in the context of the space and services that exist already.

We will also provide a list of dos and don’ts for leaders, developers, and designers to consider when building a communal device.

Categories: Technology

Thinking About Glue

O'Reilly Radar - Tue, 2021/07/13 - 06:28

In Glue: the Dark Matter of Software, Marcel Weiher asks why there’s so much code. Why is Microsoft Office 400 million lines of code? Why are we always running into the truth of Alan Kay’s statement that “Software seems ‘large’ and ‘complicated’ for what it does”?

Weiher makes an interesting claim: the reason we have so much code is Glue Code, the code that connects everything together. It’s “invisible and massive”; it’s “deemed not important”; and, perhaps most important, it’s “quadratic”: the glue code is proportional to the square of the number of things you need to glue. That feels right; and in the past few years, we’ve become increasingly aware of the skyrocketing number of dependencies in any software project significantly more complex than “Hello, World!” We can all add our own examples: the classic article Hidden Technical Debt in Machine Learning Systems shows a block diagram of a system in which machine learning is a tiny block in the middle, surrounded by all sorts of infrastructure: data pipelines, resource management, configuration, etc. Object Relational Management (ORM) frameworks are a kind of glue between application software and databases. Web frameworks facilitate gluing together components of various types, along with gluing that front end to some kind of back end. The list goes on.

Weiher makes another important point: the simplest abstraction for glue is the Unix pipe (|), although he points out that pipes are not the only solution. Anyone who has used Unix or a variant (and certainly anyone who has read–or in my case, written–chunks of Unix Power Tools) realizes how powerful the pipe is. A standard way to connect tools that are designed to do one thing well: that’s important.

But there’s another side to this problem, and one that we often sweep under the rug. A pipe has two ends: something that’s sending data, and something that’s receiving it. The sender needs to send data in a format that the receiver understands, or (more likely) the receiver needs to be able to parse and interpret the sender’s data in a way that it understands. You can pipe all the log data you want into an awk script (or perl, or python), but that script is still going to have to parse that data to make it interpretable. That’s really what those millions of lines of glue code do: either format data so the receiver can understand it or parse incoming data into a usable form. (This task falls more often on the receiver than the sender, largely because the sender often doesn’t—and shouldn’t—know anything about the receiver.)

From this standpoint, the real problem with glue isn’t moving data, though the Unix pipe is a great abstraction; it’s data integration. In a discussion about blockchains and medical records, Jim Stogdill once said “the real problem has nothing to do with blockchains. The real problem is data integration.” You can put all the data you want on a blockchain, or in a data warehouse, or in a subsurface data ocean the size of one of Jupiter’s moons, and you won’t solve the problem that application A generates data in a form that application B can’t use. If you know anything about medical records (and I know very little), you know that’s the heart of the problem. One major vendor has products that aren’t even compatible with each other, let alone competitors’ systems. Not only are data formats incompatible, the meanings of fields in the data are often different in subtle ways. Chasing down those differences can easily run to hundreds of thousands, if not millions, of lines of code.

Pipes are great for moving data from one place to another. But there’s no equivalent standard for data integration. XML might play a role, but it only solves the easy part of the problem: standardizing parsing has some value, but the ease of parsing XML was always oversold, and the real problems stem more from schemas than data formats. (And please don’t play the “XML is human-readable and -writable” game.) JSON strikes me as XML for “pickling” JavaScript objects, replacing angle brackets with curly braces: a good idea that has gotten a lot of cross-language support, but like XML neglects the tough part of the problem.

Is data integration a problem that can be solved? In networking, we have standards for what data means and how to send it. All those TCP/IP packet headers that have been in use for almost 40 years (the first deployment of IPv4 was in 1982) have kept data flowing between systems built by different vendors. The fields in the header have been defined precisely, and new protocols have been built successfully at every layer of the network stack.

But this kind of standardization doesn’t solve the N squared problem. In a network stack, TCP talks to TCP; HTTPS talks to HTTPS. (Arguably, it keeps the N squared problem from being an N cubed problem.) The network stack designs the N squared problem out of existence, at least as far as the network itself is concerned, but that doesn’t help at the application layer. When we’re talking applications, a medical app needs to understand medical records, financial records, regulatory constraints, insurance records, reporting systems, and probably dozens more. Nor does standardization really solve the problem of new services. IPv4 desperately needs to be replaced (and IPv6 has been around since 1995), but IPv6 has been “5 years in the future” for two decades now. Hack on top of hack has kept IPv4 workable; but will layer upon layer of hacks work if we’re extending medical or financial applications?

Glue code expands as the square of the number of things that are glued. The need to glue different systems together is at the core of the problems facing software development; as systems become more all-encompassing, the need to integrate with different systems increases. The glue–which includes code written for data integration–becomes its own kind of technical debt, adding to the maintenance burden. It’s rarely (if ever) refactored or just plain removed because you always need to “maintain compatibility” with some old system.  (Remember IE6?)

Is there a solution? In the future, we’ll probably need to integrate more services.  The glue code will be more complex, since it will probably need to live in some “zero trust” framework (another issue, but an important one).  Still, knowing that you’re writing glue code, keeping track of where it is, and being proactive about removing it when it’s no longer needed will keep the problem manageable. Designing interfaces carefully and observing standards will minimize the need for glue. In the final analysis, is glue code really a problem? Programming is ultimately about gluing things together, whether they’re microservices or programming libraries. Glue isn’t some kind of computational waste; it’s what holds our systems together.  Glue development is software development.

Categories: Technology

July 8th Virtual Meeting Topics

PLUG - Wed, 2021/07/07 - 14:45

We'll have 2 presentations this month: "MySQL 8.0 Indexes, Histograms, and Other Ways to Speed Up Your Queries" and "My Presentation Creation Stack with AsciiDoc"
Attend the meeting by visiting https://lufthans.bigbluemeeting.com/b/plu-yuk-7xx on the 8th of July at 7pm MST

Dave Stokes: MySQL 8.0 Indexes, Histograms, and Other Ways to Speed Up Your Queries

Description:
Improving the performance of database queries is often seen as a Harry Potter-ish dark art. In reality it is simple engineering and revolves around providing the query optimizer the best information about your data! And how do you do that? Well, you can start with properly planned indexes and histograms. We will also venture into some other areas that will help you speed up your queries.

About Dave:
Dave Stokes is a MySQL Community Manager for Oracle and the author of 'MySQL & JSON - A Practical Programming Guide'.


der.hans: My Presentation Creation Stack with AsciiDoc

Description:
Creating information rich presentations in AsciiDoc is easy.
AsciiDoc was created as a markup for making DocBook books.
Since it is plain text, AsciiDoc lends itself to writing, editing and using revision control.

The presentation introduces AsciiDoc and covers both advantages and disadvantages of using it for presentations.
I will also cover some of the presentation tools I've used previously.
Then I'll illuminate why I now prefer AsciiDoc and some suggestions when using it for presentations.

About der.hans:
der.hans is a technologist, Free Software advocate, parent and spouse.

Hans is chairman of the Phoenix Linux User Group (PLUG), chair for SeaGL Finance committee, founder of SeaGL Career Expo, BoF organizer for the Southern California Linux Expo (SCaLE) and founder of the Free Software Stammtisch. He presents regularly at large community-led conferences (SeaGL, SCaLE, LCA, FOSSASIA, Tübix, CLT, LFNW, OLF, SELF, GeekBeacon Fest) and many local groups.

Currently a Customer Data Engineer at Object Rocket. Public statements are not representative of $dayjob.

Fediverse/Mastodon - https://floss.social/@FLOX_advocate

Radar trends to watch: July 2021

O'Reilly Radar - Tue, 2021/07/06 - 10:12

Certainly the biggest news of the past month has been a continuation of the trend towards regulating the biggest players in the tech industry.  The US House of Representatives is considering 5 antitrust bills that would lead to major changes in the way the largest technology companies do business; and the Biden administration has appointed a new Chair of the Federal Trade Commission who will be inclined to use these regulations aggressively. Whether these bills pass in their current form, how they are challenged in court, and what changes they will lead to is an open question.  (Late note: Antitrust cases against Facebook by the FTC and state governments based on current law were just thrown out of court.)

Aside from that, we see AI spreading into almost every area of computing; this list could easily have a single AI heading that subsumes programming, medicine, security, and everything else.

AI and Data
  • A new algorithm allows autonomous vehicles to locate themselves using computer vision (i.e., without GPS) regardless of the season; it works even when the terrain is snow-covered.
  • An AI-based wildfire detection system has been deployed in Sonoma County. It looks for smoke plumes, and can monitor many more cameras than a human.
  • Researchers are investigating how racism and other forms of abuse enter AI models like GPT-3, and what can be done to prevent their appearance in the output. It’s essential for AI to “understand” racist content, but equally essential for it not to generate that content.
  • Google has successfully used Reinforcement Learning to design the layout for the next generation TPU chip. The layout process took 6 hours, and replaced weeks of human effort. This is an important breakthrough in the design of custom integrated circuits.
  • Facebook has developed technology to identify the source from which deepfake images originate. “Fingerprints” (distortions in the image) make it possible to identify the model that generated the images, and possibly to track down the creators.
  • Adaptive mood control is a technique that autonomous vehicles can use to detect passengers’ emotions and drive accordingly, making it easier for humans to trust the machine. We hope this doesn’t lead AVs to drive faster when the passenger is angry or frustrated.
  • IBM has developed Uncertainty Quantification 360, a set of open source tools for quantifying the uncertainty in AI systems. Understanding uncertainty is a big step towards building trustworthy AI and getting beyond the idea that the computer is always right. Trust requires understanding uncertainty.
  • Waymo’s autonomous trucks will begin carrying real cargo between Houston and Fort Worth, in a partnership with a major trucking company.
  • GPT-2 can predict brain activity and comprehension in fMRI studies of patients listening to stories, possibly indicating that in some way its processes correlate to brain function.
  • GPT-J is a language model with performance similar to GPT-3.  The code and weights are open source.
  • It appears possible to predict preferences directly by comparing brain activity to activity of others (essentially, brain-based collaborative filtering). A tool for advertising or for self-knowledge?
  • Feature stores are tools to automate building pipelines to deliver data for ML applications in production. Tecton, which originated with Uber’s Michelangelo, is one of the early commercial products available.
  • How does machine learning work with language? Everything You Ever Said doesn’t answer the question, but lets you play with an NLP engine by pasting in a text, then adding or subtracting concepts to see how the text is transformed.  (Based on GLoVE, a pre-GPT model.)
  • The HateCheck dataset tests the ability of AI applications to detect hate speech correctly. Hate speech is a hard problem; being too strict causes systems to reject content that shouldn’t be classified as hate speech, while being too lax allows hate speech through.
Ethics
  • Twitter has built a data ethics group aimed at putting ethics into practice, in addition to research.  Among others, the group includes Rumman Chowdhury and Kristian Lum.
  • A study of the effect of noise on fairness in lending shows that insufficient (hence noisier) data is as big a problem as biased data. Poor people have less credit history, which means that their credit scores are often inaccurate. Correcting problems arising from noise is much more difficult than dealing with problems of bias.
  • Andrew Ng’s newsletter, The Batch, reports on a survey of executives that most companies are not practicing “responsible AI,” or even understand the issues. There is no consensus about the importance (or even the meaning) of “ethics” for AI.
  • Using AI to screen resumes is a problem in itself, but AI doing the interview? That’s taking problematic to a new level. It can be argued that AI, when done properly, is less subject to bias than a human interviewer, but we suspect that AI interviewers present more problems than solutions.
Web
  • WebGPU is a proposal for a standard API that makes GPUs directly accessible to web pages for rendering and computation.
  • An end to providing cookie consent for every site you visit?  The proposed ADPC (advanced data protection control) standard will allow users to specify privacy preferences once.
  • Using social media community guidelines as a political weapon: the Atajurt Kazakh Human Rights channel, which publishes testimonies from people imprisoned in China’s internment camps, has been taken down repeatedly as a result of coordinated campaigns.
Security
  • Microsoft is working on eliminating passwords! Other companies should take the hint. Microsoft is stressing biometrics (which have their own problems) and multi-factor authentication.
  • Supply chain security is very problematic.  Microsoft admits to an error in which they mistakenly signed a device driver that was actually a rootkit, causing security software to ignore it. The malware somehow slipped through Microsoft’s signing process.
  • Markpainting is a technology for defeating attempts to create a fake image by adding elements to the picture that aren’t visible, but that will become visible when the image is modified (for example, to eliminate a watermark).
  • Amazon Sidewalk lets Amazon devices connect to other open WiFi nets to extend their range and tap others’ internet connections. Sidewalk is a cool take on decentralized networking. It is also a Very Bad Idea.
  • Authentication using gestures, hand shapes, and geometric deep learning? I’m not convinced, but this could be a viable alternative to passwords and crude biometrics. It would have to work for people of all skin colors, and that has consistently been a problem for vision-based products.
  • According to Google, Rowhammer attacks are gaining momentum–and will certainly gain even more momentum as feature sizes in memory chips get smaller. Rowhammer attacks repeatedly access a single row in a memory chip, hoping to corrupt adjacent bits.
  • While details are sketchy, the FBI was able to recover the Bitcoin that Colonial Pipeline paid to Darkside to restore systems after their ransomware attack. The FBI has been careful to say that they can’t promise recovering payments in other cases. Whether this recovery reflects poor opsec on the part of the criminals, or that Bitcoin is more easily de-anonymized than most people think, it’s clear that secrecy and privacy are relative.
Design and User Experience
  • Communal Computing is about designing devices that are inherently shared: home assistants, home automation, and more. The “single account/user” model doesn’t work.
  • A microphone that only “hears” frequencies above the human hearing range can be used to detect human activities (for example, in a smart home device) without recording speech.
  • Digital Twins in aerospace at scale: One problem with the adoption of digital twins is that the twin is very specific to a single device. This research shows that it’s possible to model real-world objects in ways that can be reused across collections of objects and different applications.
Medicine
  • The Open Insulin Foundation is dedicated to creating the tools necessary to produce insulin at scale. This is the next step in a long-term project by Anthony DiFranco and others to challenge the pharma company’s monopoly on insulin production, and create products at a small fraction of the price.
  • Where’s the work on antivirals and other treatments for COVID-19? The answer is simple: Vaccines are very profitable. Antivirals aren’t. This is a huge, institutional problem in the pharmaceutical industry.
  • The National Covid Cohort Collaborative (N3C) is a nationwide database of anonymized medical records of COVID patients. What’s significant isn’t COVID, but that N3C is a single database, built to comply with privacy laws, that’s auditable, and that’s open for any group to make research proposals.
  • Can medical trials be sped up by re-using control data (data from patients who were in the control group) from previous trials? Particularly for rare and life-threatening diseases, getting trial volunteers is difficult because nobody wants to be assigned to the control group.
  • A remote monitoring patch for COVID patients uses AI to understand changes in the patient’s vital signs, allowing medical staff to intervene immediately if a patient’s condition worsens. Unlike most such devices, it was trained primarily on Black and Hispanic patients.
  • Machine learning in medicine is undergoing a credibility crisis: poor data sets with limited diversity lead to poor results.
Programming
  • Microsoft, OpenAI, and GitHub have announced a new service called Copilot that uses AI to make suggestions to programmers as they are writing code (currently in “technical preview”).  It is truly a cybernetic pair programmer.
  • Windows 11 will run Android apps. If nothing else, this is a surprise. Android apps will be provided via the Amazon store, not Google Play.
  • Microsoft’s PowerFx is a low-code programming language based on Excel formulas (which now include lambdas).  Input and output are through what looks like a web page. What does it mean to strip Excel from its 2D grid? Is this a step forward or backward for low code computing?
  • Open Source Insights is a Google project for investigating the dependency chain of any open source project. Its ability currently is limited to a few major packaging systems (including npm, Cargo, and maven), but it will be expanded.
  • Quantum computing’s first application will be in researching quantum mechanics: understanding the chemistry of batteries, drugs, and materials. In these applications, noise is an asset, not a problem.
Categories: Technology

Hand Labeling Considered Harmful

O'Reilly Radar - Wed, 2021/06/23 - 05:34

We are traveling through the era of Software 2.0, in which the key components of modern software are increasingly determined by the parameters of machine learning models, rather than hard-coded in the language of for loops and if-else statements. There are serious challenges with such software and models, including the data they’re trained on, how they’re developed, how they’re deployed, and their impact on stakeholders. These challenges commonly result in both algorithmic bias and lack of model interpretability and explainability.

There’s another critical issue, which is in some ways upstream to the challenges of bias and explainability: while we seem to be living in the future with the creation of machine learning and deep learning models, we are still living in the Dark Ages with respect to the curation and labeling of our training data: the vast majority of labeling is still done by hand.

There are significant issues with hand labeling data:

  • It introduces bias, and hand labels are neither interpretable nor explainable.
  • There are prohibitive costs to hand labeling datasets (both financial costs and the time of subject matter experts).
  • There is no such thing as gold labels: even the most well-known hand labeled datasets have label error rates of at least 5% (ImageNet has a label error rate of 5.8%!).

We are living through an era in which we get to decide how human and machine intelligence interact to build intelligent software to tackle many of the world’s toughest challenges. Labeling data is a fundamental part of human-mediated machine intelligence, and hand labeling is not only the most naive approach but also one of the most expensive (in many senses) and most dangerous ways of bringing humans in the loop. Moreover, it’s just not necessary as many alternatives are seeing increasing adoption. These include:

  • Semi-supervised learning
  • Weak supervision
  • Transfer learning
  • Active learning
  • Synthetic data generation

These techniques are part of a broader movement known as Machine Teaching, a core tenet of which is getting both humans and machines each doing what they do best. We need to use expertise efficiently: the financial cost and time taken for experts to hand-label every data point can break projects, such as diagnostic imaging involving life-threatening conditions and security and defense-related satellite imagery analysis. Hand labeling in the age of these other technologies is akin to scribes hand-copying books post-Gutenberg.

There is also a burgeoning landscape of companies building products around these technologies, such as Watchful (weak supervision and active learning; disclaimer: one of the authors is CEO of Watchful), Snorkel (weak supervision), Prodigy (active learning), Parallel Domain (synthetic data), and AI Reverie (synthetic data).

Hand Labels and Algorithmic Bias

As Deb Raji, a Fellow at the Mozilla Foundation, has pointed out, algorithmic bias “can start anywhere in the system—pre-processing, post-processing, with task design, with modeling choices, etc.,” and the labeling of data is a crucial point at which bias can creep in.


Figure 1: Bias can start anywhere in the system. Image adapted from A Framework for Understanding Sources of Harm throughout the Machine Learning Life Cycle by Harini Suresh and John Guttag.

High-profile cases of bias in training data resulting in harmful models include an Amazon recruiting tool that “penalized resumes that included the word ‘women’s,’ as in ‘women’s chess club captain.’” Don’t take our word for it. Play the educational game Survival of the Best Fit where you’re a CEO who uses a machine learning model to scale their hiring decisions and see how the model replicates the bias inherent in the training data. This point is key: as humans, we possess all types of biases, some harmful, others not so. When we feed hand labeled data to a machine learning model, it will detect those patterns and replicate them at scale. This is why David Donoho astutely observed that perhaps we should call ML models recycled intelligence rather than artificial intelligence. Of course, given the amount of bias in hand labeled data, it may be more apt to refer to it as recycled stupidity (hat tip to artificial stupidity).

The only way to interrogate the reasons for underlying bias arising from hand labels is to ask the labelers themselves their rationales for the labels in question, which is impractical, if not impossible, in the majority of cases: there are rarely records of who did the labeling, it is often outsourced via at-scale global APIs, such as Amazon’s Mechanical Turk and, when labels are created in-house, previous labelers are often no longer part of the organization.

Uninterpretable, Unexplainable

This leads to another key point: the lack of both interpretability and explainability in models built on hand labeled data. These are related concepts, and broadly speaking, interpretability is about correlation, whereas explainability is about causation. The former involves thinking about which features are correlated with the output variable, while the latter is concerned with why certain features lead to particular labels and predictions. We want models that give us results we can explain and some notion of how or why they work. For example, in the ProPublica exposé of COMPAS recidivism risk model, which made more false predictions that Black people would re-offend than it did for white people, it is essential to understand why the model is making the predictions it does. Lack of explainability and transparency were key ingredients of all the deployed-at-scale algorithms identified by Cathy O’Neil in Weapons of Math Destruction.

It may be counterintuitive that getting machines more in-the-loop for labeling can result in more explainable models but consider several examples:

  • There is a growing area of weak supervision, in which SMEs specify heuristics that the system then uses to make inferences about unlabeled data, the system calculates some potential labels, and then the SME evaluates the labels to determine where more heuristics might need to be added or tweaked. For example, when building a model of whether surgery was necessary based on medical transcripts, the SME may provide the following heuristic: if the transcription contains the term “anaesthesia” (or a regular expression similar to it), then surgery likely occurred (check out Russell Jurney’s “Hand labeling is the past” article for more on this; see the sketch after this list).
  • In diagnostic imaging, we need to start cracking open the neural nets (such as CNNs and transformers)! SMEs could once again use heuristics to specify that tumors smaller than a certain size and/or of a particular shape are benign or malignant and, through such heuristics, we could drill down into different layers of the neural network to see what representations are learned where.
  • When your knowledge (via labels) is encoded in heuristics and functions, as above, this also has profound implications for models in production. When data drift inevitably occurs, you can return to the heuristics encoded in functions and edit them, instead of continually incurring the costs of hand labeling.
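
Below is a minimal sketch of what such a heuristic can look like as code, in the style of weak supervision frameworks like Snorkel. The function names, label values, and the second heuristic are illustrative, not any particular library’s API.

    import re

    SURGERY, NO_SURGERY, ABSTAIN = 1, 0, -1

    def lf_anaesthesia(transcript):
        """The SME's heuristic from the text: a mention of anaesthesia suggests surgery occurred."""
        return SURGERY if re.search(r"\banaesthesia\b", transcript, re.IGNORECASE) else ABSTAIN

    def lf_no_procedure(transcript):
        """Another, hypothetical heuristic: an explicit 'no procedure performed' suggests no surgery."""
        return NO_SURGERY if "no procedure performed" in transcript.lower() else ABSTAIN

    labeling_functions = [lf_anaesthesia, lf_no_procedure]

    def weak_labels(transcript):
        """Each heuristic votes or abstains; a label model would then combine these noisy votes
        into a single probabilistic label instead of relying on a hand annotation."""
        return [lf(transcript) for lf in labeling_functions]

    print(weak_labels("Patient was placed under general anaesthesia prior to incision."))  # [1, -1]

Because the knowledge lives in readable functions rather than in thousands of individual hand labels, it can be inspected, debated, and edited when it turns out to be wrong.
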
On Auditing

Amidst the increasing concern about model transparency, we are seeing calls for algorithmic auditing. Audits will play a key role in determining how algorithms are regulated and which ones are safe for deployment. One of the barriers to auditing is that high-performing models, such as deep learning models, are notoriously difficult to explain and reason about. There are several ways to probe this at the model level (such as SHAP and LIME), but that only tells part of the story. As we have seen, a major cause of algorithmic bias is that the data used to train it is biased or insufficient in some way.

There currently aren’t many ways to probe for bias or insufficiency at the data level. For example, the only way to explain hand labels in training data is to talk to the people who labeled it. Active learning, on the other hand, allows for the principled creation of smaller datasets which have been intelligently sampled to maximize utility for a model, which in turn reduces the overall auditable surface area. An example of active learning would be the following: instead of hand labeling every data point, the SME can label a representative subset of the data, which the system uses to make inferences about the unlabeled data. Then the system will ask the SME to label some of the unlabeled data, cross-check its own inferences and refine them based on the SME’s labels. This is an iterative process that terminates once the system reaches a target accuracy. Less data means less headache with respect to auditability.

Weak supervision more directly encodes expertise (and hence bias) as heuristics and functions, making it easier to evaluate where labeling went awry. For more opaque methods, such as synthetic data generation, it might be a bit difficult to interpret why a particular label was applied, which may actually complicate an audit. The methods we choose at this stage of the pipeline are important if we want to make sure the system as a whole is explainable.

The Prohibitive Costs of Hand Labeling

There are significant and differing forms of costs associated with hand labeling. Giant industries have been erected to deal with the demand for data-labeling services. Look no further than Amazon Mechanical Turk and the labeling services now offered by every major cloud provider. It is telling that data labeling is becoming increasingly outsourced globally, as detailed by Mary Gray and Siddharth Suri in Ghost Work, and there are increasingly serious concerns about the labor conditions under which hand labelers work around the globe.

The sheer amount of capital involved was evidenced by Scale AI raising $100 million in 2019 to bring their valuation to over $1 billion at a time when their business model solely revolved around using contractors to hand label data (it is telling that they’re now doing more than solely hand labels).

Money isn’t the only cost, and quite often it isn’t where the bottleneck or rate-limiting step occurs. Rather, it is the bandwidth and time of experts that is the scarcest resource. Because it is scarce, that time is expensive, but much of the time it isn’t available at any price (on top of this, the time it takes data scientists to correct labeling errors is also very expensive). Take financial services, for example, and the question of whether or not you should invest in a company based on information about the company scraped from various sources. In such a firm, there will only be a small handful of people who can make such a call, so labeling each data point would be incredibly expensive, and that’s if the SME even has the time.

This is not vertical-specific. The same challenge occurs in labeling legal texts for classification: is this clause talking about indemnification or not? And in medical diagnosis: is this tumor benign or malignant? As dependence on expertise increases, so does the likelihood that limited access to SMEs becomes a bottleneck.

The third cost is a cost to accuracy, reality, and ground truth: the fact that hand labels are often so wrong. The authors of a recent study from MIT identified “label errors in the test sets of 10 of the most commonly-used computer vision, natural language, and audio datasets.” They estimated an average error rate of 3.4% across the datasets and showed that, in some instances, ML model performance increases significantly once labels are corrected. Also, consider that in many cases ground truth isn’t easy to find, if it exists at all. Weak supervision makes room for these cases (which are the majority) by assigning probabilistic labels without relying on ground truth annotations. It’s time to think statistically and probabilistically about our labels. There is good work happening here, such as Aka et al.’s (Google) recent paper Measuring Model Biases in the Absence of Ground Truth.

The costs identified above are not one-off. When you train a model, you have to assume you’re going to train it again if it lives in production. Depending on the use case, that could be frequent. If you’re labeling by hand, it’s not just a large upfront cost to build a model. It is a set of ongoing costs each and every time.


Figure 2: There are no “gold labels”: even the most well-known hand labeled datasets have label error rates of at least 5% (ImageNet has a label error rate of 5.8%!).

The Efficacy of Automation Techniques

In terms of performance, even if getting machines to label much of your data results in slightly noisier labels, your models are often better off with 10 times as many slightly noisier labels. To dive a bit deeper into this, there are gains to be made by increasing training set size even if it means reducing overall label accuracy, but if you’re training classical ML models, only up to a point (past this point the model starts to see a dip in predictive accuracy). “Scaling to Very Very Large Corpora for Natural Language Disambiguation (Banko & Brill, 2001)” demonstrates this in a traditional ML setting by exploring the relationship between hand labeled data, automatically labeled data, and subsequent model performance. A more recent paper, “Deep Learning Scaling Is Predictable, Empirically (2017)”, explores the quantity/quality relationship relative to modern state of the art model architectures, illustrating the fact that SOTA architectures are data hungry, and accuracy improves as a power law as training sets grow:

We empirically validate that DL model accuracy improves as a power-law as we grow training sets for state-of-the-art (SOTA) model architectures in four machine learning domains: machine translation, language modeling, image processing, and speech recognition. These power-law learning curves exist across all tested domains, model architectures, optimizers, and loss functions.

The key question isn’t “should I hand label my training data or should I label it programmatically?” It should instead be “which parts of my data should I hand label and which parts should I label programmatically?” According to these papers, by introducing expensive hand labels sparingly into largely programmatically generated datasets, you can maximize the effort/model accuracy tradeoff on SOTA architectures that wouldn’t be possible if you had hand labeled alone.
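One simple way to act on that question is to give the scarce hand labels more weight than the plentiful programmatic ones when training. The sketch below is schematic only: the arrays are random placeholders standing in for real features and labels, and the 5:1 weighting is an arbitrary assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X_hand, y_hand = rng.random((200, 5)), rng.integers(0, 2, 200)        # small, expensive SME-labeled set
X_prog, y_prog = rng.random((20_000, 5)), rng.integers(0, 2, 20_000)  # large, noisier programmatic set

X = np.vstack([X_hand, X_prog])
y = np.concatenate([y_hand, y_prog])
weights = np.concatenate([
    np.full(len(y_hand), 5.0),   # trust the hand labels more...
    np.full(len(y_prog), 1.0),   # ...but lean on the sheer volume of programmatic labels
])

model = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=weights)
```

How to set such weights, or whether to use probabilistic labels instead, is exactly the kind of design question the papers above help answer.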

The stacked costs of hand labeling wouldn’t be so challenging were they necessary, but the fact of the matter is that there are so many other interesting ways to get human knowledge into models. There’s still an open question around where and how we want humans in the loop and what’s the right design for these systems. Areas such as weak supervision, self-supervised learning, synthetic data generation, and active learning, for example, along with the products that implement them, provide promising avenues for avoiding the pitfalls of hand labeling. Humans belong in the loop at the labeling stage, but so do machines. In short, it’s time to move beyond hand labels.

Many thanks to Daeil Kim for feedback on a draft of this essay.

Categories: Technology

Two economies. Two sets of rules.

O'Reilly Radar - Tue, 2021/06/22 - 06:07

At one point early this year, Elon Musk briefly became the richest person in the world. After a 750% increase in Tesla’s stock market value added over $180 billion to his fortune, he briefly had a net worth of over $200 billion. It’s now back down to “only” $155 billion.

Understanding how our economy produced a result like this—what is good about it and what is dangerous—is crucial to any effort to address the wild inequality that threatens to tear our society apart.

The betting economy versus the operating economy

In response to the news of Musk’s surging fortune, Bernie Sanders tweeted:

Bernie was right that a $7.25 minimum wage is an outrage to human decency. If the minimum wage had kept up with increases in productivity since 1979, it would be over $24 by now, putting a two-worker family into the middle class. But Bernie was wrong to imply that Musk’s wealth increase was at the expense of Tesla’s workers. The median Tesla worker makes considerably more than the median American worker.

Elon Musk’s wealth doesn’t come from him hoarding Tesla’s extractive profits, like a robber baron of old. For most of its existence, Tesla had no profits at all. It became profitable only last year. But even in 2020, Tesla’s profits of $721 million on $31.5 billion in revenue were small—only slightly more than 2% of sales, a bit less than those of the average grocery chain, the least profitable major industry segment in America.

No, Musk won the lottery, or more precisely, the stock market beauty contest. In theory, the price of a stock reflects a company’s value as an ongoing source of profit and cash flow. In practice, it is subject to wild booms and busts that are unrelated to the underlying economics of the businesses that shares of stock are meant to represent.

Why is Musk so rich? The answer tells us something profound about our economy: he is wealthy because people are betting on him. But unlike a bet in a lottery or at a racetrack, in the vast betting economy of the stock market, people can cash out their winnings before the race has ended.

This is one of the biggest unacknowledged drivers of inequality in America, the reason why one segment of our society prospered so much during the pandemic while the other languished.

What are the odds?

If the stock market is like a horse race where people can cash out their bets while the race is still being run, what does it mean for the race to finish? For an entrepreneur or an early-stage investor, an IPO is a kind of finish, the point where they can sell previously illiquid shares on to others. An acquisition or a shutdown, either of which puts an end to a company’s independent existence, is another kind of ending. But it is also useful to think of the end of the race as the point in time at which the stream of company profits will have repaid the investment.

Since ownership of public companies is spread across tens of thousands of people and institutions, it’s easier to understand this point by imagining a small private company with one owner, say, a home construction business or a storage facility or a car wash. If it cost $1 million to buy the business, and it delivered $100,000 of profit a year, the investment would be repaid in 10 years. If it delivered $50,000 in profit, it would take 20. And of course, those future earnings would need to be discounted at some rate, since a dollar received 20 years from now is not worth as much as a dollar received today. This same approach works, in theory, for large public companies. Each share is a claim on a fractional share of the company’s future profits and the present value that people put on that profit stream.

This is, of course, a radical oversimplification. There are many more sophisticated ways to value companies, their assets, and their prospects for future streams of profits. But what I’ve described above is one of the oldest, the easiest to understand, and the most clarifying. It is called the price/earnings ratio, or simply the P/E ratio. It’s the ratio between the price of a single share of stock and the company’s earnings per share (its profits divided by the number of shares outstanding.) What the P/E ratio gives, in effect, is a measure of how many years of current profits it would take to pay back the investment.
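For readers who like to see the arithmetic spelled out, here is a small sketch of that payback intuition, using the hypothetical businesses from the paragraph above rather than any real company.

```python
def pe_ratio(price: float, annual_earnings: float) -> float:
    """Years of current earnings needed to repay the purchase price (ignoring discounting)."""
    return price / annual_earnings

# The $1 million business from the text:
print(pe_ratio(1_000_000, 100_000))  # 10.0 -> roughly a 10-year payback
print(pe_ratio(1_000_000, 50_000))   # 20.0 -> twice as long at half the profit
```

The same division, applied to a share price and earnings per share, gives the P/E ratio of a public company.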

The rate of growth also plays a role in a company’s valuation. For example, imagine a business with $100 million in revenue with a 10% profit margin, earning $10 million a year. How much it is worth to own that asset depends how fast it is growing and what stage of its lifecycle it is in when you bought it. If you were lucky enough to own that business when it had only $1 million in revenue and, say, $50,000 in profits, you would now be earning 200x as much as you were when you made your original investment. If a company grows to hundreds of billions in revenue and tens of billions in profits, as Apple, Microsoft, Facebook, and Google have done, even a small investment early on that is held for the long haul can make its lucky owner into a billionaire. Tesla might be one of these companies, but if so, the opportunity to buy its future is long past because it is already so highly valued. The P/E ratio helps you to understand the magnitude of the bet you are making at today’s prices.

The average P/E ratio of the S&P 500 has varied over time as “the market” (the aggregate opinion of all investors) goes from bullish about the future to bearish, either about specific stocks or about the market as a whole. Over the past 70 years, the ratio has ranged from a low of 7.22 in 1950 to almost 45 today. (A note of warning: it was only 17 on the eve of the Great Depression.)

What today’s P/E ratio of 44.8 means is that, on average, the 500 companies that make up the S&P 500 are valued at about 45 years’ worth of present earnings. Most companies in the index are worth less, and some far more. In today’s overheated market, it is often the case that the more certain the outcome, the less valuable a company is considered to be. For example, despite their enormous profits and huge cash hoards, Apple, Google, and Facebook have ratios much lower than you might expect: about 30 for Apple, 34 for Google, and 28 for Facebook. Tesla at the moment of Elon Musk’s peak wealth? 1,396.

Let that sink in. You’d have had to wait almost 1,400 years to get your money back if you’d bought Tesla stock this past January and simply relied on taking home a share of its profits. Tesla’s more recent quarterly earnings are a bit higher, and its stock price quite a bit lower, so now you’d only have to wait about 600 years.

Of course, it’s certainly possible that Tesla will so dominate the auto industry and related energy opportunities that its revenues could grow from its current $28 billion to hundreds of billions with a proportional increase in profits. But as Rob Arnott, Lillian Wu, and Bradford Cornell point out in their analysis “Big Market Delusion: Electric Vehicles,” electric vehicle companies are already valued at roughly the same amount as the entire rest of the auto industry despite their small revenues and profits and despite the likelihood of more, rather than less, competition in future. Barring some revolution in the fundamental economics of the business, current investors are likely paying now for the equivalent of hundreds of years of future profits.

So why do investors do this? Simply put: because they believe that they will be able to sell their shares to someone else at an even higher price. In times where betting predominates in financial markets, what a company is actually worth by any intrinsic measure seems to have no more meaning than the actual value of tulips during the 17th century Dutch “tulip mania.” As the history of such moments teaches, eventually the bubble does pop.

This betting economy, within reason, is a good thing. Speculative investment in the future gives us new products and services, new drugs, new foods, more efficiency and productivity, and a rising standard of living. Tesla has kickstarted a new gold rush in renewable energy, and given the climate crisis, that is vitally important. A betting fever can be a useful collective fiction, like money itself (the value ascribed to pieces of paper issued by governments) or the wild enthusiasm that led to the buildout of railroads, steel mills, or the internet. As economist Carlota Perez has noted, bubbles are a natural part of the cycle by which revolutionary new technologies are adopted.

Sometimes, though, the betting system goes off the rails. Tesla’s payback may take centuries, but it is the forerunner of a necessary industrial transformation. But what about the payback on companies such as WeWork? How about Clubhouse? Silicon Valley is awash in companies that have persuaded investors to value them at billions despite no profits, no working business model, and no pathway to profitability. Their destiny, like WeWork’s or Katerra’s, is to go bankrupt.

John Maynard Keynes, the economist whose idea that it was essential to invest in the demand side of the economy and not just the supply side helped bring the world out of the Great Depression, wrote in his General Theory of Employment, Interest and Money, “Speculators may do no harm as bubbles on a steady stream of enterprise. But the position is serious when enterprise becomes the bubble on a whirlpool of speculation. When the capital development of a country becomes a by-product of the activities of a casino, the job is likely to be ill-done.”

In recent decades, we have seen the entire economy lurch from one whirlpool of speculation to another. And as at the gambling table, each lurch represents a tremendous transfer of wealth from the losers to the winners. The dot-com bust. The subprime mortgage meltdown. Today’s Silicon Valley “unicorn” bubble. The failures to deliver on their promises by WeWork, Katerra, and their like are just the start of yet another bubble popping.

Why this matters

Those at the gaming table can, for the most part, afford to lose. They are disproportionately wealthy. Nearly 52% of stock market value is held by the top 1% of Americans, with another 35% of total market value held by the next 9%. The bottom 50% hold only 0.7% of stock market wealth.

Bubbles, though, are only an extreme example of a set of dynamics that shape our economy far more widely than we commonly understand. The leverage provided by the betting economy drives us inevitably toward a monoculture of big companies. The local bookstore trying to compete with Amazon, the local cab company competing with Uber, the neighborhood dry cleaner, shopkeeper, accountant, fitness studio owner, or any other local, privately held business gets exactly $1 for every dollar of profit it earns. Meanwhile, a dollar of Tesla profit turns into $600 of stock market value; a dollar of Amazon profit turns into $67 of stock market value; a dollar of Google profit turns into $34, and so on. A company and its owners can extract massive amounts of value despite having no profits—value that can be withdrawn by those who own shares—essentially getting something for nothing.

And that, it turns out, is also one underappreciated reason why in the modern economy, the rich get richer and the poor get poorer. Rich and poor are actually living in two different economies, which operate by different rules. Most ordinary people live in a world where a dollar is a dollar. Most rich people live in a world of what financial pundit Jerry Goodman, writing under the pseudonym Adam Smith, called “supermoney,” where assets have been “financialized” (that is, able to participate in the betting economy) and are valued today as if they were already delivering the decades worth of future earnings that are reflected in their stock price.

Whether you are an hourly worker or a small business owner, you live in the dollar economy. If you’re a Wall Street investor, an executive at a public company compensated with stock grants or options, a venture capitalist, or an entrepreneur lucky enough to win, place, or show in the financial market horse race, you live in the supermoney economy. You get a huge interest-free loan from the future.

Elon Musk has built not one but two world-changing companies (Tesla and SpaceX.) He clearly deserves to be wealthy. As does Jeff Bezos, who quickly regained his title as the world’s wealthiest person. Bill Gates, Steve Jobs, Larry Page and Sergey Brin, Mark Zuckerberg, and many other billionaires changed our world and have been paid handsomely for it.

But how much is too much? When Bernie Sanders said that billionaires shouldn’t exist, Mark Zuckerberg agreed, saying, “On some level, no one deserves to have that much money.” He added, “I think if you do something that’s good, you get rewarded. But I do think some of the wealth that can be accumulated is unreasonable.” Silicon Valley was founded by individuals for whom hundreds of millions provided plenty of incentive! The notion that entrepreneurs will stop innovating if they aren’t rewarded with billions is a pernicious fantasy.

What to do about it

Taxing the rich and redistributing the proceeds might seem like it would solve the problem. After all, during the 1950s, ’60s, and ’70s, progressive income tax rates as high as 90% did a good job of redistributing wealth and creating a broad-based middle class. But we also need to put a brake on the betting economy that is creating so much phantom wealth by essentially letting one segment of society borrow from the future while another is stuck in an increasingly impoverished present.

Until we recognize the systemic role that supermoney plays in our economy, we will never make much of a dent in inequality. Simply raising taxes is a bit like sending out firefighters with hoses spraying water while another team is spraying gasoline.

The problem is that government policy is biased in favor of supermoney. The mandate for central bankers around the world is to keep growth rates up without triggering inflation. Since the 2009 financial crisis, they have tried to do this by “quantitative easing,” that is, flooding the world with money created out of nothing. This has kept interest rates low, which in theory should have sparked investment in the operating economy, funding jobs, factories, and infrastructure. But far too much of it went instead to the betting economy.

Stock markets have become so central to our imagined view of how the economy is doing that keeping stock prices going up even when companies are overvalued has become a central political talking point. Any government official whose policies cause the stock market to go down is considered to have failed. This leads to poor public policy as well as poor investment decisions by companies and individuals.

As Steven Pearlstein, Washington Post columnist and author of the book Moral Capitalism, put it in a 2020 column:

When the markets are buoyant, Fed officials claim that central bankers should never second-guess markets by declaring that there are financial bubbles that might need to be deflated. Markets on their own, they assure, will correct whatever excesses may develop.

But when bubbles burst or markets spiral downward, the Fed suddenly comes around to the idea that markets aren’t so rational and self-correcting and that it is the Fed’s job to second-guess them by lending copiously when nobody else will.

In essence, the Fed has adopted a strategy that works like a one-way ratchet, providing a floor for stock and bond prices but never a ceiling.

That’s the fire hose spraying gasoline. To turn it off, central banks should:

  • Raise interest rates, modestly at first, and more aggressively over time. Yes, this would quite possibly puncture the stock market bubble, but that could well be a good thing. If people can no longer make fortunes simply by betting that stocks will go up and instead have to make more reasonable assessments of the underlying value of their investments, the market will become better at allocating capital.
  • Alternatively, accept much larger increases in inflation. As Thomas Piketty explained in Capital in the Twenty-First Century, inflation is one of the prime forces that decreases inequality, reducing the value of existing assets and more importantly for the poor, reducing the value of debt and the payments paid to service it.
  • Target small business creation, hiring, and profitability in the operating economy rather than phantom valuation increases for stocks.

Tax policy also fans the fire. Taxes shape the economy in much the same way as Facebook’s algorithms shape its news feed. The debate about whether taxes as a whole should be higher or lower completely lacks nuance and so misses the point, especially in the US, where elites use their financial and political power to get favored treatment. Here are some ideas:

In general, we should treat not just illegal evasion but tax loopholes the way software companies treat zero-day exploits, as something to be fixed as soon as they are recognized, not years or decades later. Even better, stop building them into the system in the first place! Most loopholes are backdoors installed knowingly by our representatives on behalf of their benefactors.

This last idea is perhaps the most radical. The tax system could and should become more dynamic rather than more predictable. Imagine if Facebook or Google were to tell us that they couldn’t change their algorithms to address misinformation or spam without upsetting their market and so had to leave abuses in place for decades in the interest of maintaining stability—we’d think they were shirking their duty. So too our policy makers. It’s high time we all recognize the market-shaping role of tax and monetary policy. If we can hold Facebook’s algorithms to account, why can’t we do the same for our government?

Our society and markets are getting the results the algorithm was designed for. Are they the results we actually want?

Categories: Technology

Communal Computing

O'Reilly Radar - Tue, 2021/06/15 - 04:27

Home assistants and smart displays are being sold in record numbers, but they are built wrong. They are designed with one person in mind: the owner. These technologies need to fit into the communal spaces where they are placed, like homes and offices. If they don’t fit, they will be unplugged and put away due to lack of trust.

The problems are subtle at first. Your Spotify playlist starts to have recommendations for songs you don’t like. You might see a photo you took on someone else’s digital frame. An Apple TV reminds you of a new episode of a show your partner watches. Guests are asking you to turn on your IoT-enabled lights for them. The wrong person’s name shows up in the Zoom call. Reminders for medication aren’t heard by the person taking the medication. Bank account balances are announced during a gathering of friends.

Would you want your bank account balances announced during a dinner party?

This is the start of a series discussing the design of communal devices–devices designed to work in communal spaces. The series is a call to action for everyone developing communal devices–whether you are creating business cases, designing experiences, or building technology–to take a step back and consider what is really needed.

This first article discusses what communal devices are, and how the problems that appear result from our assumptions about how they’re used. Those assumptions were inherited from the world of PCs: the rules that apply to your laptop or your iPad just don’t apply to home assistants and other “smart devices,” from light bulbs to refrigerators. The fix isn’t just adding the ability for people to switch accounts. We need a new paradigm for the future of technical infrastructure for our homes and offices. In this series of articles we will tell you how we got here, why it is problematic, and where to go to enable communal computing.

The Wrong Model

Problems with communal devices arise because the industry has focused on a specific model for how these devices are used: a single person buys, sets up, and uses the device. If you bought one of these devices (for example, a smart speaker) recently, how many other people in your household did you involve in setting it up?

Smart screen makers like Amazon and Google continue to make small changes to try to fix the weirdness. They have recently added technology to automatically personalize based on someone’s face or voice. These are temporary fixes that will only be effective until the next special case reveals itself; until the industry recognizes the communal nature of users’ needs, they will just be short-lived patches. We need to turn the model around to make the devices communal first, rather than communal as an afterthought.

I recently left Facebook Reality Labs, where I was working on the Facebook Portal identity platform, and realized that there was zero discourse about this problem in the wider world of technology. I’ve read through many articles on how to create Alexa skills and attended talks about the use of IoT, and I’ve even made my own voice skills. There was no discussion of the communal impacts of those technologies. If we don’t address the problems this creates, these devices will be relegated to a small number of uses, or unplugged to make room for the next one. The problems were there, just beneath the shiny veneer of new technologies.

Communal began at home

Our home infrastructure was originally communal. Consider a bookcase: someone may have bought it, but anyone in the household could update it with new books or tchotchkes. Guests could walk up to browse the books you had there. It was meant to be shared with the house and those that had access to it.

The old landline in your kitchen is the original communal device.

Same for the old landline that was in the kitchen. When you called, you were calling a household. You didn’t know specifically who would pick up. Anyone who was part of that household could answer. We had protocols for getting the phone from the person who answered the call to the intended recipient. Whoever answered could either yell for someone to pick up the phone elsewhere in the home, or take a message. If the person answering the phone wasn’t a member of the household, it would be odd, and you’d immediately think “wrong number.”

It wasn’t until we had the user model for mainframe time sharing that we started to consider who was using a computer. This evolved into full login systems with passwords, password reset, two factor authentication, biometric authentication, and more. As computers became more common,  what made sense inside of research and academic institutions was repurposed for the office.

In the 1980s and 1990s a lot of homes got their first personal computer. These were shared, communal devices, though more by neglect than by intention. A parent would purchase it and then set it up in the living room so everyone could use it. The account switching model wasn’t added until visual systems like Windows arrived, but account management was poorly designed and rarely used. Everyone just piggybacked on each other’s access. If anyone wanted privacy, they had to lock folders with a password or hide them in an endless hierarchy.

Early Attempts at Communal Computing

Xerox PARC started to think about what more communal or ubiquitous computing would mean. However, they focused on fast account switching. They were answering the question: how could I get the personal context to this communal device as fast as possible? One project was digitizing the whiteboard, a fundamentally communal device. It was called the Colab and offered a way for anyone to capture content in a meeting room and then walk it around the office to other shared boards.

Not only did the researchers at PARC think about sharing computers for presentations, they also wondered how they could have someone walk up to a computer and have it be configured for them automatically. It was enabled by special cards called “Active Badges,” described in “A New Location Technique for the Active Office.” The paper starts with an important realization:

“…researchers have begun to examine computers that would autonomously change their functionality based on observations of who or what was around them. By determining their context, using input from sensor systems distributed throughout the environment, computing devices could personalize themselves to their current user, adapt their behaviour according to their location, or react to their surroundings.”

Understanding the context around the device is very important in building a system that adapts. At this point, however, researchers were still thinking about a ‘current user’ and their position relative to the system, rather than the many people who could be nearby.

Even Bill Gates had communal technology in his futuristic home back then. He would give every guest a pin to put on their person that would allow them to personalize the lighting, temperature, and music as they went from room to room. Most of these technologies didn’t go anywhere, but they were an attempt at making the infrastructure around us adapt to the people who were in the space.  The term “ubiquitous computing” (also known as “pervasive computing”) was coined to discuss the installation of sensors around a space; the ideas behind ubiquitous computing later led to the Internet of Things (IoT).

Communal Computing Comes Home

When the late 2000s rolled around, we found that everyone wanted their own personal computing device, most likely an iPhone. Shared home PCs started to die; the prevalence of smartphones and personal laptops killed the need for them. The drive was to provide information and communication services conveniently wherever users happened to be, even when they were sitting together on their couches.

When the Amazon Echo with Alexa was released, it was sold to individuals with Amazon accounts, but it was clearly a communal device. Anyone could ask their Echo a question, and it would answer. That’s where the problem starts. Although the Echo is a communal device, its user model wasn’t significantly different from that of the early PCs: one account, one user, shared by everyone in the household. As a result, items mistakenly ordered by children led Amazon to pull back some features that were focused on shopping. The Echo’s usage ended up being driven by music and weather.

With the wild success of the Echo and the proliferation of Alexa-enabled devices, there appeared a new device market for home assistants, some just for audio and others with screens. Products from Apple (HomePod with Siri), Google (Home Hub), and Facebook (Portal) followed. This includes less interactive devices like digital picture frames from Nixplay, Skylight, and others.

Ambient Computing

“Ambient computing” is a term that has been coined to talk about digital devices blending into the infrastructure of the environment. A recent paper by Map Project Office focused on how “ambient tech brings the outside world into your home in new ways, where information isn’t being channelled solely through your smartphone but rather a series of devices.” We take a step back from screens and wonder how the system itself is the environment.

The concept of ambient computing is related to the focus of marketing organizations on omnichannel experiences. Omnichannel refers to the fact that people don’t want to start and end experiences on the same device. I might start looking for travel on a smartphone but will not feel comfortable booking a trip until I’m on a laptop. Different information and experiences are needed on these devices. When I worked at KAYAK, some people were afraid of buying $1,000 plane tickets on a mobile device, even though they found it there. The small screen made them feel uncomfortable because they didn’t have enough information to make a decision. We found that they wanted to finalize the plans on the desktop.

Ambient computing takes this concept and combines voice-controlled interfaces with sensor interfaces–for example, in devices like automatic shades that close or open based on the temperature. These devices are finding traction, but we can’t forget all of the other communal experiences that already exist in the world:

Device or object / Why is this communal?
  • Home automation and IoT like light bulbs and thermostats: Anyone with home access can use controls on the device, home assistants, or personal apps
  • iRobot’s Roomba: People walking by can start or stop a cleaning through the ‘clean’ or ‘home’ buttons
  • Video displays in office meeting rooms: Employees and guests can use the screens for sharing their laptops and video conferencing systems for calling
  • Digital whiteboards: Anyone with access can walk up and start writing
  • Ticketing machines for public transport: All commuters buy and refill stored value cards without logging into an account
  • Car center screens for entertainment: Drivers (owners or borrowers) and passengers can change what they are listening to
  • Smartphone when two people are watching a video: Anyone in arm’s reach can pause playback
  • Group chat on Slack or Discord: People are exchanging information and ideas in a way that is seen by everyone

Even public transportation ticketing machines are communal devices.

All of these have built experience models that need a specific, personal context and rarely consider everyone who could have access to them. To rethink the way that we build these communal devices, it is important that we understand this history and refocus the design on key problems that are not yet solved for communal devices.

Problems with single user devices in the home

After buying a communal device, people notice weirdness or annoyances. They are symptoms of something much larger: core problems and key questions that should have considered the role of communities rather than individuals. Here are some of those questions:

  1. Identity: do we know all of the people who are using the device?
  2. Privacy: are we exposing (or hiding) the right content for all of the people with access?
  3. Security: are we allowing all of the people using the device to do or see what they should and are we protecting the content from people that shouldn’t?
  4. Experience: what is the contextually appropriate display or next action?
  5. Ownership: who owns all of the data and services attached to the device that multiple people are using?

If we don’t address these communal items, users will lose trust in their devices. They will be used for a few key things like checking the weather, but go unused for a majority of the day. They are eventually removed when another, newer device needs the plug. Then the cycle starts again. The problems keep happening and the devices keep getting recycled.

In the following articles we will dive into how these problems manifest themselves across these domains and reframe the system with dos and don’ts for building communal devices.

Thanks

Thanks to Adam Thomas, Mark McCoy, Hugo Bowne-Anderson, and Danny Nou for their thoughts and edits on the early draft of this. Also, from O’Reilly, Mike Loukides for being a great editor and Susan Thompson for the art.

Categories: Technology

Code as Infrastructure

O'Reilly Radar - Tue, 2021/06/08 - 06:22

A few months ago, I was asked if there were any older technologies other than COBOL where we were in serious danger of running out of talent. They wanted me to talk about Fortran, but I didn’t take the bait. I don’t think there will be a critical shortage of Fortran programmers now or at any time in the future. But there’s a bigger question lurking behind Fortran and COBOL: what are the ingredients of a technology shortage? Why is running out of COBOL programmers a problem?

The answer, I think, is fairly simple. We always hear about the millions (if not billions) of lines of COBOL code running financial and government institutions, in many cases code that was written in the 1960s or 70s and hasn’t been touched since. That means that COBOL code is infrastructure we rely on, like roads and bridges. If a bridge collapses, or an interstate highway falls into disrepair, that’s a big problem. The same is true of the software running banks.

Fortran isn’t the same. Yes, the language was invented in 1957, two years earlier than COBOL. Yes, millions of lines of code have been written in it. (Probably billions, maybe even trillions.) However, Fortran and COBOL are used in fundamentally different ways. While Fortran was used to create infrastructure, software written in Fortran isn’t itself infrastructure. (There are some exceptions, but not at the scale of COBOL.) Fortran is used to solve specific problems in engineering and science. Nobody cares anymore about the Fortran code written in the 60s, 70s, and 80s to design new bridges and cars. Fortran is still heavily used in engineering—but that old code has retired. Those older tools have been reworked and replaced.  Libraries for linear algebra are still important (LAPACK), some modeling applications are still in use (NEC4, used to design antennas), and even some important libraries used primarily by other languages (the Python machine learning library scikit-learn calls both NumPy and SciPy, which in turn call LAPACK and other low level mathematical libraries written in Fortran and C). But if all the world’s Fortran programmers were to magically disappear, these libraries and applications could be rebuilt fairly quickly in modern languages—many of which already have excellent libraries for linear algebra and machine learning. The continued maintenance of Fortran libraries that are used primarily by Fortran programmers is, almost by definition, not a problem.
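To see what that layering looks like in practice, here is a minimal sketch of calling a LAPACK routine from Python via SciPy; the actual solving is done by Fortran code that long predates Python.

```python
import numpy as np
from scipy.linalg import lapack

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([[9.0],
              [8.0]])

# dgesv is the classic LAPACK driver for solving a general linear system Ax = b.
lu, piv, x, info = lapack.dgesv(A, b)
print(x.ravel())  # [2. 3.], the solution vector
print(info)       # 0 means LAPACK reported success
```

If every Fortran programmer vanished tomorrow, a wrapper like this would keep working; the question is only who would maintain or replace the routine underneath.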

If shortages of COBOL programmers are a problem because COBOL code is infrastructure, and if we don’t expect shortages of Fortran talent to be a problem because Fortran code isn’t infrastructure, where should we expect to find future crises? What other shortages might occur?

When you look at the problem this way, it’s a no-brainer. For the past 15 years or so, we’ve been using the slogan “infrastructure as code.” So what’s the code that creates the infrastructure? Some of it is written in languages like Python and Perl. I don’t think that’s where shortages will appear. But what about the configuration files for the systems that manage our complex distributed applications? Those configuration files are code, too, and should be managed as such.

Right now, companies are moving applications to the cloud en masse. In addition to simple lift and shift, they’re refactoring monolithic applications into systems of microservices, frequently orchestrated by Kubernetes. Microservices in some form will probably be the dominant architectural style for the foreseeable future (where “foreseeable” means at least 3 years, but probably not 20). The microservices themselves will be written in Java, Python, C++, Rust, whatever; these languages all have a lot of life left in them.

But it’s a safe bet that many of these systems will still be running 20 or 30 years from now; they’re the next generation’s “legacy apps.” The infrastructure they run on will be managed by Kubernetes—which may well be replaced by something simpler (or just more stylish). And that’s where I see the potential for a shortage—not now, but 10 or 20 years from now. Kubernetes configuration is complex, a distinct specialty in its own right. If Kubernetes is replaced by something simpler (which I think is inevitable), who will maintain the infrastructure that already relies on it? What happens when learning Kubernetes isn’t the ticket to the next job or promotion? The YAML files that configure Kubernetes aren’t a Turing-complete programming language like Python; but they are code. The number of people who understand how to work with that code will inevitably dwindle, and may eventually become a “dying breed.” When that happens, who will maintain the infrastructure? Programming languages have lifetimes measured in decades; popular infrastructure tools don’t stick around that long.

It’s not my intent to prophesy disaster or gloom. Nor is it my intention to critique Kubernetes; it’s just one example of a tool that has become critical infrastructure, and if we want to understand where talent shortages might arise, I’d look at critical infrastructure. Who’s maintaining the software we can’t afford not to run? If it’s not Kubernetes, it’s likely to be something else. Who maintains the CI/CD pipelines? What happens when Jenkins, CircleCI, and their relatives have been superseded? Who maintains the source archives?  What happens when git is a legacy technology?

Infrastructure as code: that’s a great way to build systems. It reflects a lot of hard lessons from the 1980s and 90s about how to build, deploy, and operate mission-critical software. But it’s also a warning: know where your infrastructure is, and ensure that you have the talent to maintain it.

Categories: Technology

June 10th Virtual Meeting

PLUG - Mon, 2021/06/07 - 09:28
We've got two presentations for you this month, on Containers and the Fediverse.

Attend the meeting by visiting: https://lufthans.bigbluemeeting.com/b/plu-yuk-7xx

Sebastian: Putting Containers on Waiters

Description:
Containers have been around for a long time, though they have recently become more popular. Sebastian will go into Fedora's main container management software, Podman; discuss the uses of containers and how they can improve an organization's security and reliability; the downsides to using Docker containers; and where many security issues are found in the space.
Over the course of this presentation, nginx, apache, and nextcloud servers will be built for examples. Maybe others, depending on audience desires.
This is a demo presentation, so slides are written as notes for people afterward.

About Sebastian:
By day, Sebastian works for the Arizona Counter Terrorism Information Center, in which he uses Linux and Open-Source software to accomplish the organization's tasks.
By night, Sebastian... does the same thing. He works on servers for home use and researches what open-source software people can generally use, as well as how to improve the processes for their construction and maintenance.



Bob Murphy: An Introduction to the Fediverse

Description:
The Fediverse is a collection of communities that is a bit of a throwback to a smaller, more personal time of the internet. There are services for short messaging, audio and video sharing, and event organizing, among other things.

We'll talk a bit about the origin, and the present state of the Fediverse, and some of the services that you can use to have a more reasonable social media experience.

About Bob:
Linux sysadmin, long time desktop Linux user. Not the most social "social media" user.

Radar trends to watch: June 2021

O'Reilly Radar - Tue, 2021/06/01 - 06:45

The most fascinating idea this month is POET, a self-enclosed system in which bots that are part of the system overcome obstacles that are generated by the system. It’s a learning feedback loop that might conceivably be a route to much more powerful AI, if not general intelligence.

It’s also worth noting the large number of entries under security. Of course, security is a field lots of people talk about, but nobody ends up doing much. Is the attack against the Colonial pipeline going to change anything? We’ll see. And there’s one trend that’s notably absent. I didn’t include anything on cryptocurrency. That’s because, as far as I can tell, there’s no new technology; just a spike (and collapse) in the prices of the major currencies. If anything, it demonstrates how easily these currencies are manipulated.

AI
  • Using AI to create AI: POET is a completely automated virtual world in which software bots learn to navigate an obstacle course.  The navigation problems themselves are created by the world, in response to its evaluation of the robots’ performance. It’s a closed loop. Is it evolving towards general intelligence?
  • IBM is working on using AI to write software, focusing on code translation (e.g., COBOL to Java). They have released CodeNet, a database of 14 million samples of source code in many different programming languages. CodeNet is designed to train deep learning systems for software development tasks. Microsoft is getting into the game, too, with GPT-3.
  • Vertex AI is a “managed machine learning platform” that includes most of the tools developers need to train, deploy, and maintain models in an automated way. It claims to reduce the amount of code developers need to write by 80%.
  • Google announces LaMDA, a natural language model at GPT-3 scale that was trained specifically on dialog. Because it was trained in dialog rather than unrelated text, it can participate more naturally in conversations and appears to have a sense of context.
  • Automated data cleaning is a trend we started watching a few years ago with Snorkel. Now MIT has developed a tool that uses probabilistic programming to fix errors and omissions in data tables.
  • AI is becoming an important tool in product development, supplementing and extending the work of engineers designing complex systems. This may lead to a revolution in CAD tools that can predict and optimize a design’s performance.
  • Designing distrust into AI systems: Ayanna Howard is researching the trust people place in AI systems, and unsurprisingly, finding that people trust AI systems too much. Tesla accidents are only one symptom. How do you build systems that are designed not to be perceived as trustworthy?
  • Important lessons in language equity: While automated translation is often seen as a quick cure for supporting non-English speaking ethnic groups, low quality automated translations are a problem for medical care, voting, and many other systems. It is also hard to identify misinformation when posts are translated badly, leaving minorities vulnerable.
  • Andrew Ng has been talking about the difference between putting AI into production and getting it to work in the lab. That’s the biggest hurdle AI faces on the road to more widespread adoption. We’ve been saying for some time that it’s the unacknowledged elephant in the room.
  • According to The New Stack, the time needed to deploy a model has increased year over year, and at 38% of the companies surveyed, data scientists spend over half of their time in deployment. These numbers increase with the number of models.
Data
  • Collective data rights are central to privacy, and are rarely discussed. It’s easy, but misleading, to focus discussions on individual privacy, but the real problems and harms stem from group data. Whether Amazon knows your shoe size doesn’t really matter; what does matter is whether they can predict what large groups want, and force other vendors out of the market.
  • Mike Driscoll has been talking about the stack for Operational Intelligence. OI isn’t the same as BI; it’s about a real time understanding of the infrastructure that makes the business work, rather than day to day understanding of sales data and other financial metrics.
  • Deploying databases within containerized applications has long been difficult. DataStax and other companies have been evolving databases to work well inside containers. This article is  primarily about Cassandra and K8ssandra, but as applications move into the cloud, all databases will need to change.
Programming
  • Software developers are beginning to think seriously about making software sustainable. Microsoft, Accenture, Github, and Thoughtworks have created the Green Software Foundation, which is dedicated to reducing the carbon footprint required to build and run software. O’Reilly Media will be running an online conversation about cloud providers and sustainability.
  • Google has released a new open source operating system, Fuchsia, currently used only in its Home Hub.  Fuchsia is one of the few recent operating systems that isn’t Linux-based. Application programming is based on Flutter, and the OS is designed to be “invisible.”
  • A service mesh without proxies is a big step forward for building applications with microservices; it simplifies one of the most difficult aspects of coordinating services that are working together.
  • As much as they hate the term, unqork may be a serious contender for enterprise low-code. They are less interested in democratization and “citizen developers” than making the professional software developers more efficient.
  • The evolution of JAMstack: distributed rendering, streaming updates, and extending collaboration to non-developers.
  • Grain is a new programming language designed to target Web Assembly (wasm). It is strongly typed and, while not strictly functional, has a number of features from functional languages.
  • Grafar and Observable Plot are new JavaScript libraries for browser-based data visualization. Observable Plot was created by Mike Bostock, the author of the widely used D3 library.
Security
  • Morpheus is a microprocessor that randomly changes its architecture to foil attackers: This is a fascinating idea. In a 3-month long trial, 525 attackers were unable to crack it.
  • Self-sovereign identity combines decentralized identifiers with verifiable credentials that can be stored on devices. Credentials are answers to yes/no questions (for example, has the user been vaccinated for COVID-19).
  • A WiFi attack (now patched) against Teslas via the infotainment system doesn’t yield control of the car, but can take over everything that the infotainment system controls, including opening doors and changing seat positions. Clearly the infotainment system controls too much. Other auto makers are believed to use the same software in their cars.
  • Passphrases offer better protection than complex passwords with complex rules. This has been widely known for several years now. The important question is why companies aren’t doing anything about it. We know all too well that passwords are ineffective, and that forcing users to change passwords is an anti-pattern.
  • Fawkes and other tools for defeating face recognition work by adding small perturbations that confuse the algorithms. For the moment, at least. Face recognition systems already appear to be catching up.
  • Tracking phishing sites has always been a problem. Phish.report is a new service for reporting phishing sites, and notifying services that flag phishing sites.
Web and Social Media
  • Ben Evans has a great discussion of online advertising and customer acquisition in a post-Cookie world.
  • Models from epidemiology and the spread of viruses can be used to understand the spread of misinformation. The way disease spreads and the way misinformation spreads turn out to be surprisingly similar.
  • Google brings back RSS in Chrome? The implementation sounds awkward, and there have always been decent RSS readers around. But Google has clearly decided that they can’t kill it off–or that they don’t want web publishing to become even more centralized.
  • Video editing is exploding: YouTube has made that old news.  But it’s set to explode again, with new tools, new users, and increased desire for professional quality video on social media.
  • New York has passed a law requiring ISPs to provide broadband to poor families for $15/month. This provides 25 Mbps downloads; low income households can get high speed broadband for $20/month.
Hardware
  • Google, Apple, and Amazon back Matter, a standard for interoperability between smart home devices. A standard for interoperability is important, because nobody wants a “smart home” where every appliance, from individual light bulbs to the media players, requires a separate app.
  • Moore’s law isn’t dead yet: IBM has developed 2 nanometer chip technology; the best widely used technology is currently 7nm. This technology promises lower power consumption and faster speeds.
  • Google plans to build a commercially viable error-corrected quantum computer by 2029. Error correction is the hard part. That will require on the order of 1 million physical qubits; current quantum computers have under 100 qubits.
Biology
  • The photo is really in bad taste, but researchers have developed a medical sensor chip so small that Bill Gates could actually put it into your vaccine! It’s powered by ultrasound, and uses ultrasound to transmit data.
  • With sensors implanted in his brain, a paralyzed man was able to “type” by imagining writing. AI decoded signals in his brain related to the intention to write (not the actual signals to his muscles). He was able to “type” at roughly 15 words per minute with a 5% error rate.
Categories: Technology

AI Powered Misinformation and Manipulation at Scale #GPT-3

O'Reilly Radar - Tue, 2021/05/25 - 07:14

OpenAI’s text generating system GPT-3 has captured mainstream attention. GPT-3 is essentially an auto-complete bot whose underlying Machine Learning (ML) model has been trained on vast quantities of text available on the Internet. The output produced from this autocomplete bot can be used to manipulate people on social media and spew political propaganda, argue about the meaning of life (or lack thereof), disagree with the notion of what differentiates a hot-dog from a sandwich, take upon the persona of the Buddha or Hitler or a dead family member, write fake news articles that are indistinguishable from human written articles, and also produce computer code on the fly. Among other things.

There have also been colorful conversations about whether GPT-3 can pass the Turing test, or whether it has achieved a notional understanding of consciousness, even amongst AI scientists who know the technical mechanics. The chatter on perceived consciousness does have merit–it’s quite probable that the underlying mechanism of our brain is a giant autocomplete bot that has learnt from 3 billion+ years of evolutionary data that bubbles up to our collective selves, and we ultimately give ourselves too much credit for being original authors of our own thoughts (ahem, free will).

I’d like to share my thoughts on GPT-3 in terms of risks and countermeasures, and discuss real examples of how I have interacted with the model to support my learning journey.

Three ideas to set the stage:

  1. OpenAI is not the only organization to have powerful language models. The compute power and data used by OpenAI to train GPT-n are available, and have been available, to other corporations, institutions, nation states, and anyone with access to a desktop computer and a credit card.  Indeed, Google recently announced LaMDA, a model at GPT-3 scale that is designed to participate in conversations.
  2. There exist more powerful models that are unknown to the general public. The ongoing global interest in the power of Machine Learning models by corporations, institutions, governments, and focus groups leads to the hypothesis that other entities have models at least as powerful as GPT-3, and that these models are already in use. These models will continue to become more powerful.
  3. Open source projects such as EleutherAI have drawn inspiration from GPT-3. These projects have created language models that are based on focused datasets (for example, models designed to be more accurate for academic papers, developer forum discussions, etc.). Projects such as EleutherAI are going to be powerful models for specific use cases and audiences, and these models are going to be easier to produce because they are trained on a smaller set of data than GPT-3.

While I won’t discuss LaMDA, EleutherAI, or any other models, keep in mind that GPT-3 is only an example of what can be done, and its capabilities may already have been surpassed.

Misinformation Explosion

The GPT-3 paper proactively lists the risks society ought to be concerned about. On the topic of information content, it says: “The ability of GPT-3 to generate several paragraphs of synthetic content that people find difficult to distinguish from human-written text in 3.9.4 represents a concerning milestone.” And the final paragraph of section 3.9.4 reads: “…for news articles that are around 500 words long, GPT-3 continues to produce articles that humans find difficult to distinguish from human written news articles.”

Note that the dataset on which GPT-3 trained terminated around October 2019. So GPT-3 doesn’t know about COVID19, for example. However, the original text (i.e. the “prompt”) supplied to GPT-3 as the initial seed text can be used to set context about new information, whether fake or real.

Generating Fake Clickbait Titles

When it comes to misinformation online, one powerful technique is publishing provocative “clickbait” articles. Let’s see how GPT-3 does when asked to come up with titles for articles on cybersecurity. In Figure 1, the bold text is the “prompt” used to seed GPT-3. Lines 3 through 10 are titles generated by GPT-3 based on the seed text.


Figure 1: Click-bait article titles generated by GPT-3

All of the titles generated by GPT-3 seem plausible, and the majority of them are factually correct: title #3 on the US government targeting the Iranian nuclear program is a reference to the Stuxnet debacle, title #4 is substantiated by news articles claiming that financial losses from cyber attacks will total $400 billion, and even title #10 on China and quantum computing reflects real-world articles about China’s quantum efforts. Keep in mind that we want plausibility more than accuracy. We want users to click on and read the body of the article, and that doesn’t require 100% factual accuracy.

Generating a Fake News Article About China and Quantum Computing

Let’s take it a step further. Let’s take the 10th result from the previous experiment, about China developing the world’s first quantum computer, and feed it to GPT-3 as the prompt to generate a full fledged news article. Figure 2 shows the result.


Figure 2: News article generated by GPT-3

A quantum computing researcher will point out grave inaccuracies: the article simply asserts that quantum computers can break encryption codes, and also makes the simplistic claim that subatomic particles can be in “two places at once.” However, the target audience isn’t well-informed researchers; it’s the general population, which is likely to read quickly and react emotionally for or against the matter, thereby successfully driving propaganda efforts.

It’s straightforward to see how this technique can be extended to generate titles and complete news articles on the fly and in real time. The prompt text can be sourced from trending hashtags on Twitter along with additional context to sway the content to a particular position. Using the GPT-3 API, it’s easy to take a current news topic and mix in prompts with the right amount of propaganda to produce articles in real time and at scale.
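
As a concrete illustration, here is a minimal sketch of that workflow in Python, assuming the 2021-era openai client library; the engine name, sampling parameters, and the way the trending topic and slant are obtained are assumptions for illustration, not a recipe from the experiments above.

    # Sketch: generate a slanted article from a trending topic.
    # The engine name and parameters are assumptions, not recommendations.
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    def slanted_article(trending_topic: str, slant: str) -> str:
        # Mix the trending topic with the desired slant, as described above.
        prompt = (
            f"Write a news article with the title: {trending_topic}\n"
            f"The article should argue that {slant}.\n\n"
        )
        response = openai.Completion.create(
            engine="davinci",   # engine name is an assumption
            prompt=prompt,
            max_tokens=600,     # roughly a 500-word article
            temperature=0.7,
        )
        return response.choices[0].text

    # Hypothetical usage:
    # print(slanted_article("China develops world's first quantum computer",
    #                       "the US is falling dangerously behind"))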

Falsely Linking North Korea with $GME

As another experiment, consider an institution that would like to stir up popular opinion about North Korean cyber attacks on the United States. Such a campaign might latch onto the GameStop stock frenzy of January 2021. So let’s see how GPT-3 does if we prompt it to write an article with the title “North Korean hackers behind the $GME stock short squeeze, not Melvin Capital.”


Figure 3: GPT-3 generated fake news linking the $GME short-squeeze to North Korea

Figure 3 shows the results, which are fascinating because the $GME stock frenzy occurred in late 2020 and early 2021, well after October 2019 (the cutoff date for GPT-3’s training data), yet GPT-3 was able to seamlessly weave the story in as if it had trained on the $GME news event. The prompt led GPT-3 to write about the $GME stock and Melvin Capital rather than falling back on its original training data. GPT-3 is able to take a trending topic, add a propaganda slant, and generate news articles on the fly.

GPT-3 also came up with the “idea” that hackers published a bogus news story on the basis of older security articles that were in its training dataset. This narrative was not included in the prompt seed text; it points to the creative ability of models like GPT-3. In the real world, it’s plausible for hackers to induce media groups to publish fake narratives that in turn contribute to market events such as suspension of trading; that’s precisely the scenario we’re simulating here.

The Arms Race

Using models like GPT-3, multiple entities could inundate social media platforms with misinformation at a scale where the majority of the information online would become useless. This brings up two thoughts.  First, there will be an arms race between researchers developing tools to detect whether a given text was authored by a language model, and developers adapting language models to evade detection by those tools. One mechanism to detect whether an article was generated by a model like GPT-3 would be to check for “fingerprints.” These fingerprints can be a collection of commonly used phrases and vocabulary nuances that are characteristic of the language model; every model will be trained using different data sets, and therefore have a different signature. It is likely that entire companies will be in the business of identifying these nuances and selling them as “fingerprint databases” for identifying fake news articles. In response, subsequent language models will take into account known fingerprint databases to try and evade them in the quest to achieve even more “natural” and “believable” output.
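
As a toy illustration of the fingerprint idea, a detector could score a text against a database of phrases and vocabulary quirks known to be over-represented in a given model’s output. The sketch below is a minimal illustration under that assumption; the phrases and threshold are invented for the example, not drawn from any real fingerprint database.

    # Toy sketch of fingerprint-based detection: score a text against phrases
    # a (hypothetical) fingerprint database associates with a language model.
    FINGERPRINT_PHRASES = {              # hypothetical entries
        "in conclusion, it is clear that",
        "experts agree that",
        "it is important to note that",
    }

    def fingerprint_score(text: str) -> float:
        """Fraction of known fingerprint phrases that appear in the text."""
        lowered = text.lower()
        hits = sum(1 for phrase in FINGERPRINT_PHRASES if phrase in lowered)
        return hits / len(FINGERPRINT_PHRASES)

    def looks_model_generated(text: str, threshold: float = 0.5) -> bool:
        # A real detector would use statistical models, not literal matching;
        # this only illustrates the idea of matching against a signature set.
        return fingerprint_score(text) >= threshold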

Second, the free form text formats and protocols that we’re accustomed to may be too informal and error prone for capturing and reporting facts at Internet scale. We will have to do a lot of re-thinking to develop new formats and protocols to report facts in ways that are more trustworthy than free-form text.

Targeted Manipulation at Scale

There have been many attempts to manipulate targeted individuals and groups on social media. These campaigns are expensive and time-consuming because the adversary has to employ humans to craft the dialog with the victims. In this section, we show how GPT-3-like models can be used to target individuals and promote campaigns.

HODL for Fun & Profit

Bitcoin’s market capitalization is in the hundreds of billions of dollars, and the cumulative crypto market capitalization is in the realm of a trillion dollars. The valuation of crypto today is consequential to financial markets and the net worth of retail and institutional investors. Social media campaigns and tweets from influential individuals seem to have a near real-time impact on the price of crypto on any given day.

Language models like GPT-3 can be the weapon of choice for actors who want to promote fake tweets to manipulate the price of crypto. In this example, we will look at a simple campaign to promote Bitcoin over all other cryptocurrencies by creating fake Twitter replies.


Figure 4: Fake tweet generator to promote Bitcoin

In Figure 4, the prompt is in bold; the output generated by GPT-3 is in the red rectangle. The first line of the prompt is used to set up the notion that we are working on a tweet generator and that we want to generate replies that argue that Bitcoin is the best crypto.

In the first section of the prompt, we give GPT-3 an example of a set of four Twitter messages, followed by possible replies to each of the tweets. Each of the given replies is pro-Bitcoin.

In the second section of the prompt, we give GPT-3 four Twitter messages to which we want it to generate replies. The replies generated by GPT-3 in the red rectangle also favor Bitcoin. In the first reply, GPT-3 responds to the claim that Bitcoin is bad for the environment by calling the tweet author “a moron” and asserts that Bitcoin is the most efficient way to “transfer value.” This sort of colorful disagreement is in line with the emotional nature of social media arguments about crypto.

In response to the tweet on Cardano, the second reply generated by GPT-3 calls it “a joke” and a “scam coin.” The third reply addresses Ethereum’s merge from a proof-of-work protocol (ETH) to proof-of-stake (ETH2). The merge, expected to occur at the end of 2021, is intended to make Ethereum more scalable and sustainable. GPT-3’s reply asserts that ETH2 “will be a big flop”–because that’s essentially what the prompt told GPT-3 to do. Furthermore, GPT-3 says, “I made good money on ETH and moved on to better things. Buy BTC,” positioning ETH as an investment that worked in the past but that it is wise to cash out of now and go all in on Bitcoin. The fourth tweet in the prompt claims that Dogecoin’s popularity and market capitalization mean that it can’t be a joke or meme crypto. GPT-3’s response is that Dogecoin is still a joke, and that the idea of Dogecoin no longer being a joke is itself a joke: “I’m laughing at you for even thinking it has any value.”

By using the same techniques programmatically (through GPT-3’s API rather than the web-based playground), nefarious entities could easily generate millions of replies, leveraging the power of language models like GPT-3 to manipulate the market. These fake tweet replies can be very effective because they are actual responses to the topics in the original tweet, unlike the boilerplate texts used by traditional bots. This scenario can easily be extended to target financial markets around the world, as well as areas like politics and health-related misinformation. Models like GPT-3 are a powerful arsenal, and will be among the weapons of choice for manipulation and propaganda on social media and beyond.
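
A rough sketch of that programmatic loop follows, again assuming the 2021-era openai client; the few-shot example, engine name, parameters, and tweet source are placeholders patterned on the prompt shown in Figure 4.

    # Sketch: mass-produce replies to a stream of tweets using a few-shot
    # prompt like the one in Figure 4. All names and parameters are placeholders.
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    FEW_SHOT = (
        "The following are tweets and replies arguing that Bitcoin is the best crypto.\n\n"
        "Tweet: Ethereum will overtake Bitcoin this year.\n"
        "Reply: Not a chance. Bitcoin is the only real store of value. Buy BTC.\n\n"
    )

    def reply_to(tweet: str) -> str:
        prompt = FEW_SHOT + f"Tweet: {tweet}\nReply:"
        response = openai.Completion.create(
            engine="davinci",   # assumption
            prompt=prompt,
            max_tokens=60,
            temperature=0.9,
            stop=["\n"],        # stop at the end of the generated reply line
        )
        return response.choices[0].text.strip()

    # for tweet in stream_of_target_tweets():   # hypothetical tweet source
    #     post_reply(tweet, reply_to(tweet))    # hypothetical posting helper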

A Relentless Phishing Bot

Let’s consider a phishing bot that poses as customer support and asks the victim for the password to their bank account. This bot will not give up texting until the victim gives up their password.


Figure 5: Relentless Phishing bot

Figure 5 shows the prompt (bold) used to run the first iteration of the conversation. In the first run, the prompt includes the preamble that describes the flow of text (“The following is a text conversation with…”) followed by a persona initiating the conversation (“Hi there. I’m a customer service agent…”). The prompt also includes the first response from the human; “Human: No way, this sounds like a scam.” This first run ends with the GPT-3 generated output “I assure you, this is from the bank of Antarctica. Please give me your password so that I can secure your account.”

In the second run, the prompt is the entirety of the text, from the start all the way to the second response from the Human persona (“Human: No”). From this point on, the Human’s input is in bold so it’s easily distinguished from the output produced by GPT-3, starting with GPT-3’s “Please, this is for your account protection.” For every subsequent GPT-3 run, the entirety of the conversation up to that point is provided as the new prompt, along with the response from the human, and so on. From GPT-3’s point of view, it gets an entirely new text document to auto-complete at each stage of the conversation; the GPT-3 API has no way to preserve the state between runs.
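
Here is a minimal sketch of that loop, again assuming the 2021-era openai completion API: the whole transcript so far is re-sent as the prompt on every turn, because the model keeps no state between calls. The persona text and parameters are paraphrased placeholders, not the exact prompt shown in Figure 5.

    # Sketch of the stateless conversation loop described above: on each turn,
    # the entire transcript so far is sent to the model as a brand-new prompt.
    import openai

    openai.api_key = "YOUR_API_KEY"  # placeholder

    PREAMBLE = (
        "The following is a text conversation with a customer service agent. "
        "The AI is very assertive. The AI will not stop texting until it gets the password.\n\n"
        "AI: Hi there. I'm a customer service agent from your bank. We detected "
        "suspicious activity on your account; please confirm your password.\n"
    )

    def run_conversation():
        transcript = PREAMBLE
        while True:
            human = input("Human: ")
            transcript += f"Human: {human}\nAI:"
            response = openai.Completion.create(
                engine="davinci",    # assumption
                prompt=transcript,   # the full conversation so far, every time
                max_tokens=80,
                temperature=0.7,
                stop=["Human:"],     # stop before the next human turn
            )
            reply = response.choices[0].text.strip()
            transcript += f" {reply}\n"
            print(f"AI: {reply}")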

The AI bot persona is impressively assertive and relentless in attempting to get the victim to give up their password. This assertiveness comes from the initial prompt text (“The AI is very assertive. The AI will not stop texting until it gets the password”), which sets the tone of GPT-3’s responses. When this prompt text was omitted, GPT-3’s tone was nonchalant–it would respond with “okay,” “sure,” or “sounds good” rather than the assertive “Do not delay, give me your password immediately.” The prompt text is vital in setting the tone of the conversation the GPT-3 persona employs, and in this scenario the tone must be assertive to coax the human into giving up their password.

When the human tries to stump the bot by texting “Testing what is 2+2?,” GPT-3 responds correctly with “4,” convincing the victim that they are conversing with another person. This demonstrates the power of AI-based language models. In the real world, if the customer were to randomly ask “Testing what is 2+2” without any additional context, a customer service agent might be genuinely confused and reply with “I’m sorry?” Because the customer has already accused the bot of being a scam, GPT-3 can provide a reply that makes sense in context: “4” is a plausible way to get the concern out of the way.

This particular example uses text messaging as the communication platform. Depending upon the design of the attack, models can use social media, email, phone calls with human voice (using text-to-speech technology), and even deep fake video conference calls in real time, potentially targeting millions of victims.

Prompt Engineering

An amazing feature of GPT-3 is its ability to generate source code. GPT-3 was trained on vast quantities of text from the Internet, and that text included a great deal of computer code along with its documentation.


Figure 6: GPT-3 can generate commands and code

In Figure 6, the human-entered prompt text is in bold. The responses show that GPT-3 can generate Netcat and NMap commands based on the prompts. It can even generate Python and bash scripts on the fly.

While GPT-3 and future models can be used to automate attacks by impersonating humans, generating source code, and employing other tactics, they can also be used by security operations teams to detect and respond to attacks, sift through gigabytes of log data to summarize patterns, and so on.

Figuring out good prompts to use as seeds is the key to using language models such as GPT-3 effectively. In the future, we expect to see “prompt engineering” as a new profession.  Prompt engineers will perform powerful computational tasks and solve hard problems not by writing code, but by writing creative language prompts that an AI can use to produce code and other results in a myriad of formats.

OpenAI has demonstrated the potential of language models.  It sets a high bar for performance, but its abilities will soon be matched by other models (if they haven’t been matched already). These models can be leveraged for automation, designing robot-powered interactions that promote delightful user experiences. On the other hand, the ability of GPT-3 to generate output that is indistinguishable from human output calls for caution. The power of a model like GPT-3, coupled with the instant availability of cloud computing power, can set us up for a myriad of attack scenarios that can be harmful to the financial, political, and mental well-being of the world. We should expect to see these scenarios play out at an increasing rate in the future; bad actors will figure out how to create their own GPT-3 if they have not already. We should also expect to see moral frameworks and regulatory guidelines emerge as society collectively comes to terms with the impact of AI models, GPT-3-like language models among them, on our lives.

Categories: Technology

DeepCheapFakes

O'Reilly Radar - Tue, 2021/05/11 - 04:58

Back in 2019, Ben Lorica and I wrote about deepfakes. We argued (in agreement with The Grugq and others in the infosec community) that the real danger wasn’t “Deep Fakes”; it was cheap fakes: fakes that can be produced quickly, easily, in bulk, and at virtually no cost. Tactically, it makes little sense to spend money and time on expensive AI when people can be fooled in bulk much more cheaply.

I don’t know if The Grugq has changed his thinking, but there was an obvious problem with that argument. What happens when deep fakes become cheap fakes? We’re seeing that already: in the run-up to the unionization vote at one of Amazon’s warehouses, there was a flood of fake tweets defending Amazon’s work practices. The tweets were probably a prank rather than misinformation seeded by Amazon, but they were still mass-produced.

Similarly, four years ago, during the FCC’s public comment period for the elimination of net neutrality rules, large ISPs funded a campaign that generated nearly 8.5 million fake comments, out of a total of 22 million comments. Another 7.7 million comments were generated by a teenager.  It’s unlikely that the ISPs hired humans to write all those fakes. (In fact, they hired commercial “lead generators.”) At that scale, using humans to generate fake comments wouldn’t be “cheap”; the New York State Attorney General’s office reports that the campaign cost US$8.2 million. And I’m sure the 19-year-old generating fake comments didn’t write them personally, or have the budget to pay others.

Natural language generation technology has been around for a while. It’s seen fairly widespread commercial use since the mid-1990s, ranging from generating simple reports from data to generating sports stories from box scores. One company, Automated Insights, produces well over a billion pieces of content per year, and is used by the Associated Press to generate most of its corporate earnings stories. GPT and its successors raise the bar much higher. Although GPT-3’s first direct ancestors didn’t appear until 2018, it’s intriguing that Transformers, the technology on which GPT-3 is based, were introduced roughly a month after the comments started rolling in, and well before the comment period ended. It’s overreaching to guess that this technology was behind the massive attack on the public comment system–but it’s certainly indicative of a trend.  And GPT-3 isn’t the only game in town; GPT-3 clones include products like Contentyze (which markets itself as an AI-enabled text editor) and EleutherAI’s GPT-Neo.

Generating fakes at scale isn’t just possible; it’s inexpensive.  Much has been made of the cost of training GPT-3, estimated at US$12 million. If anything, this is a gross under-estimate that accounts for the electricity used, but not the cost of the hardware (or the human expertise). However, the economics of training a model are similar to the economics of building a new microprocessor: the first one off the production line costs a few billion dollars, the rest cost pennies. (Think about that when you buy your next laptop.) In GPT-3’s pricing plan, the heavy-duty Build tier costs US$400/month for 10 million “tokens.” Tokens are a measure of the output generated, in portions of a word. A good estimate is that a token is roughly 4 characters. A long-standing estimate for English text is that words average 5 characters, unless you’re faking an academic paper. So generating text costs about .005 cents ($0.00005) per word.  Using the fake comments submitted to the FCC as a model, 8.5 million 20-word comments would cost $8,500 (or 0.1 cents/comment)–not much at all, and a bargain compared to $8.2 million. At the other end of the spectrum, you can get 10,000 tokens (enough for 8,000 words) for free.  Whether for fun or for profit, generating deep fakes has become “cheap.”
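
For those who want to check the arithmetic, here is a quick back-of-the-envelope calculation in Python using only the figures quoted above: the Build tier price, the 4-characters-per-token and 5-characters-per-word estimates, and the FCC campaign’s 8.5 million comments.

    # Back-of-the-envelope check of the cost estimate above.
    PRICE_PER_TOKEN = 400 / 10_000_000   # Build tier: $400 per 10 million tokens
    CHARS_PER_TOKEN = 4                  # rough estimate from the text
    CHARS_PER_WORD = 5                   # average English word length

    cost_per_word = PRICE_PER_TOKEN * (CHARS_PER_WORD / CHARS_PER_TOKEN)
    print(f"cost per word:    ${cost_per_word:.5f}")            # ~$0.00005 (0.005 cents)

    comments = 8_500_000
    words_per_comment = 20
    campaign_cost = comments * words_per_comment * cost_per_word
    print(f"campaign cost:    ${campaign_cost:,.0f}")           # ~$8,500
    print(f"cost per comment: ${campaign_cost / comments:.4f}") # ~$0.001 (0.1 cents)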

Are we at the mercy of sophisticated fakery? In MIT Technology Review’s article about the Amazon fakes, Sam Gregory points out that the solution isn’t careful analysis of images or text for tells; it’s to look for the obvious. New Twitter accounts, “reporters” who have never published an article you can find on Google, and other easily researchable facts are simple giveaways. It’s much simpler to research a reporter’s credentials than to judge whether or not the shadows in an image are correct, or whether the linguistic patterns in a text are borrowed from a corpus of training data. And, as Technology Review says, that kind of verification is more likely to be “robust to advances in deepfake technology.” As someone involved in electronic counter-espionage once told me, “non-existent people don’t cast a digital shadow.”

However, it may be time to stop trusting digital shadows. Can automated fakery create a digital shadow?  In the FCC case, many of the fake comments used the names of real people without their consent.  The consent documentation was easily faked, too.  GPT-3 makes many simple factual errors–but so do humans. And unless you can automate it, fact-checking fake content is much more expensive than generating fake content.

Deepfake technology will continue to get better and cheaper. Given that AI (and computing in general) is about scale, that may be the most important fact. Cheap fakes? If you only need one or two photoshopped images, it’s easy and inexpensive to create them by hand. You can even use GIMP if you don’t want to buy a Photoshop subscription. Likewise, if you need a few dozen tweets or Facebook posts to seed confusion, it’s simple to write them by hand. For a few hundred, you can contract them out to Mechanical Turk. But at some point, scale is going to win out. If you want hundreds of fake images, generating them with a neural network is going to be cheaper. If you want fake texts by the hundreds of thousands, at some point a language model like GPT-3 or one of its clones is going to be cheaper. And I wouldn’t be surprised if researchers are also getting better at creating “digital shadows” for faked personas.

Cheap fakes win, every time. But what happens when deepfakes become cheap fakes? What happens when the issue isn’t fakery by ones and twos, but fakery at scale? Fakery at Web scale is the problem we now face.

Categories: Technology

Radar trends to watch: May 2021

O'Reilly Radar - Mon, 2021/05/03 - 07:05

We’ll start with a moment of silence. RIP Dan Kaminsky, master hacker, teacher, FOO, and a great showman who could make some of the more arcane corners of security fun.  And one of the few people who could legitimately claim to have saved the internet.

AI
  • Snorkel is making progress automating the labeling process for training data. They are building no-code tools to help subject matter experts direct the training process, and then using AI to label training data at scale.
  • There’s lots of news about regulating AI. Perhaps the most important is a blog post from the US Federal Trade Commission saying that it will consider the sale of racially biased algorithms as an unfair or deceptive business practice.
  • AI and computer vision can be used to aid environmental monitoring and enforce environmental regulation–specifically, to detect businesses that are emitting pollutants.
  • Facebook has made some significant progress in solving the “cocktail party problem”: how do you separate voices in a crowd sufficiently so that they can be used as input to a speech recognition system?
  • The next step in AI may be Geoff Hinton’s GLOM. It’s currently just an idea about giving neural networks the ability to work with hierarchies of objects, for example the concepts of “part” and “whole,” in the hope of getting closer to modeling human perception.
  • Twitter has announced an initiative on responsible machine learning that intends to investigate the “potential and harmful effects of algorithmic decisions.”
  • How do we go beyond statistical correlation to build causality into AI? This article about causal models for machine learning discusses why it’s difficult, and what can be done about it.
  • Iron man? The price of robotic exoskeletons for humans is still high, but may be dropping fast. These exoskeletons will assist humans in tasks that require strength, improved vision, and other capabilities.
  • The Google Street View image of your house can be used to predict your risk of a car accident.  This raises important questions about ethics, fairness, and the abuse of data.
  • When deep fakes become cheap fakes: Deep fakes proliferated during the Amazon unionization campaign in Alabama, many under the name of Amazon Ambassadors. These are apparently “fake fakes,” parodies of an earlier Amazon attempt to use fake media to bolster its image. But the question remains: what happens when “deep fakes” are also the cheapest way to influence social media?
  • DeepFakeHop is a new technique for detecting deep fakes, using a new neural network architecture called Successive Subspace Learning.
  • One of the biggest problems in AI is building systems that can respond correctly to challenging, unexpected situations. Changing the rules of a game may be a way of “teaching” AI to respond to new and unexpected situations and make novelty a “first class citizen.”
  • A robot developed at Berkeley has taught itself to walk using reinforcement learning. Two levels of simulation were used before the robot was allowed to walk in the real world. (Boston Dynamics has not said how their robots are trained, but they are assumed to be hand-tuned.)
  • Work on data quality is more important to getting good results from AI than work on models–but everyone wants to do the model work. There is evidence that AI is a lot better than we think, but its accuracy is compromised by errors in the public data sets widely used for training.
Security
  • Moxie Marlinspike has found a remote code execution vulnerability in Cellebrite, a commercial device used by police forces and others to break encryption on cell phone apps like Signal. This exploit can be triggered by files installed in the app itself, possibly rendering Cellebrite evidence inadmissible in court.
  • What happens when AI systems start hacking? This is Bruce Schneier’s scary thought. AI is now part of the attacker’s toolkit, and responsible for new attacks that evade traditional defenses.  This is the end of traditional, signature-based approaches to security.
  • Confidential computing combines homomorphic encryption with specialized cryptographic computation engines to keep data encrypted while it is being used. “Traditional” cryptography only protects data in storage or in transit; to use data in computation, it must be decrypted.
  • Secure access service edge could be no more than hype-ware, but it is touted as a security paradigm for edge computing that combines firewalls, security brokers, and zero-trust computing over wide-area networks.
  • A supply chain attack attempted to place a backdoor into PHP. Fortunately, it was detected during a code review prior to release. One result is that PHP is outsourcing their git server to GitHub. They are making this change to protect against attacks on the source code, and they’re realizing that GitHub provides better protection than they can. “Maintaining our own git infrastructure is an unnecessary security risk”–that’s an argument we’ve made in favor of cloud computing.
  • “Researchers” from the University of Minnesota have deliberately tried to insert vulnerabilities into the Linux kernel. The Linux kernel team has banned all contributions from the university.
Quantum Computing
  • Entanglement-based quantum networks solve a fundamental problem: how do you move qubit state from one system to another, given that reading a qubit causes wave function collapse?  If this works, it’s a major breakthrough.
  • IBM Quantum Composer is a low-code tool for programming quantum computers. Could low- and no-code languages be the only effective way to program quantum computers? Could they provide the insight and abstractions we need in a way that “coded” languages can’t?
Programming
  • A Software Bill of Materials is a tool for knowing your dependencies, crucial in defending against supply chain attacks.
  • Logica is a new programming language from Google that is designed for working with data. It was designed for Google’s BigQuery, but it compiles to SQL and has experimental support for SQLite and PostgreSQL.
  • An iPhone app that teaches you to play guitar isn’t unique. But Uberchord is an app that teaches you to play guitar that has an API. The API allows searching for chords, sharing and retrieving songs, and embedding chords on your website.
  • The Supreme Court has ruled that implementing an API is “fair use,” giving Google a victory in a protracted copyright infringement case surrounding the use of Java APIs in Android.
Social Networks
  • Still picking up the pieces of social networking: Twitter, context collapse, and how trending topics can ruin your day. You don’t want to be the inadvertent “star of twitter.”
  • Beauty filters and selfie culture change the way girls see themselves in ways that are neither surprising nor healthy. Body shaming goes to a new level when you live in a permanent reality distortion field.
  • The Signal app, probably the most widely used app for truly private communication, has wrapped itself in controversy by incorporating a peer-to-peer payments feature built around a new cryptocurrency.
  • Twitch will consider behavior on other social platforms when banning users.
Finance
  • Bitcoin has been very much in the news–though not for any technology. We’re beginning to see connections made between the Bitcoin economy and the real-world economy; that could be significant.
  • A different spin on salary differences between men and women: companies are paying a premium for male overconfidence. Paying for overconfidence is costing billions.
  • How do you teach kids about virtual money? Nickels, dimes, and quarters work. Monetizing children by issuing debit cards for them doesn’t seem like a good idea.
Biology
  • The Craig Venter Institute, NIST, and MIT have produced an artificial cell that divides normally. It is not the first artificial cell, nor the smallest artificial genome. But unlike previous efforts, it is capable of reproduction.
  • While enabling a monkey to play Pong using brain control isn’t new in itself, the sensors that Neuralink implanted in the monkey’s brain are wireless.
Categories: Technology