Feed aggregator

PLUG Security meeting on 7/18

PLUG - Thu, 2019/07/11 - 20:00
At this month's PLUG Security meeting:
Donald McCarthy: passiveDNS For fun and Profit (part1)

If your DNS infrastructure has a bad day, your network has a bad day. If your DNS infrastructure has a good day, something else is bound to go wrong. PassiveDNS generally won't help you fix either.

PassiveDNS is a historical look at observed DNS queries over time. It is akin to the Internet Archive's Wayback Machine, but for DNS zones. It is valuable as an operations and security tool, and the data it provides is not easily replaced by another type of data.
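
As a rough illustration of the idea (not the tooling from the talk), a passiveDNS store can be pictured as a table that remembers when each DNS answer was first and last seen. The sketch below is a toy in-memory version; the class name, method names, and sample records are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Observation:
    first_seen: int  # Unix timestamp of the earliest sighting
    last_seen: int   # Unix timestamp of the latest sighting
    count: int       # how many times this answer was observed


class PassiveDNSStore:
    """Toy in-memory store (hypothetical, not a real passiveDNS product API)."""

    def __init__(self):
        # (name, record type) -> {answer data -> Observation}
        self._records = defaultdict(dict)

    @staticmethod
    def _key(name, rtype):
        # Normalize case and the trailing root dot so lookups are consistent.
        return (name.lower().rstrip("."), rtype.upper())

    def observe(self, name, rtype, rdata, timestamp):
        """Record one observed DNS answer."""
        answers = self._records[self._key(name, rtype)]
        obs = answers.get(rdata)
        if obs is None:
            answers[rdata] = Observation(timestamp, timestamp, 1)
        else:
            obs.first_seen = min(obs.first_seen, timestamp)
            obs.last_seen = max(obs.last_seen, timestamp)
            obs.count += 1

    def history(self, name, rtype):
        """Return (rdata, Observation) pairs ordered by first sighting."""
        answers = self._records.get(self._key(name, rtype), {})
        return sorted(answers.items(), key=lambda item: item[1].first_seen)


store = PassiveDNSStore()
store.observe("example.com.", "A", "192.0.2.10", 100)
store.observe("example.com", "a", "192.0.2.10", 200)
store.observe("example.com", "A", "198.51.100.7", 300)
history = store.history("EXAMPLE.com", "A")
```

Here `history` lists every address the name has resolved to along with its time window, which is the kind of question ("what did this name point to last month?") that live DNS cannot answer.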

In this presentation we will cover exactly what passiveDNS is and isn't, passiveDNS architecture, some security use cases, and, if time allows, a live demonstration.

In part 2 of the presentation (another month) I will demonstrate some passiveDNS tooling and share more in-depth practical knowledge to turn theoretical use cases into automated assistance for a SOC or NOC.

About Donald:
Donald "Mac" McCarthy is a 15-year veteran of the IT industry with the last 8 years focused on InfoSec. He has worked on a variety of systems ranging from cash registers to supercomputers. It was while serving as a systems administrator for a scientific computing cluster that he discovered his passion for using Linux for highly distributed complex tasks. His current focus is using Linux with open source technologies like Kafka and Elasticsearch to build tooling for security analysts and network operations. He is a proud veteran of the United States Army and recently relocated from Atlanta to the East Valley.

PLUG meeting on Jul 11th

PLUG - Mon, 2019/07/08 - 23:01
We'll have 2 presenters this month with a distribution theme.

Artemii Kropachev: Red Hat Enterprise Linux 8 Beta 1 Overview

Learn about the first version release of Red Hat Enterprise Linux in over four years. The latest release features unprecedented ease of deployment, ease of migration, and ease of management enabling you to upgrade existing customers and attract new ones.
Red Hat Enterprise Linux 8 gives organizations a stable, security-focused, and consistent foundation across hybrid cloud deployments—and the tools they need to deliver applications and workloads faster with less effort.

About Artemii:
Worldwide IT expert and international consultant with over 20 years of high-level IT experience and expertise. I have trained, guided and consulted hundreds of architects, engineers, developers, and IT experts around the world since 2001. My architect-level experience covers DC, Clouds, DevOps, NFV solutions built on top of any Red Hat and Open Source technologies. I am one of the highest Red Hat Certified Specialists in the world.

der.hans: Hey Buster! Debian 10 released

Debian 10 brings with it many ch-ch-changes.

Reproducible Builds, Wayland, AppArmor, nftables, CUPS.

10 hardware architectures, 59,000 packages, 28,939 source packages, 11,610,055 source files, and 76 languages.

Stretch updates.

Get or upgrade to Debian 10 now.

Coming soon on Blu-ray.

About der.hans:
der.hans is a Free Software, technology and entrepreneurial veteran. He is a repeat author for Linux Journal; his article about online privacy and security using a password manager was the cover article for the January 2017 issue.

He's chairman of the Phoenix Linux User Group (PLUG), BoF organizer for the Southern California Linux Expo (SCaLE), and founder of the Free Software Stammtisch and Stammtisch Job Nights.

He often presents at large community-led conferences (SCaLE, SeaGL, LFNW, Tübix) and many local groups.

Highlights from the O'Reilly Artificial Intelligence Conference in Beijing 2019

O'Reilly Radar - Mon, 2019/07/08 - 08:51

Experts explore the future of hiring, AI breakthroughs, embedded machine learning, and more.

Experts from across the AI world came together for the O'Reilly Artificial Intelligence Conference in Beijing. Below you'll find links to highlights from the event.

The future of hiring and the talent market with AI

Maria Zheng examines AI and its impact on people’s jobs, quality of work, and overall business outcomes.

The future of machine learning is tiny

Pete Warden digs into why embedded machine learning is so important, how to implement it on existing chips, and some of the new use cases it will unlock.

AI and systems at RISELab

Ion Stoica outlines a few projects at the intersection of AI and systems that UC Berkeley's RISELab is developing.

Top AI breakthroughs you need to know

Abigail Hing Wen discusses some of the most exciting recent breakthroughs in AI and robotics.

Data orchestration for AI, big data, and cloud

Haoyuan Li offers an overview of a data orchestration layer that provides a unified data access and caching layer for single cloud, hybrid, and multicloud deployments.

AI and retail

Mikio Braun takes a look at Zalando and the retail industry to explore how AI is redefining the way ecommerce sites interact with customers.

Why do we say AI should be cloud native?

Yangqing Jia reviews industry trends supporting the argument that AI should be cloud native.

Designing computer hardware for artificial intelligence

Michael James examines the fundamental drivers of computer technology and surveys the landscape of AI hardware solutions.

Toward learned algorithms, data structures, and systems

Tim Kraska outlines ways to build learned algorithms and data structures to achieve “instance optimality” and unprecedented performance for a wide range of applications.

Continue reading Highlights from the O'Reilly Artificial Intelligence Conference in Beijing 2019.

Categories: Technology

The future of hiring and the talent market with AI

O'Reilly Radar - Mon, 2019/07/08 - 08:50

Maria Zheng examines AI and its impact on people’s jobs, quality of work, and overall business outcomes.

Continue reading The future of hiring and the talent market with AI.

Categories: Technology

Top AI breakthroughs you need to know

O'Reilly Radar - Mon, 2019/07/08 - 08:50

Abigail Hing Wen discusses some of the most exciting recent breakthroughs in AI and robotics.

Continue reading Top AI breakthroughs you need to know.

Categories: Technology

Data orchestration for AI, big data, and cloud

O'Reilly Radar - Mon, 2019/07/08 - 08:50

Haoyuan Li offers an overview of a data orchestration layer that provides a unified data access and caching layer for single cloud, hybrid, and multicloud deployments.

Continue reading Data orchestration for AI, big data, and cloud.

Categories: Technology

Toward learned algorithms, data structures, and systems

O'Reilly Radar - Mon, 2019/07/08 - 08:50

Tim Kraska outlines ways to build learned algorithms and data structures to achieve “instance optimality” and unprecedented performance for a wide range of applications.

Continue reading Toward learned algorithms, data structures, and systems.

Categories: Technology

The future of machine learning is tiny

O'Reilly Radar - Mon, 2019/07/08 - 08:50

Pete Warden digs into why embedded machine learning is so important, how to implement it on existing chips, and some of the new use cases it will unlock.

Continue reading The future of machine learning is tiny.

Categories: Technology

AI and systems at RISELab

O'Reilly Radar - Mon, 2019/07/08 - 08:50

Ion Stoica outlines a few projects at the intersection of AI and systems that UC Berkeley's RISELab is developing.

Continue reading AI and systems at RISELab.

Categories: Technology

Designing computer hardware for artificial intelligence

O'Reilly Radar - Mon, 2019/07/08 - 08:50

Michael James examines the fundamental drivers of computer technology and surveys the landscape of AI hardware solutions.

Continue reading Designing computer hardware for artificial intelligence.

Categories: Technology

Four short links: 8 July 2019

O'Reilly Radar - Mon, 2019/07/08 - 03:50

Algorithmic Governance, DevOps Assessment, Retro Language, and Open Source Satellite

  1. Algorithmic Governance and Political Legitimacy (American Affairs Journal) -- Mechanized judgment resembles liberal proceduralism. It relies on our habit of deference to rules, and our suspicion of visible, personified authority. But its effect is to erode precisely those procedural liberties that are the great accomplishment of the liberal tradition, and to place authority beyond scrutiny. I mean “authority” in the broadest sense, including our interactions with outsized commercial entities that play a quasi-governmental role in our lives. That is the first problem. A second problem is that decisions made by an algorithm are often not explainable, even by those who wrote the algorithm, and for that reason cannot win rational assent. This is the more fundamental problem posed by mechanized decision-making, as it touches on the basis of political legitimacy in any liberal regime.
  2. The 27-Factor Assessment Model for DevOps -- The factors are the cross-product of current best practices for three dimensions (people, process, and technology) with nine pillars (leadership, culture, app development/design, continuous integration, continuous testing, infrastructure on demand, continuous monitoring, continuous security, continuous delivery/deployment).
  3. Millfork -- a middle-level programming language targeting 6502- and Z80-based microcomputers and home consoles.
  4. FossaSat-1 (Hackaday) -- FossaSat-1 will provide free and open source IoT communications for the globe using inexpensive LoRa modules, where anyone will be able to communicate with a satellite using modules found online for under 5€ and basic wire mono-pole antennas.

Continue reading Four short links: 8 July 2019.

Categories: Technology

Four short links: 5 July 2019

O'Reilly Radar - Fri, 2019/07/05 - 06:10

Online Not All Bad, Emotional Space, Ted Chiang, Thread Summaries

  1. How a Video Game Community Filled My Nephew's Final Days with Joy (Guardian) -- you had a rough week. Treat yourself to this heart-warming story of people going the extra mile for someone.
  2. Self-Report Captures 27 Distinct Categories of Emotion Bridged by Continuous Gradients -- Although reported emotional experiences are represented within a semantic space best captured by categorical labels, the boundaries between categories of emotion are fuzzy rather than discrete. By analyzing the distribution of reported emotional states, we uncover gradients of emotion—from anxiety to fear to horror to disgust, calmness to aesthetic appreciation to awe, and others—that correspond to smooth variation in affective dimensions such as valence and dominance. Reported emotional states occupy a complex, high-dimensional categorical space. In addition, our library of videos and an interactive map of the emotional states they elicit are made available to advance the science of emotion. (via Dan Hon)
  3. Sci-Fi Author Ted Chiang on Our Relationship to Technology, Capitalism, and the Threat of Extinction (GQ) -- Right now I think we’re beginning to see a correction to the wild techno-boosterism that Silicon Valley has been selling us for the last couple decades, and that’s a good thing as far as I’m concerned. I wish we didn’t swing back and forth from the extremes of Pollyannaish optimism to dystopian pessimism; I’d prefer it if we had a more measured response throughout, but that doesn’t appear to be in our nature. +1 to this. I don't like the way we have spent 20 years imagining dystopias and then building them.
  4. Wikum -- Summarize large discussion threads.

Continue reading Four short links: 5 July 2019.

Categories: Technology

Four short links: 4 July 2019

O'Reilly Radar - Thu, 2019/07/04 - 06:50

Debugging AI, Serverless Foundations, YouTube Bans, and Pathological UI

  1. TensorWatch -- open source from Microsoft: a debugging and visualization tool designed for data science, deep learning, and reinforcement learning.
  2. Formal Foundations of Serverless Computing -- the serverless computing abstraction exposes several low-level operational details that make it hard for programmers to write and reason about their code. This paper sheds light on this problem.
  3. YouTube Bans Videos Showing Hacking and Phishing (Kody) -- We made a video about launching fireworks over Wi-Fi for the 4th of July only to find out @YouTube gave us a strike because we teach about hacking, so we can't upload it. YouTube now bans: "Instructional hacking and phishing: Showing users how to bypass secure computer systems."
  4. User Inyerface -- an exercise in frustration.

Continue reading Four short links: 4 July 2019.

Categories: Technology

Tools for machine learning development

O'Reilly Radar - Wed, 2019/07/03 - 06:35

The O'Reilly Data Show: Ben Lorica chats with Jeff Meyerson of Software Engineering Daily about data engineering, data architecture and infrastructure, and machine learning.

In this week's episode of the Data Show, we're featuring an interview Data Show host Ben Lorica participated in for the Software Engineering Daily Podcast, where he was interviewed by Jeff Meyerson. Their conversation mainly centered around data engineering, data architecture and infrastructure, and machine learning (ML).

Continue reading Tools for machine learning development.

Categories: Technology

New live online training courses

O'Reilly Radar - Wed, 2019/07/03 - 04:20

Get hands-on training in TensorFlow, AI applications, critical thinking, Python, data engineering, and many other topics.

Learn new topics and refine your skills with more than 151 new live online training courses we opened up for July and August on the O'Reilly online learning platform.

AI and machine learning

Getting Started with Tensorflow.js, July 23

Building Intelligent Analytics Through Time Series Data, July 31

Natural Language Processing (NLP) from Scratch, August 5

Cloud Migration Strategy: Optimizing Future Operations with AI, August 7

Intermediate Natural Language Processing (NLP), August 12

Machine Learning for Business Analytics: A Deep Dive into Data with Python, August 19

Inside unsupervised learning: Semisupervised learning using autoencoders, August 20

TensorFlow 2.0 Essentials – What's New, August 23

A Practical Introduction to Machine Learning, August 26

Artificial Intelligence: Real-world Applications, August 26

Inside Unsupervised Learning: Generative Models and Recommender Systems, August 27

Hands-On Algorithmic Trading with Python, September 3

Artificial Intelligence: AI for Business, September 4

TensorFlow Extended: Data Validation and Transform, September 11


Introducing Blockchain, August 2


Building Your People Network, July 8

Getting Unstuck, August 5

How to Choose Your Cloud Provider, August 7

Spotlight on Data: Data Pipelines and Power Imbalances—3 Cautionary Tales with Catherine D’Ignazio and Lauren Klein, August 19

Salary Negotiation Fundamentals, August 20

Fundamentals of Cognitive Biases, August 20

Empathy at Work, August 20

Developing Your Coaching Skills, August 21

Applying Critical Thinking, August 22

Building Your People Network, August 27

60 Minutes to Designing a Better PowerPoint Slide, August 27

60 Minutes to a Better Prototype, August 27

Introduction to Critical Thinking, August 27

Spotlight on Learning from Failure: Fixing with Sha Hwang, August 27

Managing Your Manager, August 28

Scrum Master: Good to Great, August 29

Being a Successful Team Member, September 4

Fundamentals of Learning: Learn faster and better using neuroscience, September 5

Leadership Communication Skills for Managers, September 10

Getting S.M.A.R.T about Goals, September 10

Spotlight on Innovation: Enabling Growth Through Disruption with Scott Anthony, September 11

Writing User Stories, September 11

Data science and data tools

Applied Probability Theory from Scratch, July 17

Interactive Visualization Approaches, July 25

Apache Hadoop, Spark and Big Data Foundations, August 1

Visualizing Software Architecture with the C4 Model, August 2

Data Engineering for Data Scientists, August 6

Analyzing and Visualizing Data with Microsoft Power BI, August 9

Hands-on Introduction to Apache Hadoop and Spark Programming, August 12-13

Scalable Data Science with Apache Hadoop and Spark, August 19

IoT Fundamentals, August 20-21

Algorithmic Risk Management in Trading and Investing, August 23

Business Data Analytics Using Python, August 26

Python Data Science Full Throttle with Paul Deitel: Introductory Artificial Intelligence (AI), Big Data and Cloud Case Studies, August 26

Real-time Data Foundations: Flink, August 27

Managing Enterprise Data Strategies with Hadoop, Spark, and Kafka, August 29

Design and product management

Introduction to UI & UX design, August 28


Kotlin for Android, July 11-12

SQL for Any IT Professional, July 16

Design Patterns in Java, July 29-30

Discovering Modern Java, August 2

Essentials of JVM Threading, August 2

Getting Started with Pandas, August 6

Programming with Data: Foundations of Python and Pandas, August 12

Beginner’s Guide to Writing AWS Lambda Functions in Python, August 12

Solving Java Memory Leaks, August 12

Introduction to Python Programming, August 12

Working with Dataclasses in Python 3.7, August 15

Reactive Programming with Java Completable Futures, August 15

Getting Started with Python's Pytest, August 19

Visualization in Python with Matplotlib, August 19

Python Full Throttle with Paul Deitel: A One-Day, Fast-Paced, Code-Intensive Python, August 19

Oracle Java SE Programmer I Crash Course: Pass the 1Z0-815 or 1Z0-808 Exams, August 19-21

Linux Troubleshooting: Advanced Linux Techniques, August 20

Introduction to the Bash Shell, August 21

Getting Started with Node.js, August 21

Applied Cryptography with Python, August 22

Mentoring Technologists, August 22

CSS Layout Fundamentals: From Floats to Flexbox and CSS Grid, August 22

React Hooks in Action, August 23

Getting Started with Java: From Core Concepts to Real Code in 4 Hours, August 23

Bash Shell Scripting in 4 Hours, August 23

Continuous Delivery and Tooling in Go, August 26

Mastering SELinux, August 26

Functional Programming in Java, August 26-27

Scalable Concurrency with the Java Executor Framework, August 29

SOLID Principles of Object-Oriented and Agile Design, August 30

Fraud Analytics using Python, September 3

Getting Started with Spring and Spring Boot, September 3-4

Linear Algebra with Python: Essential Math for Data Science, September 5

Python-Powered Excel, September 9

Design Patterns Boot Camp, September 9-10

Secure JavaScript with Node.js, September 12


Introduction to Digital Forensics and Incident Response (DFIR), July 31

Cisco Security Certification Crash Course, August 16

Security Operation Center (SOC) Best Practices, August 19

Expert Transport Layer Security (TLS), August 20

CompTIA A+ Core 1 (220-1001) Certification Crash Course, August 21-22

Introduction to Ethical Hacking and Penetration Testing, August 22-23

CISSP Crash Course, August 27-28

CISSP Certification Practice Questions and Exam Strategies, August 28

Defensive Cybersecurity Fundamentals, August 29

Cybersecurity Offensive and Defensive Techniques in 3 Hours, August 30

Azure Security Fundamentals, September 4

Systems engineering and operations

DevOps on Google Cloud Platform (GCP), July 8

Getting Started with Microsoft Azure, July 12

Getting Started with Amazon Web Services (AWS), July 24-25

Ansible for Managing Network Devices, August 1

Software Architecture for Developers, August 1

Practical Software Design from Problem to Solution, August 1

Facebook Libra, August 1

Introducing Infrastructure as Code with Terraform, August 1

AWS CloudFormation Deep Dive, August 5-6

Rethinking REST: A hands-on guide to GraphQL and queryable APIs, August 6

Julia 1.0 Essentials, August 6

Getting Started with Serverless Architectures on Azure, August 8

Deploying Container-Based Microservices on AWS, August 12-13

AWS Access Management, August 13

Exam AZ-103: Microsoft Azure Administrator Crash Course, August 15-16

Architecture for Continuous Delivery, August 19

Getting Started with OpenStack, August 19

AWS Certified Big Data - Specialty Crash Course, August 19-20

Google Cloud Platform – Professional Cloud Developer Crash Course, August 19-20

CompTIA Network+ N10-007 Crash Course, August 19-21

Shaping and Communicating Architectural Decisions, August 20

AWS Certified Cloud Practitioner Exam Crash Course, August 20-21

Software Architecture Foundations: Characteristics and Tradeoffs, August 21

Google Cloud Platform Professional Cloud Architect Certification Crash Course, August 21-22

Red Hat RHEL 8 New Features, August 22

Introduction to Google Cloud Platform, August 22-23

Istio on Kubernetes: Enter the Service Mesh, August 27

AWS Monitoring Strategies, August 27-28

Red Hat Certified System Administrator (RHCSA) Crash Course, August 27-30

Azure Architecture: Best Practices, August 28

Web Performance in Practice, August 28

AWS Account Setup Best Practices, August 29

Getting Started with Amazon SageMaker on AWS, August 29

Jenkins 2: Beyond the Basics, September 3

Comparing Service-based Architectures, September 3

Microservice Collaboration, September 3

Introduction to Docker Compose, September 3

Chaos Engineering: Planning and Running Your First Game Day, September 3

Next-level Git: Master Your Workflow, September 4

Introduction to Knative, September 5

Reactive Spring and Spring Boot, September 9

Developing DApps with Ethereum, September 9

Building a Deployment Pipeline with Jenkins 2, September 9-10

Building Data APIs with GraphQL, September 11

Creating React Applications with GraphQL, September 12

Jenkins 2: Up and Running, September 12

Microservices Caching Strategies, September 12

Chaos Engineering: Planning, Designing, and Running Automated Chaos Experiments, September 12

Google Cloud Platform Security Fundamentals, September 12

Understanding AWS Cloud Compute Option, September 12-13

Google Cloud Certified Associate Cloud Engineer Crash Course, September 12-13

Continue reading New live online training courses.

Categories: Technology

Four short links: 3 July 2019

O'Reilly Radar - Wed, 2019/07/03 - 04:00

Models, More Models, Robots.txt, and Event Sourcing

  1. On Models (Tom Stafford) -- a Twitter thread where he lays out his work in models and the value of them.
  2. Why Model? -- The [article] distinguishes between explanation and prediction as modeling goals, and offers 16 reasons other than prediction to build a model. It also challenges the common assumption that scientific theories arise from and 'summarize' data, when often, theories precede and guide data collection; without theory, in other words, it is not clear what data to collect. Among other things, it also argues that the modeling enterprise enforces habits of mind essential to freedom.
  3. Robots.txt -- Google's robots.txt parser and matcher as a C++ library (compliant to C++11). Released as part of standardization work.
  4. Mistakes We Made Adopting Event Sourcing (And How We Recovered) -- a useful post for those also considering their first system built around events as the mechanism for changing state.

Continue reading Four short links: 3 July 2019.

Categories: Technology

Four short links: 2 July 2019

O'Reilly Radar - Tue, 2019/07/02 - 05:00

Lock Convoys, AI Hardware, Lambda Observability, and AI for Science

  1. The Convoy Phenomenon (Adrian Colyer) -- locks on resources that lead to performance degradation which never recovers, a situation first described in 1979.
  2. AI is Changing the Entire Nature of Compute (ZD) -- workloads have been doubling every 3.5 months, while our post-Moore's law chip speed increases have been 3.5% per year. What that means, both authors believe, is that the design of chips, their architecture, as it's known, has to change drastically in order to get more performance out of transistors that are not of themselves producing performance benefits. The article explores some of those directions.
  3. The Annoying State of Lambda Observability -- In the current state of the world, the available strategies boil down to either: (1) Send telemetry directly to external observability tools during Lambda execution. (2) Scrape or trigger off the telemetry sent to CloudWatch and X-Ray to populate external providers. Spoiler: neither option is ideal.
  4. Accelerating Science: A Computing Research Agenda -- I found this quite challenging at first because it seemed to be "cheating" somehow. But once I viewed it as the computer augmenting the human, not replacing them, then it was more acceptable. But I can imagine that better tools for each step of the scientific journey (e.g., Expressing, reasoning with, updating scientific arguments (along with supporting assumptions, facts, observations), including languages and inference techniques for managing multiple, often conflicting arguments, assessing the plausibility of arguments, their uncertainty and provenance) will create controversy no less than the software "proof" of the four-color theorem did.
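
To put the two figures quoted in item 2 on the same scale, the short calculation below converts "doubling every 3.5 months" into a yearly growth factor and compares it with a 3.5% yearly improvement. It is only a back-of-the-envelope sketch that assumes steady exponential growth; the function and variable names are made up for illustration.

```python
def annual_factor_from_doubling(months_to_double):
    """Yearly growth factor for a quantity that doubles every `months_to_double` months."""
    return 2 ** (12 / months_to_double)


# Figures taken from the summary in item 2.
workload_growth = annual_factor_from_doubling(3.5)  # roughly 10.8x per year
chip_growth = 1 + 3.5 / 100                         # 1.035x per year

# How much faster demand grows than chip speed over one year.
gap = workload_growth / chip_growth

print(f"workloads: x{workload_growth:.1f} per year")
print(f"chips:     x{chip_growth:.3f} per year")
print(f"gap:       x{gap:.1f} per year")
```

Under those assumptions, demand outgrows per-chip speed by about an order of magnitude each year, which is the gap the article argues architecture changes have to close.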

Continue reading Four short links: 2 July 2019.

Categories: Technology

Four short links: 1 July 2019

O'Reilly Radar - Mon, 2019/07/01 - 05:30

General-Purpose Probabilistic Programming, Microsoft's Linux, Decolonizing Data, Testing Statistical Software

  1. Gen -- general-purpose probabilistic programming system with programmable inference. A Julia package, described as follows: Gen's flexible modeling and inference programming capabilities unify symbolic, neural, probabilistic, and simulation-based approaches to modeling and inference, including causal modeling, symbolic programming, deep learning, hierarchical Bayesian modeling, graphics and physics engines, and planning and reinforcement learning.
  2. WSL2 Linux Kernel -- source for the Linux kernel used in Windows Subsystem for Linux 2 (WSL2).
  3. Decolonizing Data -- Decolonizing data means that the community itself is the one determining the information they want us to gather. Why are we gathering it? Who's interpreting it? And are we interpreting it in a way that truly serves our communities? Decolonizing data is about controlling our own story and making decisions based on what is best for our people. That hasn't been done in data before, and that's what's shifting and changing.
  4. Testing Statistical Software -- In this post, I describe how I evaluate the trustworthiness of a modeling package, and in particular what I want from the test suite. If you use statistical software, this post will help you evaluate whether a package is worth using. If you write statistical software, this post will help you confirm the correctness of the code that you write.

Continue reading Four short links: 1 July 2019.

Categories: Technology

RISELab’s AutoPandas hints at automation tech that will change the nature of software development

O'Reilly Radar - Mon, 2019/07/01 - 04:00

Neural-backed generators are a promising step toward practical program synthesis.

There's a lot of hype surrounding AI, but are companies actually beginning to use AI technologies? In a survey we released earlier this year, we found that more than 60% of respondents worked in organizations that planned to invest some of their IT budgets into AI. We also found that the level of investment depended on how much experience a company already had with AI technologies, with companies further along the maturity curve planning substantially higher investments. As far as current levels of adoption, the answer depended on the industry sector. We found that in several industries, 30% or more of respondents described their organizations as having a mature AI practice:

Figure 1. Stage of adoption of AI technologies (by industry). Image by Ben Lorica.

In which areas or domains are AI technologies being applied? As with any new technology, AI is used for a lot of R&D-related activity. But we are also beginning to see AI and machine learning gain traction in areas like customer service and IT. In a recent post, we outlined the many areas pertaining to customer experience, where AI-related technologies are beginning to make an impact. This includes things like data quality, personalization, customer service, and many other factors that impact customer experience.

Figure 2. Areas where AI is being applied (by stage of adoption). Image by Ben Lorica.

One area I’m particularly interested in is the application of AI and automation technologies in data science, data engineering, and software development. We’ve sketched some initial manifestations of “human in the loop” technologies in software development, where initial applications of machine learning are beginning to change how people build and manage software systems. Automation has also emerged as one of the hottest topics in data science and machine learning (AutoML), and teams of researchers and practitioners are actively building tools that can automate every stage of a machine learning pipeline.

For a typical data scientist, data engineer, or developer, there is an explosion of tools and APIs they now need to work with and “master.” A data scientist might need to know Python, pandas, numpy, scikit-learn, one or more deep learning frameworks, Apache Spark, and more. According to a recent blog post by Khaliq Gant, a web developer is typically expected to demonstrate competence in things like "navigating the terminal, HTML, CSS, JavaScript, cloud infrastructure, deployment strategies, databases, HTTP protocols, and that’s just the beginning." Data engineers additionally need to master several pieces of infrastructure.

How do data scientists, data engineers, and developers cope with this explosion of tools and APIs? They typically use search (Google) or post in forums (Stack Overflow, Slack, mailing lists). In both instances, it takes some baseline knowledge both to frame a question and to discern which answer is the "best one" to choose. In the case of forums, there might be a significant delay before one obtains an adequate response. Those with more resources and more time to spare can turn to free or paid learning resources such as books, videos, or training courses.

There are emerging automation tools that can drastically increase the efficiency and productivity of software developers. At his recent keynote at the Artificial Intelligence conference in Beijing, professor Ion Stoica, director of UC Berkeley’s RISELab, unveiled a new research project that hints at a path forward for software developers. Their initial output is AutoPandas, a program synthesis engine for Pandas, one of the most widely used data science libraries today. As described in a paper from Microsoft and the University of Washington, program synthesis is a longstanding research area in computer science:

Program synthesis is the task of automatically finding a program in the underlying programming language that satisfies the user intent expressed in the form of some specification. Since the inception of AI in the 1950s, this problem has been considered the holy grail of computer science.

An AutoPandas user simply specifies an input and output data structure (i.e., dataframes), and AutoPandas automatically synthesizes an optimal program that produces the desired output from the given input. AutoPandas relies on "program generators" that capture the API constraints to reduce the search space (the space of possible programs is immense), neural network models to predict the arguments of the API calls, and the distributed computing framework Ray to scale up the search.
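
To make the "specify input and output, get a program back" idea concrete, here is a deliberately tiny sketch of programming by example. It is not how AutoPandas works: it swaps the neural-backed generators for brute-force enumeration, uses plain Python lists of dicts instead of dataframes, and searches over a made-up three-operation "API" (`sort_by`, `keep_columns`, `drop_duplicates` are hypothetical stand-ins).

```python
from itertools import product


# Hypothetical miniature API: each operation maps a table (list of dict rows) to a table.
def sort_by(column):
    return lambda rows: sorted(rows, key=lambda r: r[column])


def keep_columns(columns):
    return lambda rows: [{c: r[c] for c in columns} for r in rows]


def drop_duplicates():
    def op(rows):
        seen, out = set(), []
        for r in rows:
            key = tuple(sorted(r.items()))
            if key not in seen:
                seen.add(key)
                out.append(r)
        return out
    return op


def candidate_ops(columns):
    """Enumerate every (description, function) pair the search may use."""
    for c in columns:
        yield f"sort_by({c!r})", sort_by(c)
        yield f"keep_columns([{c!r}])", keep_columns([c])
    yield "drop_duplicates()", drop_duplicates()


def synthesize(inp, out, max_len=2):
    """Return the first chain of operations that turns `inp` into `out`, or None."""
    ops = list(candidate_ops(sorted(inp[0].keys())))
    for length in range(1, max_len + 1):
        for chain in product(ops, repeat=length):
            rows = inp
            try:
                for _, fn in chain:
                    rows = fn(rows)
            except KeyError:
                continue  # chain referenced a column an earlier step removed
            if rows == out:
                return [name for name, _ in chain]
    return None


inp = [{"name": "b", "n": 2}, {"name": "a", "n": 1}, {"name": "b", "n": 2}]
out = [{"n": 1}, {"n": 2}, {"n": 2}]
program = synthesize(inp, out)
```

Even this toy shows why the search space matters: with five candidate operations there are already 25 two-step chains to try, and the count grows exponentially with chain length, which is the problem the generators and learned models described above are meant to tame.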

Figure 3. An AutoPandas user specifies an input and output data structure, and neural-backed generators output an optimal program. Image by Ben Lorica.

While we are still very much in the early days, neural-backed generators are an extremely promising step toward practical program synthesis. Note that while researchers at RISELab have initially focused on pandas, the techniques and tools behind AutoPandas can be applied to other APIs (e.g., numpy, TensorFlow, etc.). So, any number of popular tools used by developers, data scientists, or data engineers can benefit from some level of automation via program synthesis.

Programming tools have always changed over time (I no longer use Perl, for example), and there has always been an expectation that technologists should be able to adapt to the latest tools and methods. Continued progress in tools for program synthesis means that automation will change how data scientists, data engineers, or developers do their work. One can imagine a future where mastery of individual tools and APIs will matter less, and technologists can focus on architecture and building end-to-end systems and applications. As tools and APIs get easier to use, your employer won't care as much about what tools you know coming into a job, but they will expect you to possess "soft skills" (including skills that cannot be easily automated), domain knowledge and expertise, and the ability to think holistically.

Continue reading RISELab’s AutoPandas hints at automation tech that will change the nature of software development.

Categories: Technology

Four short links: 28 June 2019

O'Reilly Radar - Fri, 2019/06/28 - 03:40

Heartbeat Identity, Seam Carving, Q&A Facilitation, and Secure Data in Distributed Systems

  1. The Pentagon Has a Laser That Can Identify People From a Distance By Their Heartbeat (MIT TR) -- A new device, developed for the Pentagon after U.S. Special Forces requested it, can identify people without seeing their faces: instead, it detects their unique cardiac signature with an infrared laser. While it works at 200 meters (219 yards), longer distances could be possible with a better laser. [...] It takes about 30 seconds to get a good return, so at present the device is only effective where the subject is sitting or standing.
  2. Real-world Dynamic Programming: Seam Carving -- nifty explanation of using dynamic programming (which has a reputation as a technique you learn in school, then only use to pass interviews at software companies) to implement intelligent image resizing.
  3. How to Facilitate Q&As (Eve Tuck) -- People don’t always bring their best selves to the Q&A—people can act out their own discomfort about the approach or the topic of the talk. We need to do better. I believe in heavily mediated Q&A sessions.
  4. Project Oak -- a specification and a reference implementation for the secure transfer, storage, and processing of data in distributed systems. From Google.
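
The dynamic programming step behind seam carving (item 2) can be sketched in a few lines. The version below is a minimal illustration, not the linked article's code: it assumes the per-pixel "energy" values have already been computed and are given as a small grid of numbers, and it finds the cheapest top-to-bottom path where each step moves at most one column left or right.

```python
def min_vertical_seam(energy):
    """Return (seam, total) where seam[r] is the column removed in row r."""
    rows, cols = len(energy), len(energy[0])

    # cost[r][c] = cheapest total energy of any seam ending at pixel (r, c).
    cost = [list(energy[0])]
    for r in range(1, rows):
        prev = cost[-1]
        row = []
        for c in range(cols):
            # A seam can arrive from the pixel above-left, above, or above-right.
            best_above = min(prev[max(c - 1, 0):min(c + 2, cols)])
            row.append(energy[r][c] + best_above)
        cost.append(row)

    # Backtrack from the cheapest pixel in the bottom row.
    c = min(range(cols), key=lambda j: cost[-1][j])
    total = cost[-1][c]
    seam = [c]
    for r in range(rows - 1, 0, -1):
        lo, hi = max(c - 1, 0), min(c + 1, cols - 1)
        c = min(range(lo, hi + 1), key=lambda j: cost[r - 1][j])
        seam.append(c)
    seam.reverse()
    return seam, total


energy = [
    [1, 4, 3],
    [5, 2, 6],
    [7, 8, 1],
]
seam, total = min_vertical_seam(energy)
```

Removing the pixels along `seam` narrows the image by one column while touching the least "interesting" pixels; repeating the process resizes the image further.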

Continue reading Four short links: 28 June 2019.

Categories: Technology
