
Data science and machine learning books
Did you come here expecting to find the “Hundred-page machine learning book” or “Elements of statistical learning”?
There are various articles out there with recommendations for data science and machine learning books. So why would you need yet another summary of those books? I thought you wouldn’t. And therefore, this article is going to be a bit different.
In this article, I want to highlight the 5 books that expose the controversial policies and business models, as well as the surveillance abuses of companies that use big data, artificial intelligence, and data science at the core of their products.
As data scientists and machine learning engineers, we create products based on data, very often people’s data. While some of these products can improve people’s lives tremendously, this is not always the case. Many companies have the monetization of people’s data at the heart of their business interests. And when this is the main driver of business decisions and product development, many things can go wrong.
These are, in my opinion, the “best data science books”:
- Don’t be evil, by Rana Foroohar
- Weapons of math destruction by Cathy O’Neil
- The age of surveillance capitalism, by Shoshana Zuboff
- Algorithms of oppression, by Safiya Umoja Noble
- Stolen focus, by Johann Hari
No matter where you are in your data science career, whether you are a beginner or an advanced programmer, these books will make you question your relationship with technology, both in terms of the products you create and those you consume.
So, let’s dive in.
Don’t be evil
“Don’t be evil” explores how today’s most powerful companies are disrupting our economies, corrupting our political processes, and fogging our minds through their controversial policies, surveillance abuses, and dominance of market share.
Of note:
- 80% of corporate wealth is held by 10% of the companies.
- 90% of online searches are conducted on Google.
- 98% of advertising dollars go to Google or Facebook.
Their dominance of market share and the lack of regulatory checks set up the perfect framework for technology companies to do pretty much what they please. And so they do.
This book covers the practices of the five big tech giants, commonly known as FAANG, namely Facebook, Apple, Amazon, Netflix, and Google, and how they came to dominate their respective spaces in the technology industry.
It explores their exploitative data mining and algorithmic practices, including:
- digital surveillance and lack of privacy.
- spreading of misinformation and hate speech.
- predatory algorithms targeting the weak and vulnerable.
- products engineered to manipulate our desires.
“Don’t be evil” was written by the acclaimed Financial Times columnist and CNN analyst, Rana Foroohar. In this book, she tells us the true extent to which big tech companies like Google, Facebook, Apple, and Amazon are monetizing both our data and our attention, without us seeing a penny of those exorbitant profits.
You can find the book at these links:
Weapons of math destruction
“Weapons of Math Destruction” explores how practical applications of machine learning are increasingly used in ways that reinforce preexisting inequality. It explores how biases in machine learning models utilized in various fields, such as insurance, advertising, education, and policing, can lead to decisions that harm the poor, reinforce discrimination, and amplify inequality.
Algorithms learn from past data, so they propagate decisions based on past patterns, which may or may not reflect current behavior or values. For example, if we train a linear regression or decision tree model on historical data to predict success in the workplace, the model is likely to discriminate against women because, in the past, women were less likely to obtain high-ranking positions.
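To make the point about historical data concrete, here is a minimal sketch in Python. It is not taken from the book: the records, the 60% and 20% promotion rates, and the helper `fit_simple_ols` are all made up for illustration, and both groups are assumed to be equally qualified. A one-feature linear regression fitted on such records simply reproduces the historical gap.

```python
def fit_simple_ols(x, y):
    """Ordinary least squares for y = intercept + slope * x (one feature)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical historical records: (is_woman, promoted).
# Assumption: both groups are equally qualified; only past outcomes differ.
history = (
    [(0, 1)] * 60 + [(0, 0)] * 40    # men: 60 of 100 were promoted
    + [(1, 1)] * 20 + [(1, 0)] * 80  # women: 20 of 100 were promoted
)

is_woman = [row[0] for row in history]
promoted = [row[1] for row in history]

intercept, slope = fit_simple_ols(is_woman, promoted)

# The model's predicted "success" score for each group.
score_man = intercept + slope * 0
score_woman = intercept + slope * 1

print(f"predicted score for a man:   {score_man:.2f}")
print(f"predicted score for a woman: {score_woman:.2f}")
```

The model has no notion of fairness. With a single binary feature, the fitted scores are just the promotion rates observed in each group, so the past pattern becomes the prediction.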
The author argues that we need to be careful when we design predictive models and decide which datasets we use to train them. We also need to be careful when interpreting the results of supervised or unsupervised learning models. After all, as Cathy O’Neil puts it, “algorithms are opinions embedded in code.” And as with all opinions, they can be biased.
I found this book particularly interesting because it summarizes a series of real-world machine learning applications and case studies that follow the life of an individual: from the moment the person applies for a loan to cover university costs, to their admission to university, to how they get a job, and then to how they obtain insurance.
You can find the book at these links:
The age of surveillance capitalism
Surveillance capitalism is a new economic order that claims human experience as free raw material for hidden commercial extraction, prediction, and sales practices.
The surveillance capitalists would have us believe that their practices are the inevitable consequence of the outgrowth of digital technologies. If we want digital, so they claim, we need to go along with their surveillance practices.
Companies claim our private human experience as their free source of raw material. They put our experience through factories, the factories being fueled by computer science, artificial intelligence, neural networks, and reinforcement learning. And what those factories create are products that predict our behavior. These products are, however, not for us. They are sold to other businesses for their private profit. So surveillance capitalist companies harvest our human experiences to create products for the profit of others.
As an example of the misuse of computer vision and supervised learning, the author tells us how big companies have claimed their right to our faces through video cameras placed in the streets in the US. With our faces, they can create models using pattern recognition and machine learning to identify our emotions. And by identifying how we are feeling, they can anticipate our behavior. With our private feelings in sight, they can then present us with products that we are more likely to buy.
More generally, this book covers the fundamentals of surveillance capitalism, how it works, how it is spreading, and how it affects our livelihoods as well as our democracies, offering an in-depth analysis of the new economic order that has come to dominate our societies.
A must-read for data scientists and machine learning engineers.
You can find the book at these links:
Algorithms of oppression
When people conduct searches online, most of them on Google, they think that the results they obtain are credible, fair, objective, and neutral. Yet, money made from advertising has a lot to say about the things that appear at the top of the search.
In fact, as I am writing this article, I am using a natural language processing (NLP) tool that helps me select the right keywords to make the article land on the first page of the search for “data science movies.”
But coming back to the book, “Algorithms of oppression” explains how private interests, along with the monopoly status of a relatively small number of internet search engines, lead to a biased set of search algorithms that privilege whiteness and discriminate against people of color.
The book exposes how machine learning techniques, including neural networks and deep learning, return biased products, i.e., search results, when their training is left unchecked. Or, in other words, when we lack algorithmic auditing.
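As a rough idea of what a basic algorithmic audit could look like, the sketch below compares how often a system returns a favorable outcome for two groups. It is only an illustration, not a method described in the book: the audit log, the group names, and the helper functions are hypothetical. The 0.8 threshold is the "four-fifths" rule of thumb used in US employment guidelines, borrowed here as one possible warning signal.

```python
def selection_rates(outcomes):
    """Map each group to the share of its records with a favorable outcome."""
    totals, positives = {}, {}
    for group, favorable in outcomes:
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + int(favorable)
    return {group: positives[group] / totals[group] for group in totals}


def disparate_impact_ratio(outcomes):
    """Lowest group rate divided by the highest group rate (1.0 means parity)."""
    rates = selection_rates(outcomes)
    highest = max(rates.values())
    if highest == 0:
        return 1.0  # no group received a favorable outcome, so there is no gap
    return min(rates.values()) / highest


# Hypothetical audit log: (group, did the system return a favorable result?)
audit_log = (
    [("group_a", True)] * 45 + [("group_a", False)] * 5
    + [("group_b", True)] * 27 + [("group_b", False)] * 23
)

ratio = disparate_impact_ratio(audit_log)
print(f"disparate impact ratio: {ratio:.2f}")
if ratio < 0.8:  # rule-of-thumb threshold, an assumption of this sketch
    print("Warning: outcomes differ substantially between groups.")
```

A single ratio like this does not prove or rule out discrimination, but tracking it regularly is one simple way to keep a model's training from going unchecked.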
As the author describes in her book, search algorithms are optimized using a combination of natural language processing, which finds sets of keywords relevant to the search term, and the user’s click behavior. The logic goes that if a user clicks on a link that is shown to them, the content must be relevant to what they are looking for. This is how, back in 2011, searches for “black girls” returned porn images. A large number of users searching for that term clicked on those images, and thus the algorithm treated the content as relevant to the term in question.
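The click-driven feedback loop can be illustrated with a toy simulation. This is not how any real search engine works: the page names, the starting click counts, and the assumption that the first, second, and third positions receive 10, 3, and 1 clicks per round are all invented for the sketch. It only shows how ranking by clicks can amplify a small initial lead.

```python
def rank_by_clicks(clicks):
    """Return result ids ordered by click count, most clicked first."""
    return sorted(clicks, key=lambda result: clicks[result], reverse=True)


def simulate(initial_clicks, rounds, clicks_per_position=(10, 3, 1)):
    """Repeatedly rank by clicks, then hand out new clicks by position.

    Assumption: higher positions attract more clicks regardless of how
    relevant the content actually is (position bias).
    """
    clicks = dict(initial_clicks)
    for _ in range(rounds):
        ranking = rank_by_clicks(clicks)
        for result, new_clicks in zip(ranking, clicks_per_position):
            clicks[result] += new_clicks
    return clicks


# Hypothetical starting point: page_a leads by a single click.
initial = {"page_a": 5, "page_b": 4, "page_c": 4}
final = simulate(initial, rounds=20)

print(rank_by_clicks(final))
print(final)
```

After 20 rounds, the one-click lead of `page_a` has grown to more than a hundred clicks, even though nothing in the simulation measures the quality of its content. Clicks are treated as relevance, and the ranking keeps reinforcing itself.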
The author discusses the need for search engines that are not driven by advertising and private profit, as a source of fair, neutral, and diverse results.
You can find the book at these links:
And since we are talking about search engines, I invite you to try Google alternatives:
Stolen Focus
The goal of the people who design apps like Instagram and other social media platforms is to get you to use them for as long as possible and as often as possible. The more you use them, the more money they make. These machine learning systems are designed to hack your attention so that it can be sold to advertisers.
In this book, New York Times best-selling author Johann Hari explores how social media interferes with our individual and collective ability to focus and the consequences it has on our lives and societies. He then explores ways in which we can reclaim our attention.
You can find the book at these links:
About “technical” machine learning books
If you came here expecting to find recommendations like “Machine learning for hackers,” “Elements of statistical learning” from Trevor Hastie and coworkers, or “Hands-on machine learning with Scikit-learn, Keras, and TensorFlow,” you probably did not get what you wanted. Don’t worry.
There are various articles out there with recommendations for data science and machine learning books.
Some of the books offer an introduction to machine learning algorithms and machine learning concepts. Some teach data science from scratch.
Some teach a programming language, including Python programming using NumPy, pandas, and Matplotlib, and R programming, like those from Hadley Wickham. Some books cover exploratory data analysis and data visualization or discuss statistical concepts.
There are also good books about deep learning, like that by Ian Goodfellow, Yoshua Bengio and Aaron Courville.
You can find those books in these articles:
Data science and machine learning movies
If you are interested in this type of reading and would like to watch a set of related movies, check out our recommendations in the following article:
Sole is the lead instructor at Train in Data, which offers data science courses and machine learning tutorials. You can find more about Sole on LinkedIn.