By Lauren Leffer, for Scientific American
It may sound like fantasy or fiction, but people predict the future all the time. Real-world fortune tellers--we call them actuaries and meteorologists--have successfully used computer models for years. And today's accelerating advances in machine learning are quickly upgrading their digital crystal balls. Now a new artificial intelligence system that treats human lives like language may be able to competently guess whether you'll die within a certain period, among other life details, according to a recent study in Nature Computational Science.
The study team developed a machine-learning model called life2vec that can make general predictions about the details and course of people's lives, such as forecasts related to death, international moves and personality traits. The model draws from data on millions of residents of Denmark, including details about birth dates, sex, employment, location and use of the country's universal health care system. By the study's metrics, the new model was more than 78 percent accurate at predicting mortality in the research population over a four-year period, and it significantly outperformed other predictive methods such as an actuarial table and various machine-learning tools. In a separate test, life2vec also predicted whether people would move out of Denmark over the same period with about 73 percent accuracy, per one study metric. The researchers further used life2vec to predict people's self-reported responses to a personality questionnaire, and they found promising early signs that the model could connect personality traits with life events.
The study demonstrates an exciting new approach to predicting and analyzing the trajectories of people's lives, says Matthew Salganik, a professor of sociology at Princeton University, who researches computational social science and authored the book Bit by Bit: Social Research in the Digital Age. The life2vec developers "use a very different style that, as far as I know, no one has used before," he says.
The new tool works in a peculiar way. There are lots of different types of machine-learning models that have different underlying architectures and are understood to be useful for different purposes. For example, there are models that help robots interpret camera inputs and others that help computers spit out images. Life2vec is based on the same type of architecture that underlies popular AI chatbots such as OpenAI's ChatGPT and Google's Bard. Specifically, the new predictive model is closest to BERT, a language model introduced by Google in 2018. "We took a principle that has been developed for language modeling ... and apply it to some really, really, really interesting sequence data about human beings," says study author Sune Lehmann, a professor of networks and complexity science at the Technical University of Denmark.
Given a chain of information, usually in the form of written text, these models make predictions by translating inputs into mathematical vectors and acting like a turbocharged autocomplete process that fills in the next section in accordance with learned patterns.
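To make that mechanic concrete, here is a minimal, hypothetical sketch--not the authors' code; the event vocabulary and toy model are invented for illustration--of how a sequence of event tokens becomes vectors that a small transformer-style network scores for a likely next token:

```python
import torch
import torch.nn as nn

# Toy vocabulary of life-event "tokens" (invented for illustration).
vocab = ["<pad>", "salary_up", "salary_down", "hospital_visit", "moved_city"]
token_id = {tok: i for i, tok in enumerate(vocab)}

# A hypothetical life sequence, translated into integer token IDs.
sequence = ["salary_up", "moved_city", "hospital_visit"]
ids = torch.tensor([[token_id[t] for t in sequence]])  # shape: (1, 3)

# Embed the tokens as vectors, run them through a tiny transformer
# encoder, and score which token plausibly comes next -- a crude
# stand-in for the BERT-style models the article describes.
embed = nn.Embedding(len(vocab), 16)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=16, nhead=2, batch_first=True),
    num_layers=1,
)
to_logits = nn.Linear(16, len(vocab))

hidden = encoder(embed(ids))            # (1, 3, 16): one vector per event
next_scores = to_logits(hidden[:, -1])  # scores over possible next tokens
print(vocab[next_scores.argmax().item()])  # untrained, so effectively random
```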
To get a language-processing tool to make predictions about people's futures, Lehmann and his colleagues processed individuals' data into unique timelines composed of events such as salary changes and hospitalizations--with specific events represented as digital "tokens" that the computer could recognize. Because their training data capture so much about people and their model architecture is so flexible, the researchers suggest life2vec could serve as a foundation that could be easily tweaked and fine-tuned to offer predictions about many still-unexplored aspects of a human life.
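As a rough illustration of that tokenization step, the sketch below flattens a fictional person's records into a chronological token sequence. The field names and token format here are hypothetical; the paper's actual encoding scheme is more elaborate.

```python
from datetime import date

# Hypothetical registry-style records for one fictional person.
events = [
    (date(2019, 6, 1), "income", "salary_increase"),
    (date(2018, 3, 1), "job", "start_nurse"),
    (date(2020, 11, 15), "health", "hospitalization_fracture"),
]

# Order the events in time and flatten each into a discrete token,
# producing the "sentence" that describes part of a life.
events.sort(key=lambda e: e[0])
tokens = [f"{domain}_{detail}" for _, domain, detail in events]
print(tokens)
# ['job_start_nurse', 'income_salary_increase', 'health_hospitalization_fracture']
```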
Lehmann says medical professionals have already contacted him to ask for help in developing health-related versions of life2vec--including one that could help illuminate population-level risk factors for rare diseases, for example. He hopes to use the tool to detect previously unknown relationships between the world and human life outcomes, potentially exploring questions such as "How do your relationships impact your quality of life?" and "What are the most important factors in determining salary or early death?" The tool could also tease out hidden societal biases, such as unexpected links between a person's professional advancement and their age or country of origin.
For now, though, there are some serious limitations. Lehmann notes that the model's data are specific to Denmark. And many gaps remain in the information that was used. Though extensive, it doesn't capture everything relevant to a person's mortality risk or life trajectory, and Lehmann points out that some groups of people are less likely to have extensive health and employment records.
One of the biggest caveats is that the study's accuracy measures aren't necessarily robust. They're more proof of concept than they are proof that life2vec can correctly predict if a given person is going to die in a given time period, multiple sources say.
Looking at the study's statistical analyses, Christina Silcox, research director for digital health at the Duke-Margolis Center for Health Policy, says she wouldn't put too much stock in life2vec's individual four-year mortality predictions. "I would not quit my job and go to the Bahamas based on this," she says, noting that this isn't a critique of Lehmann and his co-authors' methods so much as an intrinsic limitation of the field of life outcome prediction.
It's difficult to know the best way to assess the accuracy of a tool like this because there's nothing else quite comparable out there, Salganik says. Individual mortality is especially hard to evaluate because, though everybody eventually dies, most young and middle-aged people survive year-to-year. Death is a relatively uncommon occurrence among the under-65 age cohort covered in the study. If you simply guess that everyone in a group of people between the ages of 35 and 65 living in Denmark (the study population) will survive year-to-year, you've already got a pretty accurate death forecast. Life2vec did perform significantly better than that null guess, according to the study, but Salganik says it's hard to determine exactly how well it does relative to reality.
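A back-of-the-envelope example shows why. Suppose, purely for illustration (the rate below is an assumption, not a figure from the study), that 3 percent of such a cohort dies within four years:

```python
# Illustrative only: the 3 percent four-year death rate is assumed,
# not taken from the study.
population = 100_000
deaths = int(population * 0.03)     # 3,000 people die
survivors = population - deaths     # 97,000 survive

# The "null guess": predict that every single person survives.
null_accuracy = survivors / population
print(f"Null-guess accuracy: {null_accuracy:.1%}")  # 97.0%
```

A model has to clear that deceptively high bar before its raw accuracy means much, which is why a single percentage is hard to interpret on its own.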
Michael Ludkovski, a professor of statistics and applied probability at the University of California, Santa Barbara, agrees. "I have a hard time interpreting what the results really mean," he says. Most of his work has been in actuarial science, or the prediction of risk, and he says the life2vec results are "speaking in a language different from how actuaries talk." For instance, actuarial predictions assign a risk score, not a binary prediction of dead or not dead, Ludkovski says--and those risk scores account for uncertainty in a way that life2vec doesn't.
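Here is a minimal sketch of that distinction, with invented numbers: an actuarial model reports a continuous probability, whereas a binary classifier collapses it into a yes-or-no call and discards how close the call was.

```python
# Hypothetical output for one person (the numbers are invented).
risk_score = 0.12  # actuarial style: a 12 percent chance of death in 4 years

# Binary style: threshold the score into dead / not dead.
binary_call = risk_score >= 0.5
print(f"risk score: {risk_score:.0%}, "
      f"binary call: {'dies' if binary_call else 'survives'}")
# The binary call says nothing about whether the score was 12 percent
# or 49 percent, and it carries no statement of uncertainty.
```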
There are also major ethical considerations, Silcox notes. A tool like this could obviously cause harm if it were misapplied. Algorithmic bias is a real risk, and "AI tools need to be very specifically tested for the problem they're trying to solve," she says. It would be crucial to thoroughly assess life2vec for every new use and to constantly monitor for common flaws such as data drift--in which past conditions that were reflected in training data no longer apply (after important medical advances, for instance).
The researchers acknowledge they've waded into fraught territory. Their study emphasizes that Denmark has strong privacy protections and antidiscrimination laws in place. Academics, government agencies and other researchers granted access to life2vec will have to ensure data aren't leaked or used for nonscientific purposes. Using life2vec "for automated individual decision-making, profiling or accessing individual-level data ... is strictly disallowed," the authors wrote in the paper. "Part of why I feel comfortable with this is that I trust the Danish government," Lehmann says. He would "not feel comfortable" developing such a model in the U.S., where there is no federal data privacy law.
Yet Lehmann adds that equivalently invasive and powerful machine-learning tools are likely already out there. Some of these tools even verge on the dystopian concept of "precrime" laid out in Philip K. Dick's 1956 novella The Minority Report (and the 2002 blockbuster science-fiction film based on it). In the U.S. many courts use algorithmic tools to make sentencing decisions. Law enforcement agencies use predictive policing software to decide how to distribute officers and resources. Even the Internal Revenue Service relies on machine learning to issue audits. In all of these examples, bias and inaccuracy have been recurring problems.
In the private sphere, tech companies use advanced algorithmic predictions and the vast amounts of data they collect about users to forecast consumer behavior and maximize engagement time. But the exact details of government and corporate tools alike are kept behind closed doors.
By creating a formidable AI predictive tool that is accessible to academic researchers, Lehmann says he hopes to promote transparency and understanding in the age of prediction that's already underway. "We can start talking about it, and we can start deciding how we want to use it: what's possible, what's right and what we should leave alone," he says.
"I hope," Lehmann adds, "that this can be part of a discussion that helps move us in the direction of utopia and away from dystopia."