Our life paths are shaped by family, society and the times we live in, and also by decisions large and small. Can AI accurately predict a future that we ourselves cannot?
160 research teams from Princeton University, UCLA, the Massachusetts Institute of Technology, Virginia Tech and other institutions took part in the “Fragile Families Challenge”. They built statistical and machine learning models to predict the life trajectories of children, parents and families across the United States, in search of an answer to that question.
On March 30, 2020 local time, the results of the challenge were published online in the Proceedings of the National Academy of Sciences (PNAS). The paper is entitled “Measuring the predictability of life outcomes with a scientific mass collaboration” and lists as many as 112 co-authors.
A large-sample data set from more than 4,000 families
In fact, exploring life trajectories is closer to a sociological problem than a simple exercise in predicting the future. It can inform family assistance programs, help gauge the degree of social rigidity, and guide improvements to relevant policies.
The study is based on a high-quality birth cohort data set, the “Fragile Families and Child Wellbeing Study”: a large-sample data set collected by social scientists over the past 15 years, containing roughly 13,000 variables on more than 4,000 families.
The researchers followed children born in large U.S. cities between 1998 and 2000, a large proportion of whom were born to unmarried parents. Evidently, one purpose of this longitudinal study is to understand the lives of children born to unmarried parents.
Specifically, the data cover six stages as the child grows up: at birth and at ages 1, 3, 5, 9 and 15.
It is worth noting that the scope of data collection differs at each age. For example, at birth only survey and interview information from the parents is collected; at age 9, the parents, the child’s primary caregiver (if not a parent), teachers and the child are all interviewed (as shown in the figure below).
In addition, the researchers focus on different topics at different ages.
For example, at birth, the interview with the mother mainly covers the child’s health and development, the parents’ relationship, fatherhood, the parents’ attitudes towards marriage, relationships with their families, environmental and policy factors, health status, demographic characteristics, education level, employment and income. At age 9, the interview with the child mainly covers relationships with parents, parental rules and supervision, relationships with siblings, daily life, school, tendencies towards juvenile delinquency, task completion and behavior, and health and safety.
The Fragile Families Challenge
The project, called the “Fragile Families Challenge”, can be thought of as a game: the project organizers are the game developers, and the participating research teams are the players.
The rules are as follows. The organizers withhold the data collected on these families when the children were 15 years old. Each research team may use any AI model or technique it likes to try to predict, as accurately as possible, how the children’s lives had developed by age 15. The developers give the players six outcomes to predict, including the child’s average academic performance, the child’s perseverance, the family’s economic level, and the primary caregiver’s work and training. Players must predict at least one of them.
As shown in the figure below, the background data for the study comprise 12,942 variables collected at birth and at ages 1, 3, 5 and 9 from 4,242 families. The training data are the six outcomes measured when the children were 15.
This game design follows the “common task method”, a research design widely used in computer science.
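The division of labor in such a setup can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the organizers’ actual pipeline: the function names (`split_families`, `mean_squared_error`), the 50/50 split, the toy outcome values and the use of mean squared error as the score are all hypothetical.

```python
import random


def split_families(family_ids, train_fraction=0.5, seed=0):
    """Organizer side: choose which families' outcomes are released.

    The 50/50 default is an assumption made for this sketch.
    """
    ids = sorted(family_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return ids[:cut], ids[cut:]


def mean_squared_error(truth, predictions):
    """Organizer side: score a submission on the hidden families only."""
    return sum((truth[f] - predictions[f]) ** 2 for f in truth) / len(truth)


# Toy outcome values, made up purely for illustration.
outcomes = {f"family_{i}": float(i % 5) for i in range(20)}
train_ids, holdout_ids = split_families(outcomes)

released = {f: outcomes[f] for f in train_ids}   # what the players see
hidden = {f: outcomes[f] for f in holdout_ids}   # what the organizers keep

# Player side: the simplest possible "model" predicts the mean of the
# released outcomes for every hidden family.
train_mean = sum(released.values()) / len(released)
submission = {f: train_mean for f in holdout_ids}

# Organizer side: every team is scored the same way on the same hidden data.
score = mean_squared_error(hidden, submission)
print(f"held-out mean squared error: {score:.3f}")
```

The point of the design is that every team is judged by the same rule on outcomes it never saw, so results from very different models can be compared directly.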
Lei Feng learned that the organizers received 457 applications from 68 universities around the world, and that 160 research teams ultimately took part. The challenge ran from March 5, 2017 to August 1, 2017. Participants only needed to upload their predictions to the challenge’s official website.
[Figure: the official website of the Fragile Families Challenge]
After the challenge ended, the organizers analyzed and compared the results of the 160 teams. The teams had used a variety of data processing and statistical learning techniques to generate their predictions. Although the teams’ predictions differed little from one another, they were all far from the real outcomes. Even the most accurate predictions fell well short of what actually happened.
As shown in the figure below, the teams’ average prediction accuracy for family economic level and children’s average academic performance hovers around 0.2, while for the other outcomes it is about 0.05 (Lei Feng’s note: the closer the value is to 1, the better the predictions match reality; the closer it is to 0, the worse the match).
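The article does not name the accuracy measure. One common choice that behaves as the note describes is R² on held-out data, which compares a model’s squared error with that of a baseline that always predicts the training mean. The sketch below assumes that metric; the function name `r2_holdout` and the numbers are hypothetical and chosen only for illustration.

```python
def r2_holdout(y_true, y_pred, train_mean):
    """R^2 on held-out data relative to a predict-the-training-mean baseline.

    1.0 means perfect predictions; 0.0 means no better than the baseline.
    Assumed metric for illustration; the article does not specify one.
    """
    ss_err = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_base = sum((t - train_mean) ** 2 for t in y_true)
    if ss_base == 0:
        raise ValueError("held-out outcomes all equal the baseline value")
    return 1.0 - ss_err / ss_base


# Made-up held-out outcomes and a made-up training mean.
y_true = [1.0, 2.0, 3.0, 4.0]
train_mean = 2.5

perfect = r2_holdout(y_true, y_true, train_mean)
baseline = r2_holdout(y_true, [train_mean] * 4, train_mean)
partial = r2_holdout(y_true, [2.0, 2.0, 3.0, 3.0], train_mean)
print(perfect, baseline, round(partial, 2))
```

Under this reading, a score of about 0.2 means the model removes only about a fifth of the squared error left by simply guessing the average, and 0.05 means it is barely better than that guess.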
Of course, some individual predictions were accurate, such as the academic performance of a specific child.
For now, we can answer the question posed at the beginning of the article: AI cannot accurately predict the trajectory of a life.
This conclusion is a caution for the use of AI models in criminal justice, child protection services and similar settings, and sociologists and data scientists should apply AI prediction models carefully in the future. As Sara McLanahan, lead researcher of the “Fragile Families and Child Wellbeing Study” at Princeton University and Columbia University, said: “The results are eye-opening. Either luck plays an important role in our lives, or we, as social scientists, are missing some important variables in our research.”
Responsible editor: WV