Predicting Socioeconomic Inequality Using Machine Learning: A Study on Multi-Source Big Data Fusion Model

Authors

  • Guoce Shui

DOI:

https://doi.org/10.6918/IJOSSER.202511_8(11).0006

Keywords:

Socioeconomic Inequality, Multi-Source Data, Attention-LSTM, Gini Coefficient

Abstract

Socioeconomic inequality hinders sustainable development, and predicting it is important for formulating governance policy. Existing studies are limited by their reliance on single-source data and by models that struggle to capture temporal dependencies. This study focuses on county-level administrative units and integrates four types of data (non-target government affairs data, social media data, remote sensing data, and mobile payment data) to construct a dataset of 776 valid records, with the county-level Gini coefficient as the target variable. After data preprocessing and feature selection, an Attention-LSTM model is built for prediction. On the test set, the Attention-LSTM model achieves a Root Mean Square Error (RMSE) of 0.021, a Mean Absolute Percentage Error (MAPE) of 7.7131%, and a coefficient of determination (R²) of 0.8497, clearly outperforming baseline models such as XGBoost, LSTM, and Random Forest (RF). The multi-source data fusion framework and Attention-LSTM model provide a technical means of monitoring socioeconomic inequality at the county level, helping policymakers identify key intervention areas and formulate differentiated governance policies.
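
For readers unfamiliar with the quantities named in the abstract, the following is a minimal sketch of the standard textbook definitions of the Gini coefficient and of the three reported error metrics. It is not taken from the paper: the function names are hypothetical, the toy values in the usage note are invented for illustration, and this page does not describe how the study itself computed these quantities.

```python
import math


def gini(values):
    """Gini coefficient of non-negative values (0 means perfect equality).

    Uses the rank-weighted form G = 2*sum(i*x_i) / (n*sum(x)) - (n+1)/n,
    where x is sorted ascending and ranks i run from 1 to n.
    """
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        raise ValueError("need at least one positive value")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def rmse(y_true, y_pred):
    """Root Mean Square Error."""
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error in percent (assumes no true value is 0)."""
    rel = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * rel / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Invented toy values, not data from the study.
    true_gini = [0.30, 0.35, 0.40, 0.45]
    pred_gini = [0.32, 0.34, 0.41, 0.43]
    print(rmse(true_gini, pred_gini), mape(true_gini, pred_gini), r2(true_gini, pred_gini))
```

Because the Gini coefficient lies between 0 and 1, RMSE is on the same scale as the target, while MAPE expresses the error relative to each county's true value; the two metrics therefore answer slightly different questions about the same predictions.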

Published

2025-10-30

Section

Articles

How to Cite

Shui, G. (2025). Predicting Socioeconomic Inequality Using Machine Learning: A Study on Multi-Source Big Data Fusion Model. International Journal of Social Science and Education Research, 8(11), 40-48. https://doi.org/10.6918/IJOSSER.202511_8(11).0006