Journal article

Imputation of missing links and attributes in longitudinal social surveys

Authors

Vladimir Ouzienko [1], Zoran Obradovic [1]

Affiliation

1. Center for Data Analytics and Biomedical Informatics, Temple University, Philadelphia, PA, USA

Keywords

Imputation; Temporal data analysis; Social networks; Exponential random graph models

Pages

329-356

DOI

10.1007/s10994-013-5420-1

Source

Machine Learning, ISSN: 0885-6125, 2014, vol. 95, no. 3, pp. 329-356

Abstract

The predictive analysis of longitudinal social surveys is highly sensitive to the effects of missing data in temporal observations. Such high sensitivity to missing values raises the need for accurate data imputation, because without it a large fraction of collected data could not be used properly. Previous studies focused on the treatment of missing data in longitudinal social networks due to non-respondents and dealt with the problem largely by imputing missing links in isolation or analyzing the imputation effects on network statistics. We propose to account for changing network topology and interdependence between actors’ links and attributes to construct a unified approach for imputation of links and attributes in longitudinal social surveys. The new method, based on an exponential random graph model, is evaluated experimentally for five scenarios of missing data models utilizing synthetic and real life datasets with 20%–60% of nodes missing. The obtained results outperformed all alternatives, four of which were link imputation methods and two node attribute imputation methods. We further discuss the applicability and scalability of our approach to real life problems and compare our model with the latest advancements in the field. Our findings suggest that the proposed method can be used as a viable imputation tool in longitudinal studies.

Vladimir Ouzienko, Zoran Obradovic. Imputation of missing links and attributes in longitudinal social surveys[J]. Machine Learning, 2014, 95(3): 329-356.