An Empirical Study of Imputation Methods for Univariate Time Series

Received: 20-07-2020

Accepted: 10-09-2020

Section:

Engineering and Technology

How to Cite:

Hong, P. (2024). An Empirical Study of Imputation Methods for Univariate Time Series. Vietnam Journal of Agricultural Sciences, 19(4), 452–461. http://testtapchi.vnua.edu.vn/index.php/vjasvn/article/view/811

Phan Thi Thu Hong (*) 1

  • 1 Faculty of Information Technology, Vietnam National University of Agriculture

    Keywords

    Univariate time series, missing data, imputation, similarity

    Abstract


    Time series with missing values occur in almost all areas of applied science. Ignoring missing values can degrade system performance and lead to unreliable results, especially when a large proportion of the data is missing. Handling missing data is therefore an important preliminary step for subsequent tasks such as classification and data analysis. This article first introduces approaches for dealing with missing data. It then builds a framework that fills the gaps in incomplete univariate time series and compares the performance of various imputation methods. Four indices are used to evaluate the imputation methods on three different real-world time series. The experimental results show that the DTWBI and eDTWBI methods perform better on data with a seasonal component and no trend, while na.interp outperforms them when the data have both seasonal and trend components.
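    To make the evaluation framework described above concrete, the following Python sketch deletes a block of values from a complete synthetic series, fills the gap with two methods, and scores each fill against the deleted values. Several parts are assumptions, not taken from the article: `dtw_gap_fill` is a hypothetical simplification of the idea behind DTW-based imputation (find the window most similar to the data just before the gap and copy what follows it), not the authors' DTWBI or eDTWBI; `linear_fill` is a plain interpolation baseline, not na.interp from the R forecast package; and RMSE and MAE are illustrative error indices only, since the abstract does not name the four indices used.

```python
import math
from typing import List, Optional, Sequence

Series = List[Optional[float]]  # None marks a missing value


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Classic DTW cost (Sakoe & Chiba, 1978): absolute-difference local cost, no warping window."""
    n, m = len(a), len(b)
    d = [[math.inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]


def linear_fill(x: Series) -> Series:
    """Baseline: linear interpolation across each interior gap."""
    y = list(x)
    known = [i for i, v in enumerate(y) if v is not None]
    for left, right in zip(known, known[1:]):
        for k in range(left + 1, right):
            t = (k - left) / (right - left)
            y[k] = y[left] + t * (y[right] - y[left])
    return y


def dtw_gap_fill(x: Series, start: int, length: int) -> Series:
    """Hypothetical simplified DTW-based filler (not the authors' DTWBI).

    Uses the `length` values before the gap as a query, finds the complete
    window elsewhere with the lowest DTW cost to it, and copies the `length`
    values that follow that window into the gap.
    """
    if start < length:
        raise ValueError("not enough data before the gap to build a query")
    query = x[start - length:start]
    best_cost, best_follow = math.inf, None
    for s in range(0, len(x) - 2 * length + 1):
        # Skip candidates that overlap the query or the gap itself.
        if s < start + length and s + 2 * length > start - length:
            continue
        window = x[s:s + length]
        follow = x[s + length:s + 2 * length]
        if any(v is None for v in window + follow):
            continue
        cost = dtw_distance(query, window)
        if cost < best_cost:
            best_cost, best_follow = cost, follow
    if best_follow is None:
        raise ValueError("no complete candidate window found")
    y = list(x)
    y[start:start + length] = best_follow
    return y


def rmse(true: Sequence[float], pred: Sequence[float]) -> float:
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(true, pred)) / len(true))


def mae(true: Sequence[float], pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(true, pred)) / len(true)


# Framework: take a complete series, delete a block, impute it, score only the block.
truth = [10 + 5 * math.sin(2 * math.pi * i / 12) for i in range(96)]  # synthetic: seasonal, no trend
start, length = 50, 12
incomplete: Series = list(truth)
incomplete[start:start + length] = [None] * length

gap_true = truth[start:start + length]
dtw_est = dtw_gap_fill(incomplete, start, length)[start:start + length]
lin_est = linear_fill(incomplete)[start:start + length]

dtw_rmse, lin_rmse = rmse(gap_true, dtw_est), rmse(gap_true, lin_est)
print(f"DTW-based: RMSE={dtw_rmse:.3f} MAE={mae(gap_true, dtw_est):.3f}")
print(f"Linear:    RMSE={lin_rmse:.3f} MAE={mae(gap_true, lin_est):.3f}")
```

    On this purely periodic synthetic series, copying the most similar earlier cycle reproduces the missing values almost exactly, while a straight line cannot follow the seasonal shape. This only illustrates why similarity-based filling suits strongly seasonal data without trend; real series with noise and trend will behave less cleanly.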

    References

    Allison P.D. (2001). Missing Data. Quantitative Applications in the Social Sciences, 136. Sage Publications.

    Buuren S. & Groothuis-Oudshoorn K. (2011). mice: Multivariate imputation by chained equations in R. Journal of Statistical Software. 45(3).

    Bishop C.M. (2006). Pattern Recognition and Machine Learning (Information Science and Statistics). Springer-Verlag New York, Inc., Secaucus, NJ, USA.

    Chan K.S. & Ripley B. (2020). TSA: Time Series Analysis. R package version 1.3. Retrieved from https://CRAN.R-project.org/package=TSA on March 10, 2020.

    Crawford S.L., Tennstedt S.L. & McKinlay J.B. (1995). A comparison of analytic methods for non-random missingness of outcome data. J. Clin. Epidemiol. 48(2): 209-219.

    Dong Y. & Peng J. (2013). Principled missing data methods for researchers. SpringerPlus. 2: 222.

    Gelman A. & Hill J. (2006). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

    Ghosh S. & Pahwa P. (2008). Assessing bias associated with missing data from joint Canada/U.S. survey of health: An application. JSM Biometrics.

    Horton N.J. & Kleinman K.P. (2007). Much Ado About Nothing: A Comparison of Missing Data Methods and Software to Fit Incomplete Data Regression Models. The American Statistician. 61(1): 79-90.

    Hyndman R. & Khandakar Y. (2008). Automatic time series forecasting: the forecast package for R. Journal of Statistical Software. 27(3): 1-22. (Package version used in 2020.)

    Little R.J.A. & Rubin D.B. (2014). Statistical Analysis with Missing Data. John Wiley & Sons.

    Moritz S., Sardá A., Bartz-Beielstein T., Zaefferer M. & Stork J. (2015). Comparison of different Methods for Univariate Time Series Imputation in R. arXiv preprint arXiv:1510.03924.

    Molenberghs G., Fitzmaurice G., Kenward M.G., Verbeke G. & Tsiatis A. (2014). Handbook of Missing Data Methodology. CRC Press.

    Phan T.T.H., Caillault E.P. & Bigand A. (2016). Comparative study on supervised learning methods for identifying phytoplankton species. In 2016 IEEE Sixth International Conference on Communications and Electronics (ICCE). pp. 283-288. doi: 10.1109/CCE.2016.7562650.

    Phan T.T.H., Poisson Caillault E., Lefebvre A. & Bigand A. (2017). Dynamic Time Warping-based imputation for univariate time series data. Pattern Recognition Letters.

    Rousseeuw K., Caillault É.P., Lefebvre A. & Hamad D. (2013). Monitoring system of phytoplankton blooms by using unsupervised classifier and time modeling. In 2013 IEEE International Geoscience and Remote Sensing Symposium (IGARSS). pp. 3962-3965.

    Stekhoven D.J. & Bühlmann P. (2012). MissForest: non-parametric missing value imputation for mixed-type data. Bioinformatics. 28(1): 112-118.

    Sterne J.A.C., White I.R., Carlin J.B., Spratt M., Royston P., Kenward M.G., Wood A.M. & Carpenter J.R. (2009). Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ (Clinical research ed.). 338: b2393.

    Sakoe H. & Chiba S. (1978). Dynamic Programming Algorithm Optimization for Spoken Word Recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing. 26(1): 43-49.

    Zeileis A. & Grothendieck G. (2005). zoo: S3 infrastructure for regular and irregular time series. Journal of Statistical Software. 14(6): 1-27.