Design and Implementation of Meteorological Big Data Platform Based on Hadoop and Elasticsearch
With the launch of high-resolution meteorological satellites and the development of numerical models with high spatial and temporal resolution, the types and volume of meteorological data are increasing year by year. Existing relational databases can no longer meet the business requirements for storing, processing and retrieving real-time and non-real-time data. The Hadoop ecosystem, combined with an Elasticsearch cluster (ES cluster), is used to build a meteorological big data platform. Real-time data is processed by a Kafka message queue combined with the Storm DataAnly topology, and finally enters the ES cluster. Non-real-time data is mainly processed by the file monitoring component: file metadata, such as index information, is stored in the ES cluster, while the files themselves are saved in HDFS. The implemented platform can process about 1.5 million real-time and non-real-time meteorological data items per day, and the Elasticsearch cluster provides millisecond-level search over a dataset of 2.0 million items. Experiments show that the platform can meet the needs of modern meteorological business.
Elasticsearch, Hadoop, Storm, meteorological data
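
The two ingestion paths described in the abstract can be illustrated with a minimal Python sketch. It is not the platform's implementation: plain in-memory containers stand in for the ES cluster and HDFS, the Kafka and Storm stages are reduced to a single method call, and every name (`InMemoryPlatform`, `ingest_realtime`, `ingest_file`, `search`) and every sample value is a hypothetical chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InMemoryPlatform:
    """Toy stand-in for the platform; all names here are hypothetical."""

    # Plays the role of the ES cluster: a flat list of searchable documents.
    es_index: List[dict] = field(default_factory=list)
    # Plays the role of HDFS: file path mapped to raw file content.
    hdfs: Dict[str, bytes] = field(default_factory=dict)

    def ingest_realtime(self, message: dict) -> None:
        # Real-time path: the message itself becomes a searchable document.
        # (In the abstract this step sits behind Kafka and a Storm topology.)
        self.es_index.append(dict(message, source="realtime"))

    def ingest_file(self, path: str, content: bytes, metadata: dict) -> None:
        # Non-real-time path: the file body goes to the file store, and only
        # its metadata (plus a pointer to the file) goes to the index.
        self.hdfs[path] = content
        self.es_index.append(
            dict(metadata, source="file", hdfs_path=path, size=len(content))
        )

    def search(self, **terms) -> List[dict]:
        # Exact-match filter over indexed documents; a real ES query is richer.
        return [
            doc for doc in self.es_index
            if all(doc.get(key) == value for key, value in terms.items())
        ]


# Sample values below are made up for the example.
platform = InMemoryPlatform()
platform.ingest_realtime({"station": "S001", "element": "temperature", "value": 12.3})
platform.ingest_file(
    "/nwp/sample.grb", b"raw-bytes", {"element": "temperature", "product": "nwp"}
)

hits = platform.search(element="temperature")
file_hits = platform.search(source="file")
```

The point of the split is that one search interface covers both kinds of data: a query returns real-time observations directly, while for archived files it returns a metadata entry whose `hdfs_path` tells the caller where to fetch the file body.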