Reproducing Popularity Distribution of YouTube Videos
To stream user-generated content (UGC) at high quality and low cost, content delivery network (CDN) providers must make the most of their CDNs, which requires dimensioning the cache servers from an accurate estimate of the UGC view-count distribution. To obtain such an estimate within a practical time frame, we need a simple time-series model that captures how UGC popularity changes over time. In this paper, we first analyze the daily view count (DVC) of YouTube videos over nine months and find that the DVC follows a lognormal distribution. As a simple time-series model of the DVC of each YouTube video, we propose the grouped MPP (gMPP), an extension of the multiplicative process (MPP), which is widely known as a simple time-series model that generates a lognormal distribution. We also propose reproducing the DVC distribution of YouTube videos with a superposed gMPP (SgMPP) that aggregates multiple gMPPs. The SgMPP accurately reproduces the DVC distribution of YouTube videos at low computational cost, so it can serve as the input to computer simulations for designing network components whose dimensioning depends on the popularity distribution of UGC, e.g., cache capacities. Through numerical evaluation, we confirm that the storage capacity of a cache server can be designed adequately, with an average error of several percent relative to the target cache hit ratio.
popularity distribution, reproduction, multiplicative process
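
As a rough illustration of why a multiplicative process yields a lognormal distribution, the sketch below simulates the plain MPP only; it is not the gMPP or SgMPP, whose definitions are not given in this abstract. Each day, a video's view count is multiplied by an independent positive random factor, so the logarithm of the count is a sum of independent terms and becomes approximately normal by the central limit theorem. The function names, the uniform growth factor, and all parameter values are assumptions chosen for illustration and are not fitted to YouTube data.

```python
import math
import random
import statistics


def simulate_mpp(num_videos, num_days, x0=100.0, low=0.5, high=1.5, seed=0):
    """Simulate a plain multiplicative process (MPP) for each video.

    Every day the current view count is multiplied by an i.i.d. factor
    drawn from Uniform(low, high). The uniform factor and the default
    values are illustrative assumptions, not taken from the paper.
    Returns the view count of each video after num_days steps.
    """
    rng = random.Random(seed)
    final_counts = []
    for _ in range(num_videos):
        x = x0
        for _ in range(num_days):
            x *= rng.uniform(low, high)
        final_counts.append(x)
    return final_counts


def log_moments(counts):
    """Return the mean and standard deviation of the log view counts."""
    logs = [math.log(c) for c in counts]
    return statistics.mean(logs), statistics.pstdev(logs)


counts = simulate_mpp(num_videos=2000, num_days=30)
mu, sd = log_moments(counts)
print(f"mean of log counts: {mu:.3f}, std of log counts: {sd:.3f}")
```

If the counts are lognormal, a histogram of `math.log(c)` over `counts` should look close to a normal curve with the printed mean and standard deviation.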