Research on Network Traffic Modeling and Applications

With the growing popularity of the Internet and the proliferation of applications in recent years, the characteristics of network traffic have changed greatly. Traditional traffic models can no longer describe current traffic. It is therefore necessary to study the laws governing current traffic, to propose models that capture the traffic characteristics of individual services, and to explore the practical applications of these models.


Introduction
In recent years the Internet has developed rapidly and has changed people's daily lives. Network services keep changing in both content and form: content has moved from simple text and images to complex audio and video, and terminals have expanded from ordinary PCs to mobile phones, tablet PCs, and even televisions, refrigerators, and other traditional household appliances. As a result, the Internet is growing ever larger, its structure ever more complex, and its traffic volume ever greater. These characteristics make traffic modeling, analysis, and control both very important and extremely difficult.
The Internet is essentially an open, heterogeneous network of computer networks: it connects a variety of computer networks and systems distributed around the world through the TCP/IP protocol suite. In recent years, many studies have indicated that Internet service traffic is a self-similar process [1] with long-range dependence [2], and that service traffic in the WAN shows multifractal characteristics [3] [4].
The self-similarity of network traffic can have many adverse effects on network performance. The stronger the self-similarity, the longer the average queue length in the output buffer of a network switching node and the larger the average network delay; in addition, the aggregation of service flows increases their burstiness. This directly reduces bandwidth utilization at the output node and degrades service quality (including delay, jitter, and packet loss ratio). The Poisson model, which was used successfully in the telephone network for many years, generally does not account for self-similarity. Finding traffic models that describe self-similar characteristics has therefore become an important research problem.
Self-similarity of traffic is usually described by the Hurst parameter H, as follows: (1) when 1/2 < H < 1, the service traffic is clearly self-similar, and the larger H is, the higher the degree of self-similarity; (2) when H = 1/2, the random process is uncorrelated, that is, the past does not affect the future, as in the common white noise sequence; (3) when 0 < H < 1/2, the random process exhibits only short-range correlation.
The above analysis shows that the Hurst parameter characterizes the self-similarity of a random process. In this paper, a traffic sequence is mapped to a random time series, and the study of the characteristics of the traffic time series consists mainly of estimating its Hurst parameter.
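As a compact reminder of what the Hurst parameter measures, the standard second-order description of a self-similar stationary series can be written as below. The notation is assumed here for illustration and does not come from the paper: m is the aggregation block size, X^{(m)} the series of block means, r(k) the autocorrelation at lag k, and \sigma^2 and c_r positive constants.

```latex
\begin{align}
X^{(m)}_k &= \frac{1}{m} \sum_{i=(k-1)m+1}^{km} X_i, \qquad k = 1, 2, \dots \\
\operatorname{Var}\bigl(X^{(m)}\bigr) &\sim \sigma^2 \, m^{2H-2} \qquad (m \to \infty) \\
r(k) &\sim c_r \, k^{2H-2} \qquad (k \to \infty), \quad \tfrac{1}{2} < H < 1
\end{align}
```

For H = 1/2 the variance of the block means decays as 1/m, as it does for uncorrelated data. For 1/2 < H < 1 it decays more slowly and the autocorrelations are not summable, which is the long-range dependence referred to above.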

Traffic Modeling
Analysis of Data Source. To capture network traffic at different times and locations, we measured an operational broadband network and collected more than 5 million valid records covering DNS, VoIP, Web, streaming media, game, and FTP traffic.
To estimate the Hurst parameter of a variety of services, such as P2P-based streaming media, online games, VoIP, and Web browsing, we selected six representative data sets from the repeatedly collected data.
Self-Similarity of Traffic Analysis. On the MATLAB platform, the Hurst parameter of the streaming media, VoIP, online game, and Web browsing services was estimated on the six data sets by the absolute moment method, the aggregated variance method [5], the modified periodogram method [6], the Higuchi method [7], the differenced variance method [5], the variance of residuals method [8], and the R/S method [9]. The average Hurst parameter of each service's traffic was then calculated separately, as shown in Table 2.
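To make one of the listed estimators concrete, the following is a minimal sketch of the aggregated variance method. It is written in Python rather than MATLAB and is not the authors' code. The function name, the choice of block sizes, and the white-noise test input are assumptions made for illustration. The idea is that the variance of block means scales as m^(2H-2), so H can be read from the slope of a log-log regression.

```python
import math
import random


def aggregated_variance_hurst(series, block_sizes):
    """Estimate H from the slope of log Var(block means) versus log m."""
    log_m, log_var = [], []
    for m in block_sizes:
        n_blocks = len(series) // m
        if n_blocks < 2:
            continue  # too few blocks to compute a sample variance
        # Mean of each non-overlapping block of length m.
        means = [sum(series[k * m:(k + 1) * m]) / m for k in range(n_blocks)]
        mu = sum(means) / n_blocks
        var = sum((x - mu) ** 2 for x in means) / (n_blocks - 1)
        if var <= 0.0:
            continue  # log undefined; skip degenerate scales
        log_m.append(math.log(m))
        log_var.append(math.log(var))
    if len(log_m) < 2:
        raise ValueError("need at least two usable block sizes")
    # Ordinary least-squares slope of log_var on log_m.
    mx = sum(log_m) / len(log_m)
    my = sum(log_var) / len(log_var)
    sxx = sum((x - mx) ** 2 for x in log_m)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_m, log_var))
    slope = sxy / sxx  # slope is approximately 2H - 2
    return 1.0 + slope / 2.0


# Illustrative input: uncorrelated Gaussian noise, for which H should be near 0.5.
random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(20000)]
print(aggregated_variance_hurst(noise, [1, 2, 4, 8, 16, 32, 64, 128]))
```

On a measured trace, `series` would be the per-interval byte or packet counts. Large block sizes leave few blocks and give noisy variance estimates, which is one reason the paper averages several estimators.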
Table 2 shows that the Hurst parameter differs across services: it is higher for the Web, VoIP, and game services, while that of the P2P-based streaming media service is lower. The reasons are analyzed as follows: (1) the Hurst parameter is related to the operating mechanism of the protocol that supports the service; (2) the Hurst parameter of a service is related to users' habits, that is, their network behavior; (3) in terms of observation, the smaller the observed time scale, the stronger the self-similarity usually is.

Analysis of Traffic Models
Model Fit Test. In this paper, the Kolmogorov-Smirnov (K-S) test is used because it is more accurate than the χ² test [10]. It can test whether an empirical distribution follows a theoretical distribution, and it can also test whether two samples come from the same population.
In the MATLAB toolbox, the K-S test returns the values H, P, K, and CV. H is the result of the test: H = 1 means the hypothesis is rejected, and H = 0 means it is not rejected. P is the p-value of the test, that is, the probability under the hypothesis of obtaining a test statistic at least as large as the one computed from the sample. K is the computed K-S test statistic. CV is the critical value of the Kolmogorov-Smirnov statistic at the significance level α; the default significance level α is 5%.
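The four outputs described above can be illustrated with a small one-sample K-S sketch in Python. This is not MATLAB's implementation. The function names are hypothetical, and both the critical value and the p-value use the large-sample (asymptotic) Kolmogorov approximation, which is an assumption that becomes inaccurate for small samples.

```python
import math


def ks_statistic(sample, cdf):
    """Largest gap between the empirical CDF and the theoretical CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_test(sample, cdf, alpha=0.05):
    """Return (h, p, k, cv) using asymptotic approximations."""
    n = len(sample)
    k = ks_statistic(sample, cdf)
    # Asymptotic critical value: sqrt(-ln(alpha/2) / 2) / sqrt(n).
    cv = math.sqrt(-0.5 * math.log(alpha / 2.0)) / math.sqrt(n)
    # Asymptotic p-value from the Kolmogorov distribution series.
    lam = math.sqrt(n) * k
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                  for j in range(1, 101))
    p = min(1.0, max(0.0, p))
    h = 1 if k > cv else 0  # 1: reject the hypothesis, 0: do not reject
    return h, p, k, cv


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, x))


# Evenly spread points fit the uniform distribution; points squeezed into [0, 0.5] do not.
good = [(i + 0.5) / 100 for i in range(100)]
bad = [(i + 0.5) / 200 for i in range(100)]
print(ks_test(good, uniform_cdf))
print(ks_test(bad, uniform_cdf))
```

To test a fitted traffic model, `cdf` would be the candidate distribution with its estimated parameters and `sample` the measured rates.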

Stream Media Service
Streaming media services originate from network TV, also known as IPTV. Network television is divided by terminal into three forms: the PC platform, the TV (STB) platform, and the mobile phone platform (mobile network). Network TV software based on P2P technology includes PPLive, PPStream, FeidianTV, etc. Different network TV software uses different transport-layer techniques: some services establish the signaling connection over TCP and transmit media over UDP; some establish the signaling connection over UDP and transmit media over TCP; others mix the two. A detailed analysis is given in Table 1.
Analysis of many typical traffic samples shows that different service characteristics lead to different types of traffic, which differ greatly both in their probability density distributions and in their degree of self-similarity. Even traffic generated by services of the same type varies in its probability density. In the samples analyzed, the traffic generated by the game and Web browsing services is more strongly self-similar, while the traffic generated by the streaming media service is only weakly self-similar or not self-similar at all.

Traffic Model Application
The key step in taking network traffic models from the laboratory to industrial application is using them in actual network design and network equipment design. At present, network traffic models are applied in the following areas.
Cache Design. In network equipment design, once the rate of a device port is fixed (for example, at 100 Mbps or 1 Gbps), the characteristics of the network traffic become the decisive factor in determining the device cache size. In such applications, the network traffic model is the basis for calculating the required cache size.
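To illustrate how traffic characteristics drive cache sizing at a fixed port rate, the sketch below replays an arrival trace through a single queue using the Lindley recursion and measures how often the backlog exceeds a candidate cache size. This is a plain trace-driven simulation, not a queuing model from the paper. The function names, the per-slot units, and the toy trace are assumptions for illustration.

```python
def queue_exceedance(arrivals, c, b):
    """Fraction of time slots in which the backlog exceeds b.

    arrivals: traffic arriving in each slot; c: amount served per slot.
    Uses the Lindley recursion Q <- max(0, Q + arrival - c).
    """
    q = 0.0
    exceed = 0
    for a in arrivals:
        q = max(0.0, q + a - c)
        if q > b:
            exceed += 1
    return exceed / len(arrivals)


def min_cache_size(arrivals, c, target, candidates):
    """Smallest candidate cache size whose exceedance is at most target."""
    for b in sorted(candidates):
        if queue_exceedance(arrivals, c, b) <= target:
            return b
    return None  # no candidate meets the target


# Toy bursty trace (hypothetical values): bursts of 3 units, service of 1 unit per slot.
trace = [3, 0, 0, 3, 0, 0]
print(queue_exceedance(trace, 1, 1))
print(min_cache_size(trace, 1, 0.0, [0, 1, 2, 3]))
```

Two traces with the same mean rate but different burstiness give different answers here, which is why the mean alone is not enough to size the cache.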
For example, the multiscale queuing (MSQ) model [11] [12] proposed by Rudolf H. Riedi et al. can be used to estimate the cache size. Briefly, according to this model, the cache size needed by a network equipment port whose maximum rate is c can be estimated by the following formula:
P[Q > b] \approx 1 - \prod_{i=0}^{n} P[K_{2^i} < b + c \cdot 2^i]    (2)
In the above formula, Q is the queue size, b is the cache size, n is a positive integer, K_r is the traffic arriving during the time from -r+1 to 0, and P[Q > b] is the congestion probability, that is, the probability that the queue size Q exceeds the cache size b. From Eq. 2, once the probability density distribution of the network traffic is determined, the cache size b can be determined from the congestion probability that the design permits. Conversely, the congestion probability can be estimated from the designed cache size. The probability density distribution required in the estimation can be obtained from the established network traffic model.
Link Bandwidth Design. At present, network operators provision link bandwidth at 2-3 times the measured mean link traffic, expecting this generous bandwidth to protect the QoS metrics of traffic during peak periods.
The advantage of this method is its simplicity. The disadvantage is that it usually has two negative consequences: first, too much redundant bandwidth is provided; second, despite the redundancy, the bandwidth still cannot meet demand during traffic peaks, so congestion occurs.
Anomaly Detection. Currently, network traffic anomaly detection is the primary technical means of dealing with abnormal network traffic. To identify abnormal traffic, it is necessary to collect and analyze network traffic under normal conditions and to build a time-based model of normal traffic. When the anomaly detection system is online, it creates a dynamic traffic baseline in the system database for each time period and each protocol of the monitored network traffic. If the traffic generated by a protocol does not match its baseline for the current time, an anomaly alarm is raised, and the alarm escalates over time. Based on analysis of the alarm, targeted measures such as tracing the abnormal traffic, locating its source, and filtering traffic are then taken to protect the network.
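The baseline-and-alarm procedure described above can be sketched as follows. This is only one simple way to realize it, not the detection system the paper refers to. The mean-plus-k-standard-deviations rule, the threshold k = 3, the data layout, and all names are assumptions for illustration.

```python
import statistics


def build_baseline(history):
    """Build a per-(protocol, period) baseline from normal-condition rates.

    history maps (protocol, period) to a list of observed rates.
    Returns a dict mapping the same keys to (mean, standard deviation).
    """
    return {
        key: (statistics.mean(rates), statistics.stdev(rates))
        for key, rates in history.items()
        if len(rates) >= 2  # stdev needs at least two observations
    }


def is_anomalous(baseline, protocol, period, rate, k=3.0):
    """True if the rate deviates from the baseline mean by more than k stdevs."""
    mean, std = baseline[(protocol, period)]
    return abs(rate - mean) > k * std


class AlarmTracker:
    """Alarm level per protocol; grows while anomalies persist, resets otherwise."""

    def __init__(self):
        self.level = {}

    def update(self, protocol, anomalous):
        if anomalous:
            self.level[protocol] = self.level.get(protocol, 0) + 1
        else:
            self.level[protocol] = 0
        return self.level[protocol]


# Hypothetical normal-condition rates for HTTP in the 09:00 period.
baseline = build_baseline({("http", 9): [100, 102, 98, 101, 99]})
tracker = AlarmTracker()
for rate in (101, 150, 160, 100):
    flag = is_anomalous(baseline, "http", 9, rate)
    print(rate, flag, tracker.update("http", flag))
```

A fitted traffic model could replace the mean and standard deviation here, for example by flagging rates that fall in the extreme tail of the modeled distribution for that period.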

Fig. 11 and Fig. 12 show the CDF curve and the PDF curve of the flow sample, respectively. As can be seen from Figure 2-5b, for most of the time the flow rate is concentrated in the low-speed range below 10 Kbps, and high-speed transmission lasts only a short time. This corresponds to the characteristics of typical Web browsing. The Hurst parameter is about 0.839, so the flow sample is strongly self-similar.

Table 1: Data Source Analysis

Table 2: Hurst parameter of traffic