A Robust Webpage Information Hiding Method Based on the Slash of Tag

Information hiding is an active topic in information security, with applications such as digital multimedia copyright protection and secret communication. Based on an analysis of how browsers parse the HTML of a webpage, and given the small capacity available for hiding information in webpages, this paper proposes a new robust webpage information hiding method based on the slash of tag attributes. The method overcomes the weak imperceptibility and the poor resistance to machine filtering of traditional webpage information hiding algorithms, and it offers greater embedding capacity than some other algorithms based on tag attributes. Experiments show that the method has good invisibility and higher practical value.


Introduction
Information hiding [1] conceals secret information in innocuous-looking cover objects such as audio, images, video, and text. In recent years, information hiding has generated significant research and commercial interest. The primary factors contributing to this surge are the widespread use of the Internet with improved bandwidth and speed; regional copyright loopholes in legislation; and the seamless distribution of multimedia content through peer-to-peer file-sharing applications.
HTML is a hypertext markup language for writing hypertext files, namely webpages, which are used to convey information over the Internet. As the Internet has become a main means of communication, webpages have come into extensive use. Meanwhile, a wide variety of steganographic methods [2][3][4] for webpages have emerged. The source code of a webpage is plain text that contains small markup tags, which instruct the web browser how to display the page. Webpage-based information hiding uses a webpage as the cover and embeds secret information into its source code while leaving the displayed result unchanged. From an analysis of the HTML specification and References [2][3][4][5], we identify three main information hiding methods: 1) embedding invisible characters; 2) changing the upper and lower case of letters in tags; 3) changing the order of attribute pairs within tags.
The first two methods are information hiding methods based on document format. The invisible-character method embeds extra invisible characters between tags, at the end of each line, or at the end of the whole document to encode the secret information. The second method relies on the fact that letters in tags are case-insensitive, so the case of tag letters can be modified without changing the visible document or the file size. By defining an uppercase letter as bit "0" and a lowercase letter as bit "1", secret information can be embedded into a webpage by changing the case of the letters in tags.
This paper proposes a new robust webpage information hiding method based on the slash of tag attributes. It overcomes the weak imperceptibility and the poor resistance to machine filtering of traditional webpage information hiding algorithms, and it improves on the embedding capacity of other algorithms based on tag attributes. According to the embedding rule, the ordered set of tag entities is first acquired from the webpage. The message is then encrypted with a two-valued chaotic sequence generated by the Logistic map system. The value format of a certain attribute in the tags is selected and modified according to the encrypted message, namely whether it has a single quotation mark. The analysis shows that the method has better imperceptibility and security than the traditional methods, and that its embedding capacity is higher than that of the method based on tag attributes. The method can therefore be used for webpage content protection and covert communication.
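The case-based method described above can be illustrated with a short Python sketch. It is not taken from the cited references: it assumes that only the letters of tag names carry bits, and the function names `embed_bits_in_tag_case` and `extract_bits_from_tag_case` are hypothetical.

```python
import re

# Group 1 is "<" or "</"; group 2 is the tag name (assumed to be the only carrier).
TAG_NAME = re.compile(r"(</?)([A-Za-z][A-Za-z0-9]*)")


def embed_bits_in_tag_case(html, bits):
    """Encode bits in tag-name letters: uppercase = '0', lowercase = '1'."""
    bit_iter = iter(bits)

    def recase(match):
        out = []
        for ch in match.group(2):
            if ch.isalpha():
                bit = next(bit_iter, None)   # None once the message is used up
                if bit is not None:
                    ch = ch.upper() if bit == "0" else ch.lower()
            out.append(ch)
        return match.group(1) + "".join(out)

    return TAG_NAME.sub(recase, html)


def extract_bits_from_tag_case(html, n_bits):
    """Read back the first n_bits from the case of tag-name letters."""
    bits = []
    for match in TAG_NAME.finditer(html):
        for ch in match.group(2):
            if ch.isalpha() and len(bits) < n_bits:
                bits.append("0" if ch.isupper() else "1")
    return "".join(bits)
```

Because browsers treat tag names case-insensitively, the rewritten page renders the same, but mixed-case tags are easy for a filter to spot and normalize.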

The Method
The Proposed Scheme. In this section, we present the proposed information hiding scheme based on the slash of webpage tags. The process diagram of the scheme, which consists of the information embedding process and the information extraction process, is shown in Fig. 1. The embedding process hides the secret data in the cover webpage, while the extraction process recovers the secret data from the webpage that carries it.

The Related Theory.
Definition 1. Let $T = \langle a_1, a_2, \ldots, a_n \rangle$ be a tag with $n$ attributes in HTML, where $T$ is the name of the tag and $a_i$ ($1 \le i \le n$), whose general form is "attribute name=attribute value" (name=value for short), is the $i$-th attribute of the tag. Let $T_s$ be a single tag, which has no end tag, and let $T_d$ be a double tag, which has a starting tag and an ending tag. The starting component of any tag is the tag name and its attributes, if any. The corresponding ending tag is the tag name alone, preceded by a slash (/). Ending tags have no attributes.
Definition 2. Let $|W|$ be the number of tags in a webpage $W$, and let $|T_i|$ be the number of attributes of $T_i$, the $i$-th tag in the webpage.
Definition 3. Let $O$ be an object composed of an attribute and its value in a tag, where $O_i$ is the $i$-th object in the tag. For example, in the tag "<font size=21px color=green>", "size=21px" is $O_1$ and "color=green" is $O_2$.
Definition 4. Let $T$ and $T'$ be a pair of equal tag objects, where $T$ is a tag object without a slash mark and $T'$ is a tag object with a slash mark. For example, if $T$ is "<font size=5 >", then $T'$ is "<font size=5 />". That is, $T \equiv T'$.
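Definitions 3 and 4 can be made concrete with a minimal Python sketch. The helper names `attribute_objects`, `with_slash` and `without_slash` are hypothetical, and the sketch assumes simple tags whose attribute values contain no spaces and do not end in an unquoted "/".

```python
def attribute_objects(tag):
    """Return the objects O_1, O_2, ... (name=value pairs) of a tag (Definition 3)."""
    inner = tag[1:-1].strip().rstrip("/")
    return inner.split()[1:]          # drop the tag name, keep the attributes


def with_slash(tag):
    """Return the equal tag object T' for a tag object T (Definition 4)."""
    body = tag[1:-1].rstrip()
    if body.endswith("/"):
        return tag                    # already in the slash form
    return "<" + body + " />"


def without_slash(tag):
    """Return T for a tag object T'; whitespace before '>' is not preserved."""
    body = tag[1:-1].rstrip()
    if body.endswith("/"):
        body = body[:-1].rstrip()
    return "<" + body + ">"


print(attribute_objects("<font size=21px color=green>"))
print(with_slash("<font size=5 >"))
```

The second call reproduces the example in Definition 4, turning "<font size=5 >" into "<font size=5 />".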
Through careful study, we found that the browser renders the original webpage and the webpage modified with equal tag objects identically.
Property 1. Equal tag objects have identical function.
Rule 1. Insertion decision rule. If a tag object $T$ is a $T_s$ tag or the starting component of a $T_d$ tag (for example, "<br>" is a $T_s$ tag and "<font size=4>" is the starting component of the $T_d$ tag "<font size=4> </font>"), then $T$ meets the insertion requirement; otherwise, $T$ does not meet the insertion requirement.
Rule 2. Extraction decision rule. If a tag object $T'$ with a slash mark is a $T_s$ tag or the starting component, with a slash mark, of a $T_d$ tag (for example, "<br />" is a $T_s$ tag and "<font size=4 />" is the starting component of the $T_d$ tag "<font size=4> </font>"), then $T'$ meets the extraction requirement; otherwise, $T'$ does not meet the extraction requirement.
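The two decision rules can be sketched as predicates in Python. This is an illustrative reading, not the authors' implementation: the regular expression `START_TAG` and both function names are assumptions, and attribute values containing "<" or ">" are not handled.

```python
import re

# A Ts tag or the starting component of a Td tag: "<", a letter, then anything
# up to ">". End tags (</font>), comments and doctype declarations do not match.
START_TAG = re.compile(r"<[A-Za-z][^<>]*>")


def meets_insertion_requirement(tag):
    """Rule 1: a start or single tag that does not yet carry a slash."""
    return START_TAG.fullmatch(tag) is not None and not tag.endswith("/>")


def meets_extraction_requirement(tag):
    """Rule 2: a start or single tag that carries a slash."""
    return START_TAG.fullmatch(tag) is not None and tag.endswith("/>")


for candidate in ["<br>", "<font size=4>", "</font>", "<br />", "<font size=4 />"]:
    print(candidate,
          meets_insertion_requirement(candidate),
          meets_extraction_requirement(candidate))
```

Ending tags such as "</font>" satisfy neither predicate, which matches the rules' restriction to single tags and starting components.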
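Rule 3. Insertion rule.
Step 1: Let $i = 1$, where $1 \le i \le |W|$.
Step 2: If $T_i$ meets the insertion requirement of Rule 1, then go to Step 4.
Step 3: Let $i = i + 1$. If $i \le |W|$, then go to Step 2. Otherwise, go to Step 5.
Step 4: $T_i$ is modified to $T'_i$, and let $i = i + 1$. If $i \le |W|$, then go to Step 2.
Step 5: Finished.
Rule 4. Extraction rule.
Step 1: Let $i = 1$, where $1 \le i \le |W|$.
Step 2: If $T_i$ meets the extraction requirement of Rule 2, then go to Step 4.
Step 3: Let $i = i + 1$. If $i \le |W|$, then go to Step 2. Otherwise, go to Step 5.
Step 4: We extract a secret information bit, and let $i = i + 1$. If $i \le |W|$, then go to Step 2.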
Step 5: Finished.
The Hiding Process. Let $W = \{T_1, T_2, \ldots, T_n\}$ be a cover webpage, where $T_i$ is a tag in the webpage, and let $M = \{m_1, m_2, \ldots, m_n\}$ be the secret data bits to be embedded in the cover webpage. To increase the secrecy of the proposed scheme, we generate a chaotic sequence $L = \{l_1, l_2, \ldots, l_n\}$ with the Logistic map system, controlled by a secret key. The secret data bits to embed are then calculated using Eq. 1.
$$S = M \oplus L = \{s_1, s_2, \ldots, s_n\} = \{m_1 \oplus l_1, m_2 \oplus l_2, \ldots, m_n \oplus l_n\} \qquad (1)$$
We use $S = \{s_1, s_2, \ldots, s_n\}$ to determine whether each tag is used to hide information, according to Rule 3.
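The text does not state the Logistic map parameters or how its real-valued orbit is turned into a two-valued sequence, so the following Python sketch rests on assumptions: a control parameter of 4.0, the secret key used as the initial value in (0, 1), thresholding at 0.5, and bitwise XOR as the operation that combines M and L in Eq. 1. The names `logistic_bits` and `xor_bits` are hypothetical.

```python
def logistic_bits(key, n, mu=4.0, threshold=0.5):
    """Generate n bits from the logistic map x -> mu * x * (1 - x).

    key is the initial value x0 (assumed to be the secret key);
    mu and threshold are assumed values, not taken from the paper.
    """
    x = key
    bits = []
    for _ in range(n):
        x = mu * x * (1.0 - x)
        bits.append(1 if x >= threshold else 0)
    return bits


def xor_bits(a, b):
    """Combine two equal-length bit lists element by element (assumed Eq. 1)."""
    return [x ^ y for x, y in zip(a, b)]


message = [1, 0, 1, 1, 0, 0, 1, 0]           # M
chaotic = logistic_bits(0.3, len(message))   # L
encrypted = xor_bits(message, chaotic)       # S
print(encrypted)
```

Under the XOR assumption, applying `xor_bits` to S with the same L returns M, so sender and receiver need only share the key.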
The hiding process can be described as follows:
Step 1: Calculate $M$ from the secret data, generate $L$ with the Logistic map system and the secret key $K$, and then calculate $S$ from $M$ and $L$.
Step 2: According to Rule 1, check every tag object $T_i$ of the webpage to determine whether it can be used to hide information. In our new method, if an equal tag object $T'_i$ exists for the tag $T_i$, then $T_i$ is called an embeddable tag.
Step 3: For each embeddable tag object, if the corresponding secret data bit of $S$ is 1, replace $T$ with $T'$ to hide the bit; otherwise, if the secret data bit is 0, retain the original tag object.
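The hiding steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: tags are located with a regular expression rather than a full HTML parser, script and style bodies are not treated specially, and the cover page is assumed to contain no tags that already carry a slash. The names `START_TAG`, `capacity` and `embed` are hypothetical.

```python
import re

# Matches start tags and single tags such as <br> or <font size=4>;
# end tags (</font>), comments and doctype declarations do not match.
START_TAG = re.compile(r"<[A-Za-z][^<>]*>")


def capacity(html):
    """Count the tags that satisfy Rule 1 (one bit per embeddable tag)."""
    return sum(1 for m in START_TAG.finditer(html)
               if not m.group(0).endswith("/>"))


def embed(html, s_bits):
    """Rewrite T as T' where the bit is 1; keep T where the bit is 0."""
    bit_iter = iter(s_bits)

    def hide(match):
        tag = match.group(0)
        if tag.endswith("/>"):
            return tag                      # fails Rule 1, left untouched
        bit = next(bit_iter, None)          # None once all bits are hidden
        if bit == 1:
            return "<" + tag[1:-1].rstrip() + " />"
        return tag

    stego = START_TAG.sub(hide, html)
    if next(bit_iter, None) is not None:
        raise ValueError("more bits than embeddable tags")
    return stego


cover = "<p><font size=4>hi</font><br></p>"
print(capacity(cover))
print(embed(cover, [1, 0, 1]))
```

Here `capacity` counts the embeddable tags, which bounds the number of bits this sketch can hide in a given page.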
The payload capacity of the proposed scheme is given by Eq. 2.
The Extraction Process. In this subsection, we describe the procedure used to extract the embedded secret data. The extraction process can be described as follows:
Step 1: According to Rule 2, check every tag object $T_i$ of the webpage to determine whether it has been used to hide information. In our new method, if the tag object $T'_i$ exists, then the hidden secret data bit $s_i$ is 1; otherwise, the hidden secret bit $s_i$ is 0.

Advanced Engineering Forum Vols. 6-7 363
Step 2: Since the receiver holds the secret key used to generate the chaotic sequence $L$ with the Logistic map system, the original secret data can be calculated using Eq. 3.
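Eq. 3 is not reproduced in this text. Assuming that Eq. 1 combines M and L by bitwise XOR, recovering M amounts to XOR-ing the extracted S with the same L. The Python sketch below follows the two extraction steps under that assumption; the Logistic map parameters (4.0 and a 0.5 threshold), the regular expression, and the names `extract_s`, `logistic_bits` and `recover_message` are all hypothetical, and the receiver is assumed to know the message length.

```python
import re

START_TAG = re.compile(r"<[A-Za-z][^<>]*>")


def extract_s(html, n_bits):
    """Step 1: read one bit per start tag, 1 if it carries a slash, else 0."""
    bits = [1 if m.group(0).endswith("/>") else 0
            for m in START_TAG.finditer(html)]
    return bits[:n_bits]


def logistic_bits(key, n, mu=4.0, threshold=0.5):
    """Regenerate the chaotic bit sequence L from the shared key (assumed form)."""
    x, bits = key, []
    for _ in range(n):
        x = mu * x * (1.0 - x)
        bits.append(1 if x >= threshold else 0)
    return bits


def recover_message(html, key, n_bits):
    """Step 2: undo the assumed XOR of Eq. 1 to obtain M from S and L."""
    s = extract_s(html, n_bits)
    l = logistic_bits(key, len(s))
    return [si ^ li for si, li in zip(s, l)]


stego = "<p /><font size=4>hi</font><br /></p>"
print(extract_s(stego, 3))
print(recover_message(stego, 0.3, 3))
```

A page whose cover already contained slash-form tags would be misread by this sketch, which is why the cover is assumed to be free of them.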

The Experiments
The experiments were carried out to evaluate the performance of the proposed information hiding scheme based on the slashes of tags in the webpage. The proposed scheme was tested on a Windows 7 personal computer with a 2.66 GHz Pentium IV processor and 4 GB of RAM, and the homepages of six popular websites were used as cover webpages.
The Experimental Results. We implemented the proposed information hiding method in the Visual C++ 6.0 environment. The experimental results show that the browser renders the original webpages and the webpages modified with equivalent tag objects identically. Fig. 2 and Fig. 3 show the webpage renderings before and after embedding the secret data. Fig. 4 and Fig. 5 show screenshots of the webpage source before and after embedding the secret data. The homepages of popular websites were also tested for their maximum hiding capacity. Table 1 shows the largest embedded capacity (LEC) of the homepages of some popular websites, which were visited on June 5, 2012. The experimental results show that the method has better imperceptibility and security than the method proposed in [6].

Conclusions and future works
Information hiding is an active topic in information security, with applications in digital multimedia copyright protection and secret communication. Based on an analysis of how browsers parse the HTML of a webpage, and given the small capacity available for hiding information in webpages, this paper has proposed a new efficient webpage information hiding method based on equal tags. The method overcomes the weak imperceptibility and the poor resistance to machine filtering of traditional webpage information hiding algorithms, and it improves on the embedding capacity of other algorithms based on tag attributes. The experiments show that the method has good invisibility and higher practical value, so we conclude that it is practical in many real applications. Future work will study how to improve the embedding capacity and security of the method by using the relative links of webpages, multi-webpage embedding, or other techniques.

Fig. 1. The Process Diagram of the Proposed Scheme
