Check content statistics
Checking the statistics of your entire website is a useful way to find out its theme. Search engines know all of the content on your website, so you also need a way to see it as a whole in order to understand how search engines view your site.
Everyone has writing habits. Therefore, even if you write content naturally, your site may be penalized or misunderstood by search engines.
The way to avoid this is to see your site the same way search engines do. Search engines are artificial intelligence systems that rely on statistical data.
The Content Management System has all of the content data, so it can analyse the keywords and show the following data.
By looking at this data, you can find your habits. When you look at it, check the following points:
- The most used keyword is not used excessively
- Important keywords appear near the top of the ranking
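A ranking like the one described above can be sketched as follows. This is a minimal sketch, assuming a naive English tokenizer, a hand-picked stopword list, and a hypothetical density limit of 3%; the Content Management System may count keywords differently, and keyword_ranking and overused are illustrative names only.

```python
import re
from collections import Counter

# Hypothetical limit: a keyword is "overused" when it makes up more than 3% of all
# counted words. This is an illustrative assumption, not a search engine rule.
DENSITY_LIMIT = 0.03


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words (a naive tokenizer for English text)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def keyword_ranking(pages: dict[str, str], stopwords: set[str]) -> list[tuple[str, int, float]]:
    """Return (keyword, count, density) for the whole site, most used first."""
    counts = Counter()
    for text in pages.values():
        counts.update(word for word in tokenize(text) if word not in stopwords)
    total = sum(counts.values())
    return [(word, n, n / total) for word, n in counts.most_common()]


def overused(ranking: list[tuple[str, int, float]], limit: float = DENSITY_LIMIT) -> list[str]:
    """Keywords whose density exceeds the limit."""
    return [word for word, _, density in ranking if density > limit]


pages = {
    "/template-engine": "template engine template design",
    "/design": "design your website",
}
ranking = keyword_ranking(pages, stopwords={"your"})
print(ranking[0])                    # ('template', 2, 0.3333333333333333)
print(overused(ranking, limit=0.3))  # ['template', 'design']
```

Density here is simply a keyword's count divided by the total number of counted words on the site; a real tool would also need a proper stopword list so that words like "the" do not top the ranking.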
For old search engines, keyword density was a very important factor, but for the latest ones it is not so important.
If you write content naturally, important keywords will appear in this table, but they do not have to be at the top.
The more important point is that no keyword is used too often. Overused keywords confuse search engines or get your site penalized; this is called keyword stuffing.
When you find overused keywords, you can check, for each keyword, which pages use it.
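Looking up the pages that use a keyword amounts to an inverted index from keywords to pages. The sketch below assumes the same naive tokenizer; build_index and pages_using are hypothetical helper names, not part of any Content Management System API.

```python
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words (a naive tokenizer for English text)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def build_index(pages: dict[str, str]) -> dict[str, Counter]:
    """Map each keyword to a Counter of {page URL: occurrences on that page}."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word][url] += 1
    return index


def pages_using(index: dict[str, Counter], keyword: str) -> list[tuple[str, int]]:
    """Pages that use the keyword, the page with the most occurrences first."""
    return index.get(keyword.lower(), Counter()).most_common()


pages = {
    "/a": "template engine template",
    "/b": "template design",
    "/c": "design tips",
}
index = build_index(pages)
print(pages_using(index, "Template"))  # [('/a', 2), ('/b', 1)]
```

Sorting the pages by occurrences shows where editing will reduce the count the most.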
Usually, only one or two keywords are overused. By editing the content to use them less often, you can normalize the keyword balance. Ways to do this include:
- Modify unnecessary subtitles
- Rewrite the text of H2 and H3 tags to remove unnecessary keywords
- Use pronouns such as "it", "them", and "that" in the body text
In addition, these statistics include keywords in the template parts. Because template parts such as the header and footer appear on every page, modifying the template is also an effective way to restore the balance.
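Because the template is rendered on every page, each of its words is counted once per page. The sketch below estimates what share of each template word's site-wide count comes from the template. It assumes the template text and the page bodies are available as plain strings; template_share is a hypothetical name.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words (a naive tokenizer for English text)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def template_share(template: str, bodies: list[str]) -> dict[str, float]:
    """For each template word, the fraction of its site-wide count that comes from the template."""
    if not bodies:
        return {}
    template_counts = Counter(tokenize(template))
    body_counts = Counter()
    for body in bodies:
        body_counts.update(tokenize(body))
    shares = {}
    for word, n in template_counts.items():
        from_template = n * len(bodies)  # the template is rendered once per page
        shares[word] = from_template / (from_template + body_counts[word])
    return shares


template = "home blog contact"
bodies = ["blog post about templates", "another post"]
print(template_share(template, bodies))
# {'home': 1.0, 'blog': 0.6666666666666666, 'contact': 1.0}
```

A share close to 1.0 means the keyword's count comes almost entirely from the template, so editing the template is the only way to lower it.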
A theme is detected as a collection of keywords, and search engines understand a page's theme from it.
By using the keyword balance data, the Content Management System can calculate the theme.
The theme of a page is an important metric for knowing which theme search engines detect. The theme of a page is actually shown below.
The Content Management System manages where keywords appear, and you can set a coefficient for each position.
The Content Management System recognizes the following positions:
- Header and footer of the web template
- Aside and navigation parts
- Article body
- Text of H1, H2, H3, and H4 tags
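Per-position coefficients can be applied by weighting each keyword occurrence with the coefficient of the position it appears in, which turns a page into a theme vector. This is a minimal sketch; the position names and coefficient values in POSITION_WEIGHTS are hypothetical examples, not the Content Management System's defaults.

```python
import re
from collections import Counter

# Hypothetical coefficients; the CMS lets you set your own value for each position.
POSITION_WEIGHTS = {
    "template": 0.2,  # header and footer of the web template
    "aside": 0.5,     # aside and navigation parts
    "body": 1.0,      # article body
    "heading": 2.0,   # text of H1 to H4 tags
}


def tokenize(text: str) -> list[str]:
    """Split text into lowercase words (a naive tokenizer for English text)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def theme_vector(parts: dict[str, str], weights: dict[str, float] = POSITION_WEIGHTS) -> dict[str, float]:
    """Weighted keyword counts for one page; parts maps a position name to its text."""
    vector = Counter()
    for position, text in parts.items():
        for word in tokenize(text):
            vector[word] += weights[position]
    return dict(vector)


page = {
    "heading": "Template engine",
    "body": "A template engine renders template files",
}
vector = theme_vector(page)
print(vector["template"], vector["engine"])  # 4.0 3.0
```

Giving headings a larger coefficient than the template makes a keyword in an H2 count for more than the same keyword repeated in a footer.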
After analysing the themes, you can check similar pages for each page. Similarity is calculated from the inner product of the theme vectors. More precisely, it is the cosine of the angle between the two vectors, which equals the inner product of the normalized vectors.
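This is the standard cosine similarity, similarity(a, b) = (a · b) / (|a| |b|). A minimal sketch, assuming theme vectors are stored as {keyword: weight} dictionaries:

```python
import math


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Inner product of two theme vectors after normalizing each one to length 1."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a page without keywords has no theme to compare
    return dot / (norm_a * norm_b)


page_a = {"template": 4.0, "engine": 3.0}
page_b = {"template": 2.0, "design": 1.0}
print(round(cosine_similarity(page_a, page_b), 3))  # 0.716
```

Because keyword weights are never negative, the result always lies between 0 and 1.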
A value of 0 means the themes are unrelated, and 1.0 means they are identical. Actual data is shown below.
This data shows the similarity to the page "Control Design of your website with easy template engine".
In my experience, a similarity of 0.85 or below causes no problems. If the similarity is too high, search engines will assume the pages are duplicate content. In that case, I recommend using the canonical link tag (link rel="canonical").
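To apply this rule of thumb across a site, every pair of pages can be compared and the pairs above 0.85 listed as candidates for a canonical tag. A sketch under the same assumptions; canonical_candidates and the sample theme vectors are hypothetical. Comparing all pairs grows quadratically with the number of pages, which is acceptable for a small site.

```python
import math
from itertools import combinations

THRESHOLD = 0.85  # the author's rule of thumb; pairs above it may look duplicated


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Inner product of two theme vectors after normalizing each one to length 1."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def canonical_candidates(themes: dict[str, dict[str, float]], threshold: float = THRESHOLD):
    """Return (url_a, url_b, similarity) for page pairs above the threshold, highest first."""
    pairs = []
    for (url_a, theme_a), (url_b, theme_b) in combinations(themes.items(), 2):
        similarity = cosine_similarity(theme_a, theme_b)
        if similarity > threshold:
            pairs.append((url_a, url_b, similarity))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


themes = {
    "/template-engine": {"template": 4.0, "engine": 3.0},
    "/template-engine-copy": {"template": 4.0, "engine": 2.5},
    "/crawler-logs": {"crawler": 2.0, "logs": 2.0},
}
for url_a, url_b, similarity in canonical_candidates(themes):
    print(f"{url_a} <-> {url_b}: {similarity:.3f}")
```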
Keep the following properties of theme similarity in mind:
- It measures how similar the keyword ratios are
- It does not depend on the size of the content, because it uses normalized vectors
- It ignores the order of keywords
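The last two properties can be checked directly: repeating a text or shuffling its words does not change the direction of its theme vector. A small demonstration, assuming raw word counts as the theme vector (position coefficients are skipped for brevity):

```python
import math
import re
from collections import Counter


def theme(text: str) -> Counter:
    """Raw word counts used as a simple theme vector (no position coefficients)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two keyword vectors."""
    dot = sum(weight * b[word] for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


original = "the template engine controls the design"
doubled = original + " " + original                  # same ratios, twice the size
shuffled = "design the controls engine template the"  # same words, new order
unrelated = "crawler logs and webmaster tools"

for name, text in [("doubled", doubled), ("shuffled", shuffled), ("unrelated", unrelated)]:
    print(name, round(cosine_similarity(theme(original), theme(text)), 6))
# doubled 1.0
# shuffled 1.0
# unrelated 0.0
```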
Therefore, similarity alone cannot detect copied content or compiled content. But when used together with a copy content detector, it helps to detect duplicate content that a computer program has compiled on purpose.
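How the copy content detector works is not specified here. One common order-sensitive technique, shown only as an illustration and not as the detector this Content Management System uses, compares sets of overlapping word n-grams (shingles) with the Jaccard index. Unlike theme similarity, it tells a shuffled text apart from the original. The shingle length n = 3 is an arbitrary choice.

```python
import re


def shingles(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping n-word sequences; unlike a theme vector, it keeps word order."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a: set, b: set) -> float:
    """Shared shingles divided by all distinct shingles (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


original = "the template engine controls the design"
shuffled = "design the controls engine template the"
print(jaccard(shingles(original), shingles(original)))  # 1.0
print(jaccard(shingles(original), shingles(shuffled)))  # 0.0
```

A pair of pages with high theme similarity but low shingle overlap has the same keyword balance written in a different order, which is the pattern of mechanically compiled content.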
Search engines prefer a well-organized structure, so they value it when pages with similar themes are placed in the same page hierarchy.
However, co-occurring words, which search engines use to evaluate the quality of links by how relevant they are to what readers want, have nothing to do with this. Therefore, simply setting links to pages with a relevant theme is not enough. You have to create useful web pages with good content that answers users' needs correctly.
After you create good content and publish the website, you have to check crawler access and the status shown in the webmaster tools.
By checking them, you can learn how search engines evaluate your site.
Next, please take a look at "Check access and crawler logs".