To be as relevant as possible, the search engine takes into account the quality that internet users perceive when they visit a website. Google's aim is above all to highlight quality content that meets users' expectations. It is therefore essential to study what Google rewards and to analyze the pages ranked at the top of the SERPs in order to extract their main lexical fields.
The results shown in the SERPs are indeed a wealth of information to exploit. With this in mind, we developed a crawler capable of extracting the textual content of web pages. It is not an easy task, and in building it we came to understand the difficulties the engine faces when it visits a site to analyze it (bad encoding, invalid HTML tags, spam, etc.).
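To give an idea of what tolerating bad encoding and invalid HTML can involve, here is a minimal sketch using only the Python standard library. It is not our crawler's actual code: the names `TextExtractor`, `decode_bytes` and `extract_text` are hypothetical, and the Latin-1 fallback is just one simple assumption about how to handle a wrongly declared charset.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores the content of script/style tags."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text found outside script/style.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def decode_bytes(raw, declared="utf-8"):
    """Decode with the declared charset, falling back to Latin-1 (assumption)."""
    try:
        return raw.decode(declared)
    except (UnicodeDecodeError, LookupError):
        # Latin-1 maps every byte to a character, so this never raises.
        return raw.decode("latin-1")


def extract_text(raw, declared="utf-8"):
    """Return the visible text of a page given its raw bytes."""
    parser = TextExtractor()
    parser.feed(decode_bytes(raw, declared))
    parser.close()
    return parser.text()


# Unclosed <b> and <p> tags, plus a Latin-1 byte in a page declared as UTF-8.
sample = b"<html><body><p>Compare <b>insurance quotes<p>caf\xe9</body><script>var x=1;</script>"
print(extract_text(sample))
```

The standard-library parser does not raise on unclosed tags, which is why it is used here; a production crawler would also need charset detection, boilerplate removal and spam filtering, none of which this sketch attempts.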
Step 1 - Semantic analysis
Take the example of a site that wants to optimize its content for the keyword "insurance comparator". We launch the analysis on the keyword directly in the web interface, via the search bar:
Once launched, the analysis generally takes less than a minute.
Step 2 - WordPrint study
We invented a semantic concept called WordPrint. A WordPrint is a semantic SEO profile specific to each of your keywords: it is the unique "DNA" of your keyword. It corresponds to Google's "expectations" in terms of lexical fields.
The WordPrint consists of a list of terms identified for the "insurance comparator" query, with the following two columns:
- Power: the number of times the term was found in the analyzed corpus, i.e. its frequency (quantitative aspect).
- Index: based on the BM25 model, an advanced version of TF*IDF. The essential terms (lexical units) are highlighted with an orange background; they were identified as ubiquitous in the analysis. Remember: the higher the BM25 value, the more important the term, even if its frequency is low.
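To make the difference between the two columns concrete, the sketch below computes a raw frequency and a standard Okapi BM25 weight for each term of a tiny corpus. It illustrates the general BM25 model only, not the exact formula behind the Index column: the function names, the whitespace tokenization, the parameters `k1=1.5` and `b=0.75` (common defaults) and the choice to compute IDF over the same small corpus are all assumptions.

```python
import math
from collections import Counter


def term_frequencies(docs):
    """Raw count of each term over the whole corpus (the quantitative aspect)."""
    return Counter(tok for doc in docs for tok in doc.lower().split())


def bm25_scores(docs, k1=1.5, b=0.75):
    """Sum, over all documents, of the Okapi BM25 weight of each term."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for toks in tokenized for term in set(toks))

    scores = Counter()
    for toks in tokenized:
        counts = Counter(toks)
        # Length normalization: long documents are penalized slightly.
        norm = k1 * (1 - b + b * len(toks) / avg_len)
        for term, f in counts.items():
            df = doc_freq[term]
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            scores[term] += idf * f * (k1 + 1) / (f + norm)
    return dict(scores)


# Hypothetical three-page corpus for illustration.
corpus = [
    "the insurance comparator the the",
    "the car the insurance",
    "the deductible the the",
]
freq = term_frequencies(corpus)
scores = bm25_scores(corpus)
for term in sorted(scores, key=scores.get, reverse=True):
    print(f"{term:12s} frequency={freq[term]}  bm25={scores[term]:.3f}")
```

In this toy corpus, "the" is by far the most frequent term, yet "deductible", which appears only once, receives a higher BM25 weight: frequency alone does not determine the score.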