(1) Forward maximum matching (scanning from left to right);
3. Statistics-based word segmentation
(2) Reverse maximum matching (scanning from right to left);
1. String-matching segmentation, also called mechanical word segmentation. The sequence of Chinese characters is scanned and compared against the entries of a lexicon. The variants differ mainly in scanning direction:
Many webmasters now use soft articles (advertorials) for network marketing and site optimization. So how do we make search engines favor our articles, and how do we combine soft articles with Love Shanghai's Chinese word segmentation technology to get more traffic? Today, Zhang Dong shares how to use Love Shanghai's Chinese word segmentation technology to write the best soft articles.
(4) Minimum segmentation (cutting each sentence into the smallest possible number of words).
(3) Bidirectional maximum matching (two scans, one from left to right and one from right to left);
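The matching methods listed here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the search engine's actual implementation: the function names, the `max_len` window, the toy lexicon in the usage note, and the tie-breaking rule in the bidirectional version (fewer words, then fewer single characters, then the reverse result) are all choices made for this example.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Scan left to right, taking the longest lexicon word at each position."""
    words, i = [], 0
    while i < len(text):
        # Try the longest candidate first and shrink until something matches.
        # A single character is always accepted so the scan cannot stall.
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def reverse_max_match(text, lexicon, max_len=4):
    """Scan right to left, taking the longest lexicon word ending at each position."""
    words, j = [], len(text)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = text[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected from the end of the string
    return words


def bidirectional_max_match(text, lexicon, max_len=4):
    """Run both scans and keep the result that looks more plausible.

    Assumed heuristic: prefer fewer words; on a tie prefer fewer
    single-character words; otherwise fall back to the reverse result.
    """
    fwd = forward_max_match(text, lexicon, max_len)
    rev = reverse_max_match(text, lexicon, max_len)
    if len(fwd) != len(rev):
        return fwd if len(fwd) < len(rev) else rev

    def singles(ws):
        return sum(1 for w in ws if len(w) == 1)

    return fwd if singles(fwd) < singles(rev) else rev
```

With the toy lexicon {"研究", "研究生", "生命", "的", "起源"}, the forward scan splits "研究生命的起源" into 研究生 / 命 / 的 / 起源, while the reverse scan gives 研究 / 生命 / 的 / 起源. Both have four words, so the bidirectional version picks the reverse result because it contains fewer single characters.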
2. Understanding-based word segmentation
To improve the accuracy of mechanical segmentation, feature scanning and mark-based segmentation have also emerged. In mark-based segmentation, marker words serve as breakpoints: the original string is divided into smaller strings, which are then passed to mechanical segmentation. Feature-based approaches combine segmentation with part-of-speech tagging: rich part-of-speech information supports the segmentation decisions, and the tagging process in turn checks and adjusts the segmentation results, which greatly improves segmentation accuracy.
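As a rough illustration of the breakpoint idea, the sketch below cuts a string at marker characters and hands each remaining piece to whatever mechanical segmenter is supplied. The marker set `MARKERS` and both function names are hypothetical and chosen only for this example. Treating a function word such as 的 as a marker is an assumption that will sometimes be wrong (for instance inside 目的).

```python
import re

# Hypothetical marker set: punctuation plus two common function words.
MARKERS = "，。！？、的了"


def split_on_markers(text, markers=MARKERS):
    """Cut the string at marker characters, keeping each marker as its own piece."""
    pattern = "([" + re.escape(markers) + "])"
    # The capturing group makes re.split return the markers as well.
    return [piece for piece in re.split(pattern, text) if piece]


def segment_with_markers(text, segmenter, markers=MARKERS):
    """Split at markers first, then run `segmenter` on each smaller string.

    `segmenter` is any callable that maps a string to a list of words,
    for example a dictionary-based maximum matching function.
    """
    words = []
    for piece in split_on_markers(text, markers):
        if len(piece) == 1 and piece in markers:
            words.append(piece)          # a marker is emitted as-is
        else:
            words.extend(segmenter(piece))
    return words
```

Because the pieces between markers are shorter than the full sentence, a wrong match in one piece cannot spill over into the next one.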
This method has the computer simulate human sentence comprehension in order to recognize words. Syntactic and semantic analysis is carried out at the same time as segmentation, and the syntactic and semantic information is used to resolve ambiguity.
Love Shanghai's Chinese word segmentation technology is one of the core technologies of the Love Shanghai search engine algorithm. It refers to splitting a sequence of Chinese characters into individual words. The main segmentation methods are as follows:
1. A topic that attracts a high degree of attention. Choose a topic that many people are following, and you can get more.
Two: What is a high-quality soft article
Formally, a word is a stable combination of characters, so the more often adjacent characters appear together in context, the more likely they are to form a word. The frequency or probability with which adjacent characters co-occur therefore reflects how reliably they form a word. One can count the co-occurrence frequency of each combination of adjacent characters in a corpus and compute their mutual information. The mutual information of two Chinese characters X and Y is defined from the probability that they occur next to each other, and it reflects how tightly the two characters are bound. When this tightness exceeds a certain threshold, the character group can be considered a possible word. This method only needs character-group frequencies counted from the corpus and no segmentation dictionary, so it is also called dictionary-free segmentation or the statistical word extraction method.
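The statistical idea can be made concrete with a small sketch. The text above does not give an exact formula, so the code assumes the common pointwise form log2(P(XY) / (P(X) P(Y))), estimated from raw counts. The function names, the three-sentence corpus in the usage note, and the threshold are illustrative assumptions only.

```python
import math
from collections import Counter


def adjacent_mutual_information(corpus):
    """Return {(x, y): score} for every adjacent character pair in the corpus.

    `corpus` is an iterable of strings. The score is the pointwise mutual
    information log2(P(xy) / (P(x) * P(y))), estimated from raw counts.
    """
    char_counts = Counter()
    pair_counts = Counter()
    for sentence in corpus:
        char_counts.update(sentence)                      # single characters
        pair_counts.update(zip(sentence, sentence[1:]))   # adjacent pairs
    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())

    scores = {}
    for (x, y), n_xy in pair_counts.items():
        p_xy = n_xy / total_pairs
        p_x = char_counts[x] / total_chars
        p_y = char_counts[y] / total_chars
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


def candidate_words(corpus, threshold):
    """Two-character strings whose score exceeds the threshold."""
    scores = adjacent_mutual_information(corpus)
    return sorted(x + y for (x, y), s in scores.items() if s > threshold)
```

For example, `candidate_words(["我爱上海", "上海很大", "我去上海"], 2.0)` includes 上海. On a corpus this small, rare pairs such as 很大 score even higher, which is a known weakness of this measure and the reason a realistic system would use a large corpus and often a minimum frequency as well.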
One: Love Shanghai's Chinese word segmentation technology