Solr stopwords magic


My stopwords don't work as expected. Here is part of my schema:

<fieldType name="text_general" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

<fieldType class="solr.TextField" name="text_auto">
    <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true" outputUnigramsIfNoShingles="false"/>
    </analyzer>
    <analyzer type="query">
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
    </analyzer>
</fieldType>

<field name="deal_title_terms" type="text_auto" indexed="true" stored="false" required="false" multiValued="true"/>
<field name="deal_description" type="text_general" indexed="true" stored="true" required="false" multiValued="false"/>

My stopwords.txt contains the following words: the, is, a.
I have the following data in the fields:

deal_description - description
deal_title_terms - deal title terms (will be split into terms)

When I search deal_description:
Example 1: "deal_description: his m" - I expect the document with deal_description "this description" to be returned.
Example 2: "deal_description: is th" - I expect nothing to be found, because "is" and "the" are stopwords.

When I search deal_title_terms:
Example 1: "deal_title_terms: is" - I expect nothing to be found, because "is" is a stopword.
Example 2: "deal_title_terms: is deal" - I expect "is" and "the" to be ignored and the term "deal" to be found.
Example 3: "deal_title_terms: title terms" - I expect "a" to be ignored and the term "title terms" to be found.

Question 1: Why don't stopwords work for the "deal_description" field?
Question 2: Why are stopwords not removed from the query for the "deal_title_terms" field? (When I search for title terms, the "title terms" term is not found.)
Question 3: Is there a way to show stopwords in search results but prevent them from being searched? For example:

Data: cool search engine
Search query: "is coo" -> returns "this cool search engine"
Search query: "is" -> returns nothing
Search query: "this coll" -> returns "this cool search engine"

Question 4: Where can I find a detailed description (ideally with examples) of how stopwords work in Solr? Right now it looks like magic.

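Answer to question 1: replace KeywordTokenizerFactory. It does no actual tokenizing: the entire input string is preserved as a single token, so the stop filter only ever sees the whole field value (e.g. "this description") and removes it only if that entire value is a stopword. Use StandardTokenizerFactory instead.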

Or use the field type below:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

With this field type, stopwords work as expected for the "deal_description" field.

Answer to question 3: yes. Add StopFilterFactory to the analyzer of type="query" only. This prevents stopwords from being searched, while leaving the filter out of the index analyzer so that stopwords are still indexed.
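A minimal sketch of such a field type, assuming the hypothetical name text_query_stop: the stop filter appears only in the query analyzer, so "is", "the" and "a" stay in the index but are dropped from queries. The enablePositionIncrements attribute from the question is left out. Note that the stored value returned in search results always contains the original text, stopwords included, because analysis only affects the indexed terms, not the stored value.

```xml
<!-- Hypothetical field type: stopwords are indexed, but removed from queries -->
<fieldType name="text_query_stop" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <!-- No StopFilterFactory here, so stopwords are kept in the index -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- Stopwords are dropped from the query tokens only -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

<!-- stored="true" so the original text, stopwords included, is returned in results -->
<field name="deal_description" type="text_query_stop" indexed="true" stored="true" multiValued="false"/>
```

With the standard (lucene) query parser, a query consisting only of "is" analyzes to no terms and so should match nothing. Partial inputs such as "coo" would still need prefix or n-gram matching, which this sketch does not cover.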

Answer to question 4: https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Answer to question 2: the custom field type you created seems incorrect. Text has to be tokenized by the tokenizer first, and only then passed through the filters. Check the analysis chain on the Solr Analysis page.
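A minimal sketch of a reworked text_auto, assuming the index analyzer from the question is otherwise kept. The enablePositionIncrements="false" attribute is omitted because newer Lucene versions reject it. In the query analyzer the tokenizer is listed first, ahead of all filters, and a LowerCaseFilterFactory is added so query terms are normalized the same way as indexed terms (the original query analyzer never lowercased). Matching a multi-word shingle such as "title terms" as a single term would additionally require the query side to produce that same shingle, which this sketch does not attempt. Comparing both sides token by token on the Analysis page is the way to confirm what actually matches.

```xml
<fieldType name="text_auto" class="solr.TextField">
    <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true" outputUnigramsIfNoShingles="false"/>
    </analyzer>
    <analyzer type="query">
        <!-- The tokenizer splits the input; every filter below works on its tokens -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <!-- Lowercase query terms so they can match the lowercased indexed terms -->
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
</fieldType>
```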

