search - Solr stopwords magic -


My stopwords don't work as expected. Here is part of my schema:

<fieldType name="text_general" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

<fieldType class="solr.TextField" name="text_auto">
    <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true" outputUnigramsIfNoShingles="false"/>
    </analyzer>
    <analyzer type="query">
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
    </analyzer>
</fieldType>

<field name="deal_title_terms" type="text_auto" indexed="true" stored="false" required="false" multiValued="true"/>

<field name="deal_description" type="text_general" indexed="true" stored="true" required="false" multiValued="false"/>

stopwords.txt contains the following words: the, is, a.
My fields contain the following data:

deal_description - description
deal_title_terms - deal title terms (will be split into terms)

When I search deal_description:
Example 1: "deal_description: his m" - I expect the document with deal_description "this description" to be returned.
Example 2: "deal_description: is th" - I expect nothing to be found, because "is" and "the" are stopwords.

When I search deal_title_terms:
Example 1: "deal_title_terms: is" - I expect nothing to be found, because "is" is a stopword.
Example 2: "deal_title_terms: is the deal" - I expect "is" and "the" to be ignored and the term "deal" to be found.
Example 3: "deal_title_terms: title a terms" - I expect "a" to be ignored and the term "title terms" to be found.

Question 1: Why don't stopwords work for the "deal_description" field?
Question 2: Why are stopwords not removed from the query for the "deal_title_terms" field? (When I search for title terms, the term "title terms" is not found.)
Question 3: Is there a way to show stopwords in search results but prevent searching on them? Example:

Data: cool search engine
Search query: "is coo" -> returns "this cool search engine"
Search query: "is" -> returns nothing
Search query: "this coll" -> returns "this cool search engine"

Question 4: Where can I find a detailed description (maybe with examples) of how stopwords work in Solr? Right now it looks like magic.

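Answer to question 1: Replace KeywordTokenizerFactory. It does no actual tokenizing, so the entire input string is preserved as a single token. The stop filter compares whole tokens against stopwords.txt, so a multi-word value never matches a stopword. Use StandardTokenizerFactory instead.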

Or use the field type below.

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

With this field type, stopwords work as expected for the "deal_description" field.

Answer to question 3: Yes. Add StopFilterFactory only to the analyzer with type="query". This prevents searching on stopwords, but does not prevent them from being added at index time.
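A minimal sketch of that idea, assuming a new field type whose name (text_query_stop) is made up for illustration and is not part of the original schema. The index analyzer has no stop filter, and the query analyzer does:

```xml
<!-- Hypothetical field type name; adjust to your schema. -->
<fieldType name="text_query_stop" class="solr.TextField" positionIncrementGap="100">
    <!-- Index analyzer: no StopFilterFactory, so stopwords are indexed. -->
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <!-- Query analyzer: stopwords are dropped from the query only. -->
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

Since the field must also be declared with stored="true" for the original text (stopwords included) to appear in results, this only covers the analysis side. Documents need to be reindexed after changing the index analyzer.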

Answer to question 4: https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters

Answer to question 2: The custom field type you created seems incorrect. The text has to be tokenized first by the tokenizer, and only then passed through the filters. Check the field's analysis on the Solr analysis page.
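One possible reordering of the text_auto query analyzer, as a sketch rather than a verified fix: the tokenizer is declared first, followed by the same two filters the question already uses. Placing RemoveDuplicatesTokenFilterFactory last is an assumption, not something the question specifies.

```xml
<analyzer type="query">
    <!-- Tokenizer first: it produces the tokens the filters operate on. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Then drop stopwords from the query tokens. -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
    <!-- Assumed position: remove duplicate tokens after stopword removal. -->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```

The Solr analysis page shows the output of each stage for both the index and query analyzers, which is a quick way to confirm whether a query such as the ones in the question produces the expected terms.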

