search - Solr stopwords magic
My stopwords don't work as expected. Here is part of my schema:
<fieldType name="text_general" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType class="solr.TextField" name="text_auto">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true" outputUnigramsIfNoShingles="false"/>
  </analyzer>
  <analyzer type="query">
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
  </analyzer>
</fieldType>

<field name="deal_title_terms" type="text_auto" indexed="true" stored="false" required="false" multiValued="true"/>
<field name="deal_description" type="text_general" indexed="true" stored="true" required="false" multiValued="false"/>

stopwords.txt contains the following words: the, is, a.
I have the following data in the fields:
deal_description - description
deal_title_terms - deal title terms (will be split into terms)
When I search on deal_description:
Example 1: "deal_description: his m" - I expect the document with deal_description "this description" to be returned.
Example 2: "deal_description: is th" - I expect nothing to be found, because "is" and "the" are stopwords.
When I search on deal_title_terms:
Example 1: "deal_title_terms: is" - I expect nothing to be found, because "is" is a stopword.
Example 2: "deal_title_terms: is deal" - I expect "is" and "the" to be ignored and the term "deal" to be found.
Example 3: "deal_title_terms: title terms" - I expect "a" to be ignored and the term "title terms" to be found.
Question 1: Why don't stopwords work for the "deal_description" field?
Question 2: Why are stopwords not removed from the query for the "deal_title_terms" field? (When I try to find title terms, the "title terms" term is not found.)
Question 3: Is there a way to show stopwords in search results but prevent searching on them? Example:
data: cool search engine
search query: "is coo" -> returns "this cool search engine"
search query: "is" -> returns nothing
search query: "this coll" -> returns "this cool search engine"
Question 4: Where can I find a detailed description (maybe with examples) of how stopwords work in Solr? Right now it looks like magic.
Answer to question 1: Replace KeywordTokenizerFactory. It does no actual tokenizing, so the entire input string is preserved as a single token. The stop filter compares whole tokens against stopwords.txt, so a single token holding the full text never matches a stopword. Use StandardTokenizerFactory instead.
Or use the fieldType below.

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

With this, stopwords work as expected for the "deal_description" field.
Answer to question 3: Yes. Add StopFilterFactory only in the analyzer with type="query". That prevents searching on stopwords, while leaving the filter out at index time means they are not stripped from the indexed text.
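As a rough sketch of that idea, a field type along these lines keeps the stop filter on the query side only. The name "text_query_stop" is made up for illustration, and the rest of the chain (standard tokenizer plus lowercasing) is an assumption, not something taken from your schema:

```xml
<!-- Hypothetical field type: "text_query_stop" is an illustrative name -->
<fieldType name="text_query_stop" class="solr.TextField" positionIncrementGap="100">
  <!-- Index side: no stop filter, so stopwords stay in the index -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- Query side: stopwords are dropped from the query before matching -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that a field declared with stored="true" returns its original text in results regardless of analysis, so the stopwords would still be visible there.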
Answer to question 4: https://wiki.apache.org/solr/analyzerstokenizerstokenfilters
Answer to question 2: The custom field type you created seems incorrect. Text has to be tokenized first by the tokenizer, and only then passed through the filters. Check how the field is analyzed on the Solr Analysis page.
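For illustration only, the query analyzer of "text_auto" could be reordered so the tokenizer comes first and the filters follow. Adding the lowercase filter on the query side is my assumption, made to mirror the index analyzer; it is not in your original schema:

```xml
<!-- Sketch of the "text_auto" query analyzer in the conventional order -->
<analyzer type="query">
  <!-- Tokenizer first: splits the query into individual tokens -->
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <!-- Filters after: each one operates on the token stream -->
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
  <!-- Assumption: lowercase at query time to match the index analyzer -->
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
```

After reloading the core, the Analysis page should show the tokens produced at each step for both the index and query side, which makes it easier to see where a term like "title terms" stops matching.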