Aggregate a field in elasticsearch-dsl using Python
Can someone tell me how to write Python statements that aggregate (sum and count) values across my documents?
Script:

from datetime import datetime
from elasticsearch_dsl import DocType, String, Date, Integer
from elasticsearch_dsl.connections import connections

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

# Define a default Elasticsearch client
client = connections.create_connection(hosts=['http://blahblahblah:9200'])

s = Search(using=client, index="attendance")
s = s.execute()

for tag in s.aggregations.per_tag.buckets:
    print (tag.key)
Output:

File "/Library/Python/2.7/site-packages/elasticsearch_dsl/utils.py", line 106, in __getattr__
    '%r object has no attribute %r' % (self.__class__.__name__, attr_name))
AttributeError: 'Response' object has no attribute 'aggregations'
What is causing this? Is the "aggregations" keyword wrong? Is there some other package I need to import? If a document in the "attendance" index has a field called emailaddress, how would I count the documents that have a value for that field?
First of all, I notice now that what I wrote here actually has no aggregations defined. The documentation on how to use this is not very readable to me. Using what I wrote above, I'll expand. I'm changing the index name to make for a nicer example.
from datetime import datetime
from elasticsearch_dsl import DocType, String, Date, Integer
from elasticsearch_dsl.connections import connections

from elasticsearch import Elasticsearch
from elasticsearch_dsl import Search, Q

# Define a default Elasticsearch client
client = connections.create_connection(hosts=['http://blahblahblah:9200'])

s = Search(using=client, index="airbnb", doc_type="sleep_overs")

# Invalid! You haven't defined an aggregation yet, so the response has none.
# (Executing here would also turn s into a Response, which has no .aggs.)
# s = s.execute()
# for tag in s.aggregations.per_tag.buckets:
#     print (tag.key)

# Let's make an aggregation.
# 'by_house' is a name you choose, 'terms' is a keyword for the type of aggregator,
# 'field' is also a keyword, and 'house_number' is a field in our ES index.
s.aggs.bucket('by_house', 'terms', field='house_number', size=0)
Above we're creating one bucket per house number. Therefore, the name (key) of each bucket will be the house number. Elasticsearch (ES) will always give a document count of the documents fitting into each bucket. size=0 means give us all results, since ES has a default setting to return only 10 results (or whatever your dev set it up to do).
# This runs the query.
s = s.execute()

# Let's see what's in our results.
# (The terms aggregation itself has no doc_count; the counts are on each bucket.)
print(s.hits.total)
print(s.aggregations.by_house.buckets)

for item in s.aggregations.by_house.buckets:
    print(item.doc_count)
My mistake before was thinking an Elasticsearch query had aggregations by default. You sort of define them yourself, then execute them. Then your response can be split by the aggregators you mentioned.
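To make the "split by the aggregators" part concrete, here is a minimal sketch of how the bucket list of a terms aggregation can be read. It uses a plain dict instead of the elasticsearch-dsl response object so it runs without a cluster; the sample keys and counts are made up for illustration, and `bucket_counts` is a hypothetical helper, not part of any library.

```python
# Hand-written sample shaped like the "aggregations" part of a terms
# aggregation response. The keys and counts are made up for illustration.
sample_aggregations = {
    "by_house": {
        "buckets": [
            {"key": 12, "doc_count": 3},
            {"key": 7, "doc_count": 1},
        ]
    }
}


def bucket_counts(aggregations, name):
    """Map each bucket key of the named aggregation to its doc_count."""
    return {b["key"]: b["doc_count"] for b in aggregations[name]["buckets"]}


counts = bucket_counts(sample_aggregations, "by_house")
for house_number in sorted(counts):
    print("%s: %s" % (house_number, counts[house_number]))
```

With the real response you would use attribute access instead, as in the loop over s.aggregations.by_house.buckets above.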
The curl for the above should look like:
Note: I use Sense, an Elasticsearch plugin/extension/add-on for Google Chrome. In Sense you can use // to comment things out.
POST /airbnb/sleep_overs/_search
{
    // the size 0 here means do not return any hits, just the aggregation part of the result
    "size": 0,
    "aggs": {
        "by_house": {
            "terms": {
                "field": "house_number",
                // the size 0 here means return all results, not just the default 10 results
                "size": 0
            }
        }
    }
}
Work-around. Someone on the Git of the DSL told me to forget translating, and just use this method. It's simpler, and you can just write the tough stuff in curl. That's why I call it a work-around.
# Define a default Elasticsearch client
client = connections.create_connection(hosts=['http://blahblahblah:9200'])

# How simple: we just paste the curl body here
body = {
    "size": 0,
    "aggs": {
        "by_house": {
            "terms": {
                "field": "house_number",
                "size": 0
            }
        }
    }
}

s = Search.from_dict(body)
s = s.using(client)
s = s.index("airbnb")
s = s.doc_type("sleep_overs")
body = s.to_dict()

t = s.execute()

for item in t.aggregations.by_house.buckets:
    # item.key will be the house number
    print("%s %s" % (item.key, item.doc_count))
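The same raw-body approach can probably cover the original question about counting documents that have a value in emailaddress. The sketch below only builds the request body, so it runs without a cluster. It assumes a filter aggregation wrapping an exists clause fits your ES version; `count_with_field_body` and the aggregation name `has_value` are names I made up for illustration.

```python
import json


def count_with_field_body(field, agg_name="has_value"):
    """Build a search body whose filter aggregation counts the
    documents in which `field` has a value."""
    return {
        "size": 0,  # no hits, only the aggregation part of the result
        "aggs": {
            agg_name: {
                "filter": {"exists": {"field": field}}
            }
        },
    }


body = count_with_field_body("emailaddress")
print(json.dumps(body, indent=2, sort_keys=True))

# Assumed usage, following the work-around above (not run here):
#   s = Search.from_dict(body).index("attendance")
#   t = s.execute()
#   print(t.aggregations.has_value.doc_count)
```

A filter aggregation returns a single doc_count rather than a list of buckets, so there is nothing to loop over in the result.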
Hope this helps. I now design everything in curl, then use Python statements to peel away at the results to get what I want. This helps for aggregations with multiple levels (sub-aggregations).
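As a rough sketch of what a multi-level aggregation and the "peeling" can look like, here is a terms aggregation with a sum sub-aggregation, which also covers the "sum" part of the question. The field `nights`, the aggregation name `total`, and the sample response are all made up for illustration; only the body layout (an "aggs" block nested inside the terms bucket) follows the same pattern as the request above.

```python
def terms_with_sum_body(group_field, sum_field):
    """Body for one bucket per value of group_field, each carrying
    the sum of sum_field over the documents in that bucket."""
    return {
        "size": 0,  # no hits, only the aggregation part of the result
        "aggs": {
            "by_house": {
                # no "size" given here, so ES applies its default bucket limit
                "terms": {"field": group_field},
                "aggs": {
                    "total": {"sum": {"field": sum_field}}
                },
            }
        },
    }


def totals_per_bucket(aggregations):
    """Peel the per-bucket sums out of the aggregations part of a response."""
    return {b["key"]: b["total"]["value"]
            for b in aggregations["by_house"]["buckets"]}


body = terms_with_sum_body("house_number", "nights")  # 'nights' is hypothetical

# Hand-written sample response fragment, values made up for illustration.
sample_aggregations = {
    "by_house": {
        "buckets": [
            {"key": 12, "doc_count": 3, "total": {"value": 9.0}},
            {"key": 7, "doc_count": 1, "total": {"value": 2.0}},
        ]
    }
}

totals = totals_per_bucket(sample_aggregations)
for house_number in sorted(totals):
    print("%s: %s" % (house_number, totals[house_number]))
```

If you prefer the DSL over a raw body, I believe chaining `.metric('total', 'sum', field='nights')` onto the `s.aggs.bucket(...)` call builds the same structure, but check that against the version you have installed.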