Pandasticsearch: An Elasticsearch client exposing DataFrame API
Elasticsearch | 作者 onesuper | 发布于2016年11月08日 | | 阅读数:6179
https://github.com/onesuper/pandasticsearch
# Create a DataFrame object
from pandasticsearch import DataFrame
df = DataFrame.from_es('http://localhost:9200', index='people')
# Print the schema(mapping) of the index
df.print_schema()
# company
# |-- employee
# |-- name: {'index': 'not_analyzed', 'type': 'string'}
# |-- age: {'type': 'integer'}
# |-- gender: {'index': 'not_analyzed', 'type': 'string'}
# Inspect the columns
df.columns
#['name', 'age', 'gender']
# Get the column
df.name
# Column('name')
# Filter
df.filter(df.age < 13).collect()
# [Row(age=12,gender='female',name='Alice'), Row(age=11,gender='male',name='Bob')]
# Project
df.filter(df.age < 25).select('name', 'age').collect()
# [Row(age=12,name='Alice'), Row(age=11,name='Bob'), Row(age=13,name='Leo')]
# Print the rows into console
df.filter(df.age < 25).select('name').show(3)
# +------+
# | name |
# +------+
# | Alice|
# | Bob |
# | Leo |
# +------+
# Sort
df.sort(df.age.asc).select('name', 'age').collect()
#[Row(age=11,name='Bob'), Row(age=12,name='Alice'), Row(age=13,name='Leo')]
# Aggregate
df[df.gender == 'male'].agg(df.age.avg).collect()
# [Row(avg(age)=12)]
# Groupby
df.groupby('gender').collect()
# [Row(doc_count=1), Row(doc_count=2)]
# Groupby and then aggregate
df.groupby('gender').agg(df.age.max).collect()
# [Row(doc_count=1, max(age)=12), Row(doc_count=2, max(age)=13)]
# Convert to Pandas object for subsequent analysis
df[df.gender == 'male'].agg(df.age.avg).to_pandas()
# avg(age)
# 0 12
[尊重社区原创,转载请保留或注明出处]
本文地址:http://searchkit.cn/article/108
本文地址:http://searchkit.cn/article/108