
How fielddata gets generated

Elasticsearch | Author: zhangg7723 | Posted 2018-07-27 | Views: 2425

A question for everyone: my index has a String field with the following properties:

[Screenshot of the field mapping: Snipaste_2018-07-27_15-02-49.png]

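Whenever I run a dedup or grouping query on this field (distinct or group by), fielddata cache usage shows up. Why would an aggregation on a not_analyzed field generate fielddata?
The ES version is 2.3.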
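
For readers who want to reproduce this, here is a minimal sketch of the request bodies such a query typically turns into. It rests on several assumptions: the field name `my_field` and the aggregation names are hypothetical, the mapping is a guess at what the screenshot shows (a 2.x-style not_analyzed string with doc values), and "distinct / group by" is assumed to translate into a `cardinality` or `terms` aggregation. The block only builds JSON; it does not talk to a cluster.

```python
import json

FIELD = "my_field"  # hypothetical field name, not taken from the original post


def string_mapping():
    """Assumed ES 2.x mapping for a not_analyzed string backed by doc values."""
    return {"type": "string", "index": "not_analyzed", "doc_values": True}


def group_by_body(field, size=10):
    """Search body for a 'group by': a terms aggregation, no hits returned."""
    return {
        "size": 0,
        "aggs": {"group_by_field": {"terms": {"field": field, "size": size}}},
    }


def distinct_count_body(field):
    """Search body for a 'distinct count': a cardinality aggregation."""
    return {
        "size": 0,
        "aggs": {"distinct_count": {"cardinality": {"field": field}}},
    }


# Node stats path that reports fielddata memory per field.
FIELDDATA_STATS_PATH = "/_nodes/stats/indices/fielddata?fields=*"

if __name__ == "__main__":
    print(json.dumps(string_mapping(), indent=2))
    print(json.dumps(group_by_body(FIELD), indent=2))
    print(json.dumps(distinct_count_body(FIELD), indent=2))
    print("GET " + FIELDDATA_STATS_PATH)
```

Sending the `terms` body to `_search` and then requesting the stats path is one way to watch the fielddata figure for the field change after the aggregation runs.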

rochy - rochy_he

Upvoted by: kennywu76

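When Doc Values are used on a not_analyzed String field, some field data still has to come from global ordinals (global_ordinals).

Global ordinals are a data structure that assigns a number (ordinal) to each term in the index for that field, so that calculations can work with compact numbers instead of keeping multiple copies of the field's String value in memory.

Global ordinals cannot be stored in Doc Values, because they have to be computed at query time by running over all the terms currently in the field and assigning each one a unique number.

So even when a not_analyzed String field uses Doc Values, you will still see a small amount of fielddata usage.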

The original text follows:
 
When using Doc Values on a not_analyzed String field you may still get some field data usage from global ordinals. This is a data structure that assigns a number (ordinal) to each term in the index for that field to save using excess memory by having multiple copies of the String value of the field when doing calculations. Global ordinals cannot be included in Doc Values as they need to be computed at query time by running over all the terms currently in the field assigning each a unique number. This would explain why you still see a small amount of field data usage even when you are using doc values for a not_analyzed String field.
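
To make the idea concrete, here is a toy sketch of global ordinals in plain Python. It is not Elasticsearch or Lucene code; the segment layout, the function names, and the sample city values are all invented for illustration. It assumes each segment keeps its own sorted term dictionary, and shows why a shard-wide numbering has to be built by walking every segment's terms before a terms aggregation can count with integers only.

```python
from typing import Dict, List, Tuple


def segment_dictionary(doc_values: List[str]) -> List[str]:
    """Sorted unique terms of one segment; list index == segment ordinal."""
    return sorted(set(doc_values))


def build_global_ordinals(
    seg_dicts: List[List[str]],
) -> Tuple[List[str], List[List[int]]]:
    """Walk all terms of all segments and give each distinct term one number.

    Returns (global_terms, seg_to_global) where
    seg_to_global[s][segment_ordinal] is the global ordinal.
    """
    global_terms = sorted({term for d in seg_dicts for term in d})
    lookup = {term: i for i, term in enumerate(global_terms)}
    seg_to_global = [[lookup[term] for term in d] for d in seg_dicts]
    return global_terms, seg_to_global


def terms_aggregation(
    doc_ords: List[List[int]],
    seg_to_global: List[List[int]],
    global_terms: List[str],
) -> Dict[str, int]:
    """Count documents per term using integers, resolving strings only at the end."""
    counts = [0] * len(global_terms)
    for seg_idx, ords in enumerate(doc_ords):
        for seg_ord in ords:
            counts[seg_to_global[seg_idx][seg_ord]] += 1
    return {global_terms[g]: c for g, c in enumerate(counts) if c > 0}


# Two hypothetical segments, one field value per document.
segments = [["beijing", "shanghai", "beijing"], ["shanghai", "guangzhou"]]
seg_dicts = [segment_dictionary(s) for s in segments]
# Per-document segment ordinals, standing in for what doc values hold on disk.
doc_ords = [[d.index(v) for v in s] for s, d in zip(segments, seg_dicts)]

global_terms, seg_to_global = build_global_ordinals(seg_dicts)
result = terms_aggregation(doc_ords, seg_to_global, global_terms)
```

In this toy model the per-segment ordinals can be written once with the segment, but the `seg_to_global` mapping depends on which segments currently exist, so it has to be rebuilt in memory. That in-memory mapping is the part the answer says is reported as fielddata.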


 
