What impact do large documents have on search? A question after reading the official docs

Elasticsearch | Author: hapjin | Posted December 19, 2019 | Views: 1836

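The general-recommendations page says that, because of the http.max_content_length setting (100MB by default), ES refuses to index any document larger than that limit.
The same page also contains this sentence: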

Large documents put more stress on network, memory usage and disk, even for search requests that do not request the _source since Elasticsearch needs to fetch the _id of the document in all cases, and the cost of getting this field is bigger for large documents due to how the filesystem cache works.
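
To make "search requests that do not request the _source" concrete, here is a minimal sketch of such a request body. The field name `title` and the query text are hypothetical; `"_source": false` is the standard search-body option for leaving `_source` out of the hits. Each hit in the response still carries its `_id`, which is the fetch the quoted sentence refers to.

```python
import json


def build_search_body(field: str, text: str, size: int = 10) -> dict:
    """Build a search body that asks Elasticsearch not to return _source.

    `field` and `text` are placeholders chosen for illustration only.
    """
    return {
        "_source": False,  # hits come back without the stored document body
        "size": size,
        "query": {"match": {field: text}},
    }


if __name__ == "__main__":
    # Hypothetical field and query text; send the JSON to <index>/_search.
    body = build_search_body("title", "large documents")
    print(json.dumps(body, indent=2))
```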

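Operations such as SEARCH, GET, and UPDATE all need to fetch the document's _id ("needs to fetch the _id of the document in all cases"). So even when _source is not requested, fetching _id is still expensive because of how the filesystem cache works.
What I don't understand is: what exactly about the filesystem cache makes this cost so high?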

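Is it because, even though only _id is fetched, the operating system also loads the document's other field contents into memory and caches them? If so, when documents are large, a lot of memory goes to holding that content, and the share of the filesystem cache that the application actually benefits from drops?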
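
The hypothesis above can be put into numbers with a small sketch. It only models one well-known fact: the filesystem cache works in whole pages, so a read of a few bytes still caches every page it touches. The 4096-byte page size, the 20-byte _id, and the 1 MiB record are assumptions for illustration. Whether Lucene really has to read the bytes around _id depends on how stored fields are laid out on disk, which this sketch does not model.

```python
PAGE_SIZE = 4096  # assumed page size in bytes (common on Linux x86-64)


def pages_touched(offset: int, length: int, page_size: int = PAGE_SIZE) -> int:
    """Return how many cache pages a read of `length` bytes at `offset` covers."""
    if length <= 0:
        return 0
    first_page = offset // page_size
    last_page = (offset + length - 1) // page_size
    return last_page - first_page + 1


def cached_bytes(offset: int, length: int, page_size: int = PAGE_SIZE) -> int:
    """Return the bytes the page cache holds after that read (whole pages only)."""
    return pages_touched(offset, length, page_size) * page_size


if __name__ == "__main__":
    id_len = 20  # hypothetical _id size in bytes

    # Best case: only the page holding the _id is read.
    print("id only:", cached_bytes(0, id_len), "bytes cached")

    # Hypothetical worst case: reaching the _id requires reading a whole
    # 1 MiB stored record that it sits next to.
    print("whole record:", cached_bytes(0, 1024 * 1024), "bytes cached")
```

If the second case is closer to what happens, each large document visited occupies cache pages that smaller, more frequently used data could otherwise have kept.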

References:

The filesystem cache will be used in order to buffer I/O operations. You should make sure to give at least half the memory of the machine running Elasticsearch to the filesystem cache.

Lucene index file formats

0 replies
