Migrating from Elasticsearch 2.3 to 5.6 in practice
config/elasticsearch.yml
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/e ... .html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
cluster.name: es5_dev
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
node.name: es5-node03
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: ["127.0.0.1","10.204.12.33"]
http.port: 9201
transport.tcp.port: 9301
#http.host: 127.0.0.1
#http.enabled: false
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.unicast.hosts:
- 10.204.12.31:9301
- 10.204.12.32:9301
- 10.204.12.33:9301
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes: 3
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
indices.requests.cache.size: 5%
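Since bootstrap.memory_lock is enabled above, it is worth verifying after startup that the lock actually took effect (it relies on the memlock ulimit being raised for the ES user). A quick check, assuming the node is reachable on the HTTP port configured above:
curl -XGET "http://127.0.0.1:9201/_nodes?filter_path=**.mlockall&pretty"
Each node should report "mlockall": true.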
config/jvm.options
## JVM configuration
################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/e ... .html
## for more information
##
################################################################
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
-Xms2g
-Xmx2g
################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################
## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
## optimizations
# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch
## basic
# force the server VM (remove on 32-bit client JVMs)
-server
# explicitly set the stack size (reduce to 320k on 32-bit client JVMs)
-Xss1m
# set to headless, just in case
-Djava.awt.headless=true
# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8
# use our provided JNA always versus the system one
-Djna.nosys=true
# use old-style file permissions on JDK9
-Djdk.io.permissionsUseCanonicalPath=true
# flags to configure Netty
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true
## heap dumps
# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError
# specify an alternative path for heap dumps
# ensure the directory exists and has sufficient space
#-XX:HeapDumpPath=${heap.dump.path}
## GC logging
#-XX:+PrintGCDetails
#-XX:+PrintGCTimeStamps
#-XX:+PrintGCDateStamps
#-XX:+PrintClassHistogram
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime
# log GC status to a file with time stamps
# ensure the directory exists
#-Xloggc:${loggc}
# By default, the GC log file will not rotate.
# By uncommenting the lines below, the GC log file
# will be rotated every 128MB at most 32 times.
#-XX:+UseGCLogFileRotation
#-XX:NumberOfGCLogFiles=32
#-XX:GCLogFileSize=128M
# Elasticsearch 5.0.0 will throw an exception on unquoted field names in JSON.
# If documents were already indexed with unquoted fields in a previous version
# of Elasticsearch, some operations may throw errors.
#
# WARNING: This option will be removed in Elasticsearch 6.0.0 and is provided
# only for migration purposes.
#-Delasticsearch.json.allow_unquoted_field_names=true
Install the ik analyzer plugin
bin/elasticsearch-plugin install https://github.com/medcl/elast ... 1.zip
./bin/elasticsearch-plugin install https://github.com/medcl/elast ... 3.zip
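To sanity-check the install (assuming the commands above succeeded), list the installed plugins; note that a node restart is required before a new plugin is loaded:
bin/elasticsearch-plugin list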
Configure the ik remote extension dictionary for hot-word updates in elasticsearch-5.6.3/config/analysis-ik/IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
<comment>IK Analyzer extension configuration</comment>
<!-- Configure your own extension dictionaries here -->
<entry key="ext_dict"></entry>
<!-- Configure your own extension stopword dictionaries here -->
<entry key="ext_stopwords"></entry>
<!-- Configure a remote extension dictionary here -->
<entry key="remote_ext_dict">http://distribute.search.leju. ...</entry>
<!-- Configure a remote extension stopword dictionary here -->
<!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
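A note on the remote dictionary: per the ik plugin's README, the URL must serve a plain-text file with one word per line in UTF-8, and the response must carry Last-Modified or ETag headers; the plugin polls the URL about once a minute and reloads only when one of those headers changes. A quick way to inspect your endpoint's headers (the URL here is a hypothetical placeholder):
curl -I "http://your-dict-server/hot_words.txt"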
Install the pinyin analyzer plugin
cd elasticsearch-5.5.1/plugins
wget https://github.com/medcl/elast ... 5.5.1
unzip v5.5.1
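After restarting the node, a quick way to confirm the analyzers work is the _analyze API (a sketch, assuming a local node on the HTTP port 9201 used earlier):
curl -XPOST "http://127.0.0.1:9201/_analyze" -H 'Content-Type: application/json' -d'
{
  "analyzer": "ik_max_word",
  "text": "中华人民共和国国歌"
}'
Swap the analyzer for pinyin to exercise the pinyin plugin the same way.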
When packaging the deployment for the other nodes, clean out the data directory first.
For cluster monitoring, the head plugin's Chrome extension works well.
Data migration
The migration tool is elasticbak, which I wrote myself and have updated with a 5.6.3 client. GitHub: https://github.com/jiashiwen/elasticbak.
Data backup
java -jar elasticbak-2.3.3.jar \
--exp \
--cluster lejuesdev \
--host 10.204.12.31 \
--filesize 1000 \
--backupdir ./esbackupset \
--backupindexes "*" \
--threads 4
Because of field-type changes between versions, the indices have to be rebuilt by hand. Here is an example; the main change is that the 2.x string type becomes text in 5.x. In 2.x, the index parameter controlled both whether a field was indexed at all ("index": "no") and whether it went through an analyzer ("index": "not_analyzed"). In 5.x, index only specifies whether the field is indexed; to index a whole field without analysis, use the keyword type instead.
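As a minimal before-and-after sketch of this change (field names here are hypothetical):
# 2.x mapping
"title": { "type": "string", "index": "analyzed" }
"tag":   { "type": "string", "index": "not_analyzed" }
"raw":   { "type": "string", "index": "no" }
# 5.x equivalent
"title": { "type": "text" }
"tag":   { "type": "keyword" }
"raw":   { "type": "text", "index": false }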
curl -XPUT "http://10.204.12.31:9201/house_geo" -H 'Content-Type: application/json' -d'
{
"mappings": {
"house": {
"dynamic": "strict",
"_all": {
"enabled": false
},
"properties": {
"_category": {
"type": "keyword",
"store": true
},
"_content": {
"type": "text",
"store": true,
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
},
"_deleted": {
"type": "boolean",
"store": true
},
"_doccreatetime": {
"type": "date",
"store": true,
"format": "strict_date_optional_time||epoch_millis||yyyy/MM/dd HH:mm:ss||yyyy/MM/dd"
},
"_docupdatetime": {
"type": "date",
"store": true,
"format": "strict_date_optional_time||epoch_millis||yyyy/MM/dd HH:mm:ss||yyyy/MM/dd"
},
"_flags": {
"type": "text",
"store": true,
"analyzer": "whitespace"
},
"_hits": {
"type": "text"
},
"_location": {
"type": "geo_point"
},
"_multi": {
"properties": {
"_location": {
"type": "geo_point"
}
}
},
"_origin": {
"type": "object",
"enabled": false
},
"_scope": {
"type": "keyword",
"store": true
},
"_tags": {
"type": "text",
"boost": 10,
"store": true,
"term_vector": "with_positions_offsets",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
},
"_title": {
"type": "text",
"store": true,
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
},
"_uniqid": {
"type": "keyword",
"store": true
},
"_uniqsign": {
"type": "keyword",
"store": true
},
"_url": {
"type": "text",
"index": false,
"store": true
},
"location": {
"type": "geo_point"
}
}
}
},
"settings": {
"index": {
"number_of_shards": "3",
"requests": {
"cache": {
"enable": "true"
}
},
"analysis": {
"filter": {
"my_synonym": {
"type": "synonym",
"synonyms_path": "analysis-ik/custom/synonym.dic"
}
},
"analyzer": {
"searchanalyzer": {
"filter": "my_synonym",
"type": "custom",
"tokenizer": "ik_smart"
},
"indexanalyzer": {
"filter": "my_synonym",
"type": "custom",
"tokenizer": "ik_max_word"
}
}
},
"number_of_replicas": "1"
}
}
}'
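Before importing any data, it is worth confirming that the mapping and settings took effect, for example:
curl -XGET "http://10.204.12.31:9201/house_geo/_mapping?pretty"
curl -XGET "http://10.204.12.31:9201/house_geo/_settings?pretty"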
Import the index data with the new version of elasticbak:
java -jar elasticbak-5.6.3.jar \
--imp \
--cluster es5_dev \
--host 10.204.12.31 \
--port 9301 \
--restoreindex house_geo \
--restoretype dataonly \
--backupset esbackupset/house_geo \
--threads 4
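Once the import finishes, a simple sanity check is to compare document counts with the source cluster:
curl -XGET "http://10.204.12.31:9201/_cat/indices/house_geo?v"
curl -XGET "http://10.204.12.31:9201/house_geo/_count?pretty"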
Community Daily Issue 99 (2017-11-13)
http://t.cn/Rj2uLh9
2. A VS Code extension for Logstash configuration files; editing configs is no longer a headache.
http://t.cn/Rj21ncE
3. Sentinl, an alerting plugin for the ELK stack; with recent releases it rivals X-Pack's Reporter and Watcher.
http://t.cn/Rj216Ef
4. Waiting for you: Elastic Meetup Guangzhou
https://elasticsearch.cn/article/364
Editor: cyberdak
Archive: https://elasticsearch.cn/article/372
Subscribe: https://tinyletter.com/elastic-daily
Community Daily Issue 98 (2017-11-12)
http://t.cn/RjPvlq1
2. Pushing Elasticsearch write throughput to the limit
http://t.cn/RWs8yvS
3. The midnight battle: eight Alibaba technical experts preview the key technologies behind the 2017 Double 11.
http://t.cn/RjPPzGc
4. Waiting for you: Elastic Meetup Guangzhou
https://elasticsearch.cn/article/364
Editor: 至尊宝
Archive: https://elasticsearch.cn/article/371
Subscribe: https://tinyletter.com/elastic-daily
Three steps to get started with esrally for Elasticsearch load testing
It's been almost two months since my last esrally tutorial, and in that time readers have kept asking about problems they ran into, above all that the test data is hosted on AWS abroad, which makes downloads painfully slow. To help everyone get started with esrally quickly, I built a ready-to-use Docker image and pulled the 13GB of test data onto domestic storage, shared through Baidu Netdisk. Follow the few simple steps below and you can run esrally tests without friction.
Steps
No more talk; here's the meal!
- Pull the image
docker pull rockybean/esrally
- Download the data files. Link: http://pan.baidu.com/s/1eSrjZgA  Password: aagl
- Enter the downloaded rally_track folder and run the following command to start a test
docker run -it -v $(PWD):/root/track rockybean/esrally esrally race --track-path=/root/track/logging --offline --pipeline=benchmark-only --target-hosts=192.168.1.105:9200
Done!
A few notes
About the data files
The test data bundled with esrally is what's in the rally_track folder, mainly:
- Geonames(geonames): for evaluating the performance of structured data.
- Geopoint(geopoint): for evaluating the performance of geo queries.
- Percolator(percolator): for evaluating the performance of percolation queries.
- PMC(pmc): for evaluating the performance of full text search.
- NYC taxis(nyc_taxis): for evaluating the performance for highly structured data.
- Nested(nested): for evaluating the performance for nested documents.
- Logging(logging): for evaluating the performance of (Web) server logs.
- noaa(noaa): for evaluating the performance of range fields.
Download only the datasets you need rather than all of them; just make sure each folder you do use is downloaded completely.
Command breakdown
Docker part
docker run -it rockybean/esrally esrally
is the esrally invocation; -v $(PWD):/root/track
maps the rally_track folder into the container, and $(PWD)
expands to the current directory, so cd into the rally_track directory first (a full absolute path works just as well).
The esrally Docker image itself is fairly simple; see the GitHub project for details.
esrally part
The image loads data as a custom track, which is why the command line uses the --track-path=/root/track/logging
parameter. Note that /root/track
is the directory we bound into the container above; replace logging
with another dataset name to load other test data, as in the example below.
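For instance, to run the geonames track instead (a sketch, assuming its folder was downloaded completely into rally_track):
docker run -it -v $(PWD):/root/track rockybean/esrally esrally race --track-path=/root/track/geonames --offline --pipeline=benchmark-only --target-hosts=192.168.1.105:9200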
The container only supports benchmarking a third-party ES cluster, i.e. --pipeline=benchmark-only
mode. That should also be the most common load-testing need.
Go have fun!
Sense is gone; switch to Kibana
I. elasticsearch 5.5.2 + kibana 5.5.2
1. Download the Kibana package whose version matches your Elasticsearch. My current development environment runs 5.5.2, so the matching Kibana is also 5.5.2 (the newer 5.6 release reports an incompatibility error and won't run).
2. Configure config/kibana.yml; the main settings are as follows
# The URL of the Elasticsearch instance to use for all your queries.
#elasticsearch.url: "http://localhost:9200"
elasticsearch.url: "https://192.168.1.1:9281/"
# If your Elasticsearch is protected with basic authentication, these settings provide
# the username and password that the Kibana server uses to perform maintenance on the Kibana
# index at startup. Your Kibana users still need to authenticate with Elasticsearch, which
# is proxied through the Kibana server.
#elasticsearch.username: "user"
#elasticsearch.password: "pass"
elasticsearch.username: "admin"
elasticsearch.password: "admin"
# Optional settings that provide the paths to the PEM-format SSL certificate and key files.
# These files validate that your Elasticsearch backend uses the same key files.
#elasticsearch.ssl.certificate: /path/to/your/client.crt
#elasticsearch.ssl.key: /path/to/your/client.key
elasticsearch.ssl.certificate: /home/develop/kibana-5.6.3-linux-x86_64/config/crts/eshttp.crt
elasticsearch.ssl.key: /home/develop/kibana-5.6.3-linux-x86_64/config/crts/eshttp.key
# To disregard the validity of SSL certificates, change this setting's value to 'none'.
#elasticsearch.ssl.verificationMode: full
elasticsearch.ssl.verificationMode: none
Each setting is explained clearly by the comments in the file itself, so I won't repeat them here. The most important items are elasticsearch.ssl.certificate and elasticsearch.ssl.key, which must be kept identical to the files used on the server side. Since the certificate is self-generated, the verification setting elasticsearch.ssl.verificationMode must be changed to none.
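For reference, a self-signed key/certificate pair like the one referenced above could be generated with openssl along these lines (a sketch; the file names and CN are placeholders, and the same pair must be configured on the Elasticsearch side):
openssl req -x509 -nodes -newkey rsa:2048 -days 365 \
  -keyout eshttp.key -out eshttp.crt \
  -subj "/CN=192.168.1.1"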
After starting Kibana, access it at http://localhost:5601.
Community Daily Issue 97 (2017-11-11)
- Why can't Sense be used anymore? See what the ES team says: http://t.cn/RlB3B62
- Quickly pinpointing shard allocation problems with the allocation API: http://t.cn/RlrzTsD
- ES 6.0's improvements for preventing disks from filling up: http://t.cn/RlrU3Nr
- Great news: the ES community site now supports a Markdown editor: https://elasticsearch.cn/article/366
- Elastic acquires Swiftype, the leader in site-search SaaS: http://t.cn/Rl3a4P2
- Waiting for you: Elastic Meetup Guangzhou: https://elasticsearch.cn/article/364
Elastic is hiring a Support Engineer in Beijing
Support Engineer - Mandarin Speaking
Location: Beijing, China
Department: Support
Responsibilities
- Ensuring customer issues are resolved within our committed service level agreements.
- Maintain strong relationships with our customers for the delivery of support.
- Have a mindset of continuous improvement, in terms of efficiency of support processes and customer satisfaction.
Experience
- Demonstrable experience in support roles at technology businesses
- Experience working across multi-cultural and geographically distributed teams
Key Skills
- Strong verbal and written communication skills in both Mandarin and English.
- Customer-oriented focus.
- Team player, able to work in a fast-paced environment with a positive and adaptable approach.
- Knowledge of databases or search technologies a plus.
- Demonstrated strong technical understanding of software products.
Additional Information
- Competitive pay and benefits
- Stock options
- Catered lunches, snacks, and beverages in most offices
- An environment in which you can balance great work with a great life
- Passionate people building great products
- Employees with a wide variety of interests
- Distributed-first company with employees in over 30 countries, spread across 18 time zones, and speaking over 30 languages!
Elastic is an Equal Employment employer committed to the principles of equal employment opportunity and affirmative action for all applicants and employees. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status or any other basis protected by federal, state or local law, ordinance or regulation. Elastic also makes reasonable accommodations for disabled employees consistent with applicable law.
About Elastic
Elastic is the world's leading software provider for making structured and unstructured data usable in real time for search, logging, security, and analytics use cases. Founded in 2012 by the people behind the Elasticsearch, Kibana, Beats, and Logstash open source projects, Elastic's global community has more than 80,000 members across 45 countries, and since its initial release, Elastic's products have achieved more than 100 million cumulative downloads. Today thousands of organizations, including Cisco, eBay, Dell, Goldman Sachs, Groupon, HP, Microsoft, Netflix, The New York Times, Uber, Verizon, Yelp, and Wikipedia, use the Elastic Stack, X-Pack, and Elastic Cloud to power mission-critical systems that drive new revenue opportunities and massive cost savings. Elastic is backed by more than $104 million in funding from Benchmark Capital, Index Ventures, and NEA; has headquarters in Amsterdam, the Netherlands, and Mountain View, California; and has over 500 employees in more than 30 countries around the world.
Our Philosophy
We’re always on the search for amazing people, people who have deep passion for technology and are masters at their craft. We build highly sophisticated distributed systems and we don’t take our technology lightly. In Elasticsearch, you’ll have the opportunity to work in a vibrant young company next to some of the smartest and highly skilled technologists the industry has to offer. We’re looking for great team players, yet we also promote independence and ownership. We’re hackers… but of the good kind. The kind that innovates and creates cutting edge products that eventually translates to a lot of happy, smiling faces.
LifeAtElastic
The community site now supports a Markdown editor
To improve everyone's writing experience and raise enthusiasm for writing and sharing, after two days of relentless work the Markdown editor is finally live. For now it only supports publishing articles; switch the editor to select Markdown mode. And please, no more link-only articles with "the editor" as the excuse.
- Supports GitHub-flavored Markdown
- Supports this site's attachment feature
- Supports emoji
- Supports automatic page navigation
- Earlier articles can be edited again: switch to Markdown mode, then modify and save
How to use it?
- Click "发起" (Post) and choose "文章" (Article)
- Toggle the green button to switch the editor to Markdown, then enter Markdown-formatted content in the text box.
Online Markdown editing and preview tool: https://elasticsearch.cn/static/js/editor/markdown/
Below is a style test for reference; ignore its meaning.
----------- Common formats -----------------
# Heading 1
## Heading 2
### Heading 3
#### Heading 4
##### Heading 5
###### Heading 6
Big heading //equals signs go on the line below the text
===
Heading //same as the big heading
---
`inline code`
_ Note: for long code blocks, use three backticks: ` _
> This is the first level of quoting.
>
> > This is nested blockquote.
>
> Back to the first level.
* Red
* Green
* Blue
- Red
- Green
- Blue
+ Red
+ Green
+ Blue
1. This is the first
1. This is the second
1. This is the third
* * *
***
*****
- - -
---
[markdown-syntax](http://daringfireball.net/projects/markdown/syntax)
[id]: http://example.com/ "Optional Title Here"
This is [an example][id] reference-style link.
*content*
**content**
_content_
__content__
![An external image](https://static-www.elastic.co/assets/bltbfcd44f1256d8c88/blog-swifttype-thumb.jpg?q=845)
<http://elastic.co/>
<info@elastic.o>
Four spaces
One tab
----------- Rendered preview -----------------
Heading 1
Heading 2
Heading 3
Heading 4
Heading 5
Heading 6
Big heading //equals signs go on the line below the text
Heading //same as the big heading
inline code
This is the first level of quoting.
This is nested blockquote.
Back to the first level.
- Red
- Green
-
Blue
- Red
- Green
-
Blue
- Red
- Green
- Blue
- This is the first
- This is the second
- This is the third
This is an example reference-style link.
content content content content
Four spaces
One tab
README content from https://github.com/infinitbyte/gopa
GOPA, A Spider Written in Go.
Goal
- Light weight, low footprint, memory requirement should < 100MB
- Easy to deploy, no runtime or dependency required
- Easy to use, no programming or scripts ability needed, out of box features
Screenshoot
How to use
Setup
First of all, get it. Two options: download the pre-built package or compile it yourself.
Download Pre Built Package
Go to Release or Snapshot page, download the right package for your platform.
Note: Darwin is for Mac
Compile The Package Manually
- Mac/Linux: run make build to build Gopa.
- Windows: check out the wiki page "How to build GOPA on windows".
So far, we have:
- gopa, the main program, a single binary.
- config/, elasticsearch related scripts etc.
- gopa.yml, the main configuration for gopa.
Optional Config
By default, Gopa works well except for indexing; if you want to use elasticsearch for indexing, follow these steps:
- Create an index in elasticsearch with the script config/gopa-index-mapping.sh
Example
curl -XPUT "http://localhost:9200/gopa-index" -H 'Content-Type: application/json' -d' { "mappings": { "doc": { "properties": { "host": { "type": "keyword", "ignore_above": 256 }, "snapshot": { "properties": { "bold": { "type": "text" }, "url": { "type": "keyword", "ignore_above": 256 }, "content_type": { "type": "keyword", "ignore_above": 256 }, "file": { "type": "keyword", "ignore_above": 256 }, "h1": { "type": "text" }, "h2": { "type": "text" }, "h3": { "type": "text" }, "h4": { "type": "text" }, "hash": { "type": "keyword", "ignore_above": 256 }, "id": { "type": "keyword", "ignore_above": 256 }, "images": { "properties": { "external": { "properties": { "label": { "type": "text" }, "url": { "type": "keyword", "ignore_above": 256 } } }, "internal": { "properties": { "label": { "type": "text" }, "url": { "type": "keyword", "ignore_above": 256 } } } } }, "italic": { "type": "text" }, "links": { "properties": { "external": { "properties": { "label": { "type": "text" }, "url": { "type": "keyword", "ignore_above": 256 } } }, "internal": { "properties": { "label": { "type": "text" }, "url": { "type": "keyword", "ignore_above": 256 } } } } }, "path": { "type": "keyword", "ignore_above": 256 }, "sim_hash": { "type": "keyword", "ignore_above": 256 }, "lang": { "type": "keyword", "ignore_above": 256 }, "size": { "type": "long" }, "text": { "type": "text" }, "title": { "type": "text", "fields": { "keyword": { "type": "keyword" } } }, "version": { "type": "long" } } }, "task": { "properties": { "breadth": { "type": "long" }, "created": { "type": "date" }, "depth": { "type": "long" }, "id": { "type": "keyword", "ignore_above": 256 }, "original_url": { "type": "keyword", "ignore_above": 256 }, "reference_url": { "type": "keyword", "ignore_above": 256 }, "schema": { "type": "keyword", "ignore_above": 256 }, "status": { "type": "integer" }, "updated": { "type": "date" }, "url": { "type": "keyword", "ignore_above": 256 } } } } } } }'
Note: Elasticsearch version should > v5.0
- Enable the index module in gopa.yml and update the elasticsearch settings:
- module: index
  enabled: true
  ui:
    enabled: true
  elasticsearch:
    endpoint: http://dev:9200
    index_prefix: gopa-
    username: elastic
    password: changeme
Start
Gopa doesn't require any dependencies; simply run ./gopa to start the program.
Gopa can be run as a daemon (note: only available on Linux and Mac):
Example
➜ gopa git:(master) ✗ ./bin/gopa --daemon ________ ________ __________ _____ / _____/ \_____ \\______ \/ _ \ / \ ___ / | \| ___/ /_\ \ \ \_\ \/ | \ | / | \ \______ /\_______ /____| \____|__ / \/ \/ \/ [gopa] 0.10.0_SNAPSHOT ///last commit: 99616a2, Fri Oct 20 14:04:54 2017 +0200, medcl, update version to 0.10.0 /// [10-21 16:01:09] [INF] [instance.go:23] workspace: data/gopa/nodes/0 [gopa] started.
Also run ./gopa -h to get the full list of command line options.
Example
➜ gopa git:(master) ✗ ./bin/gopa -h ________ ________ __________ _____ / _____/ \_____ \\______ \/ _ \ / \ ___ / | \| ___/ /_\ \ \ \_\ \/ | \ | / | \ \______ /\_______ /____| \____|__ / \/ \/ \/ [gopa] 0.10.0_SNAPSHOT ///last commit: 99616a2, Fri Oct 20 14:04:54 2017 +0200, medcl, update version to 0.10.0 /// Usage of ./bin/gopa: -config string the location of config file (default "gopa.yml") -cpuprofile string write cpu profile to this file -daemon run in background as daemon -debug run in debug mode, wi -log string the log level,options:trace,debug,info,warn,error (default "info") -log_path string the log path (default "log") -memprofile string write memory profile to this file -pidfile string pidfile path (only for daemon) -pprof string enable and setup pprof/expvar service, eg: localhost:6060 , the endpoint will be: http://localhost:6060/debug/pprof/ and http://localhost:6060/debug/vars
Stop
It's safe to press ctrl+c to stop the currently running Gopa; Gopa will handle the rest, saving a checkpoint so you can restore the job later. The world is still in your hand.
If you are running Gopa as a daemon, you may stop it like this:
kill -QUIT `pgrep gopa`
Configuration
UI
- Search Console
http://127.0.0.1:9001/
- Admin Console
http://127.0.0.1:9001/admin/
API
- TBD
Contributing
You are sincerely and warmly welcome to play with this project, from UI style to core features, or even just a piece of documentation. Let's make it better.
License
Released under the Apache License, Version 2.0 .
Also XSS Test
alert('XSS test');
Community Daily Issue 96 (2017-11-10)
http://t.cn/RlHuOKx
2. A gift to the community: the "Elasticsearch 5.6.3 Java API Chinese Handbook"
https://elasticsearch.cn/article/362
3. Slides: a containerized Elasticsearch private cloud on Mesos/Docker
http://t.cn/RlHuTQR
4. Waiting for you: Elastic Meetup Guangzhou
https://elasticsearch.cn/article/364
Editor: laoyang360
Archive: https://elasticsearch.cn/article/365
Subscribe: https://tinyletter.com/elastic-daily
Elastic Meetup Guangzhou
The Elastic Meetup offline gathering comes to Guangzhou again, the community's second offline event in the city. Guangzhou friends, sign up now! For a recap of last year's event, see: https://elasticsearch.cn/article/71
Hosts:
This event is jointly organized by Elastic and the NetEase Games Operations and Infrastructure Department.
Media:
Live streaming is provided exclusively by IT大咖说.
Time:
2017-11-25, 2:00-5:00 PM (sign-in from 1:30)
Venue:
Boxue Hall, 1st floor, NetEase Building, Block E, Guangzhou Information Port, 16 Keyun Road, Tianhe District, Guangzhou
Talks:
- NetEase - Du Xin - ELK in Cangbaoge (藏宝阁)
- KuGou - Zhong Wang - A music search engine built on ES
- Alibaba Cloud - Zhao Hongyang - Elasticsearch practice at Alibaba Cloud
- NetEase - Lin Bangjun - An overview of NetEase's ELK system
- Datastory - Wu Wenjie - Data Warehouse with ElasticSearch in Datastory
- Lightning talks (5-10 minutes, sign up on site)
Registration:
http://elasticsearch.mikecrm.com/O6o0yq3
Live stream:
http://www.itdks.com/eventlist/detail/1673
Talk details:
ELK in Cangbaoge
Abstract:
1. An introduction to the Cangbaoge project, giving listeners unfamiliar with it the basic background.
2. ELK in Cangbaoge (overview): a brief account of where ELK is used in Cangbaoge and what it contributes.
3. ELK in Cangbaoge's recommendation system (the focus): a closer look at the role ELK plays in the recommendation system and the advantages it brings.
Speaker:
Du Xin, senior development engineer at NetEase's Cangbaoge studio, currently working on recommendation features for Cangbaoge.
An overview of NetEase's ELK system
Abstract:
NetEase's ELK platform described from both the architecture and the feature angle: the internal components and how they are managed, then the platform's automation services from the user's perspective, and configuration management plus resource scheduling and reclamation from the administrator's perspective.
Speaker:
Lin Bangjun, senior operations engineer in NetEase's GDC product group, mainly responsible for operating and developing the internal ELK product.
A music search engine built on ES
Abstract:
1. How KuGou Music's search engine architecture has evolved
2. Lessons learned building a music search engine
Speaker:
Zhong Wang, backend engineer at KuGou, working on Java and ES development.
Data Warehouse with ElasticSearch in Datastory
Abstract:
ES is most often used for search and log analysis, yet its strong real-time indexing, full-text search, and aggregation capabilities also make it a solid foundation for data warehouse and OLAP workloads.
This talk shows how Datastory builds data warehouse capabilities on ES and the Hadoop ecosystem across different data scenarios.
Speaker:
Wu Wenjie, senior engineer on Datastory's platform architecture team, responsible for storing and querying Datastory's tens of billions of records and for the internal platform.
Elasticsearch practice at Alibaba Cloud
Abstract:
The technical architecture of Alibaba Cloud's Elasticsearch service and its X-Pack features, plus a practical case of building ELK in the cloud.
Speaker:
Zhao Hongyang, search product expert at Alibaba, responsible for planning and developing Alibaba Cloud's search products.
A Shenzhen meetup is also being planned; you can register early: https://elasticsearch.cn/article/261
About Elastic Meetup
Elastic Meetup is a series of offline gatherings held regularly by the Elastic Chinese community around Elastic's open-source products (Elasticsearch, Logstash, Kibana, and Beats) and surrounding technologies, covering practice and applications in search, real-time data analytics, log analysis, security, and more.
About Elastic
Elastic builds software that lets users put data to work in real time and at scale for search, logging, and analytics. Founded in 2012, Elastic develops the open-source Elastic Stack (Elasticsearch, Kibana, Beats, and Logstash), X-Pack (commercial features), and Elastic Cloud (hosted service), with more than 150 million cumulative downloads to date. Benchmark Capital, Index Ventures, and NEA have provided Elastic with more than $100 million in funding; the company has 600-plus employees across 30 countries. For more information, visit http://elastic.co/cn .
About the NetEase Games Operations and Infrastructure Department
The NetEase Games Operations and Infrastructure Department is responsible for the reliability of NetEase's game products and for building and deploying infrastructure, aiming to:
- Provide reliability assurance across the whole product life cycle, using big data to inform operational decisions
- Improve problem detection and resolution through intelligent monitoring, and drive low-cost business management through automation
- Build hybrid-cloud solutions, optimizing TCO and operations intelligence from the game business's point of view
About IT大咖说
IT大咖说 is a knowledge-sharing platform for leading speakers in IT verticals. Practicing "open source is an attitude", it shares expert content through online and offline channels and live-streams and archives technical conferences. http://www.itdks.com .
Thanks again to the NetEase Games Operations and Infrastructure Department and IT大咖说 for their generous support!
Community Daily Issue 95 (2017-11-09)
http://t.cn/RlY7tMh
2. Calling Elasticsearch via the Java API in Spring Boot
http://t.cn/RljQNFJ
3. A Kibana plugin for viewing and searching tailing log events in real time
http://t.cn/RcXglR2
Hiring: JD is hiring senior ES engineers in Beijing
https://elasticsearch.cn/article/358
Editor: 金桥
Archive: https://elasticsearch.cn/article/363
Subscribe: https://tinyletter.com/elastic-daily
Elasticsearch 5.6 Java API Chinese Handbook
[Elasticsearch 5.6 Java API 中文手册]
This handbook, translated by 全科, has been compiled into an e-book in PDF, ePub, and Mobi formats for convenient download and reading.
It is more than a translation of the official documentation: it includes usage examples and the pitfalls we hit in practice.
Read online: https://es.quanke.name
Download: https://www.gitbook.com/book/q ... -java
GitHub: https://github.com/quanke/elasticsearch-java
Editor: http://quanke.name
Editing and compiling this took real effort, so please give it a star to soothe my vanity.
[全科's public account]
An Elasticsearch memory leak triggered by bulk exceptions
Update 2018-08-24: the 6.4 release, out today, fixes this problem.
Original post: http://www.jianshu.com/p/d4f7a6d58008
The day before yesterday, a production Elasticsearch cluster belonging to our vacation business unit fired an alert: a data node's heap usage was persistently above the 80% watermark. Not daring to take that lightly, I logged into the monitoring system as soon as the alert mail arrived. Fortunately all nodes were still serving traffic, but two nodes showed very high heap usage, with old GC triggering continuously yet unable to reclaim memory.
Initial triage
The problem nodes have a 30GB heap, so 80% usage is roughly 24GB. Yet the cluster's total data volume is small: the index files across all 5 nodes take less than 10GB of disk.
GET /_cat/allocation?v&h=shards,disk.indices,disk.used,disk.avail
shards disk.indices disk.used disk.avail
3 1.9gb 38.3gb 89.7gb
4 2.2gb 13.4gb 114.6gb
4 2.5gb 20.3gb 107.7gb
4 2.3gb 33.9gb 94.1gb
3 1.7gb 12.8gb 115.2gb
Per-node segment memory and cache usage is also very small, at the MB level.
GET /_cat/nodes?v&h=id,port,v,m,fdp,mc,mcs,sc,sm,qcm,fm,im,siwm,svmm
id port v m fdp mc mcs sc sm qcm fm siwm svmm
e1LV 9300 5.3.2 - 1 0 0b 68 69mb 1.5mb 1.9mb 0b 499b
5VnU 9300 5.3.2 - 1 0 0b 75 79mb 1.5mb 1.9mb 0b 622b
_Iob 9300 5.3.2 - 1 0 0b 56 55.7mb 1.3mb 914.1kb 0b 499b
4Kyl 9300 5.3.2 * 1 1 330.1mb 81 84.4mb 1.2mb 1.9mb 0b 622b
XEP_ 9300 5.3.2 - 1 0 0b 45 50.4mb 748.5kb 1mb 0b 622b
The cluster's QPS is only around 30, CPU usage is below 10%, and the active thread counts of the various thread pools are all very low.
So what on earth is holding 20-odd GB of memory and refusing to release it?
The affected cluster runs ES 5.3.2
, a version whose stability has long been proven internally and which is deployed at scale in production as our stable release. Other clusters under far heavier read/write load have never shown anything like this, so it looked like a brand-new problem.
Apart from some bulk exceptions, the ES logs on the problem node showed nothing obviously related to resources:
[2017-11-06T16:33:15,668][DEBUG][o.e.a.b.TransportShardBulkAction] [] [suggest-3][0] failed to execute bulk item (update) BulkShardRequest [[suggest-3][0]] containing [44204
] requests
org.elasticsearch.index.engine.DocumentMissingException: [type][纳格尔果德_1198]: document missing
at org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:92) ~[elasticsearch-5.3.2.jar:5.3.2]
at org.elasticsearch.action.update.UpdateHelper.prepare(UpdateHelper.java:81) ~[elasticsearch-5.3.2.jar:5.3.2]
We confirmed the cause of these exceptions with the user: the ingestion program takes data from its source and issues updates against ES keyed by doc_id
. Some doc_id
s don't exist in ES, but this doesn't affect the business logic, so the document missing
exceptions ES records can safely be ignored.
With no other leads, the only option left was to dump and analyze the JVM heap.
Heap dump analysis
用的工具是Eclipse MAT,从这里下载的Mac版:Downloads 。 使用这个工具需要经过以下2个步骤:
- 获取二进制的head dump文件
jmap -dump:format=b,file=/tmp/es_heap.bin <pid>
Here pid is the process ID of the ES Java process (see the note after this list for one way to find it).
- Download the generated dump file to your local development machine, start MAT, and open the file from its GUI.
Note that MAT is itself a Java application and needs a JDK runtime.
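If the ES process ID is not already at hand, the standard JDK tools can find it. A quick example (the pid 12345 below is only a sample value, assuming a single ES instance on the host):

$ jps -l | grep -i elasticsearch
12345 org.elasticsearch.bootstrap.Elasticsearch
$ jmap -dump:format=b,file=/tmp/es_heap.bin 12345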
The first time MAT opens a dump file it must parse it and build a set of indices. That pass is CPU- and memory-hungry, but once it is done, reopening the dump is fast and cheap. For a 20-plus-GB file like this one, the first parse is very slow and is likely to run out of memory on a development machine with limited RAM, so I found a server with plenty of memory to do the initial parse:
- Copy the Linux build of MAT to the server, unpack it, and edit its configuration file MemoryAnalyzer.ini to give it about 20GB of heap:
$ cat MemoryAnalyzer.ini
-startup
plugins/org.eclipse.equinox.launcher_1.3.100.v20150511-1540.jar
--launcher.library
plugins/org.eclipse.equinox.launcher.gtk.linux.x86_64_1.1.300.v20150602-1417
-vmargs
-Xmx20240m
That guarantees the parse will not run out of memory.
- Copy the dump file to the server and run the following commands to build the indices plus three analysis reports:
mat/ParseHeapDump.sh es_heap.bin org.eclipse.mat.api:suspects
mat/ParseHeapDump.sh es_heap.bin org.eclipse.mat.api:overview
mat/ParseHeapDump.sh es_heap.bin org.eclipse.mat.api:top_components
Once the analysis succeeds, it leaves behind a pile of index files (.index) and analysis reports (.zip):
-rw-r--r--@ 1 xgwu staff 62M Nov 6 16:18 es_heap.a2s.index
-rw-r--r--@ 1 xgwu staff 25G Nov 6 14:59 es_heap.bin
-rw-r--r--@ 1 xgwu staff 90M Nov 6 16:21 es_heap.domIn.index
-rw-r--r--@ 1 xgwu staff 271M Nov 6 16:21 es_heap.domOut.index
-rw-r--r-- 1 xgwu staff 144K Nov 7 18:38 es_heap.i2sv2.index
-rw-r--r--@ 1 xgwu staff 220M Nov 6 16:18 es_heap.idx.index
-rw-r--r--@ 1 xgwu staff 356M Nov 6 16:20 es_heap.inbound.index
-rw-r--r--@ 1 xgwu staff 6.8M Nov 6 16:20 es_heap.index
-rw-r--r--@ 1 xgwu staff 76M Nov 6 16:18 es_heap.o2c.index
-rw-r--r--@ 1 xgwu staff 231M Nov 6 16:20 es_heap.o2hprof.index
-rw-r--r--@ 1 xgwu staff 206M Nov 6 16:21 es_heap.o2ret.index
-rw-r--r--@ 1 xgwu staff 353M Nov 6 16:20 es_heap.outbound.index
-rw-r--r--@ 1 xgwu staff 399K Nov 6 16:16 es_heap.threads
-rw-r--r--@ 1 xgwu staff 89K Nov 7 17:40 es_heap_Leak_Suspects.zip
-rw-r--r--@ 1 xgwu staff 78K Nov 6 19:22 es_heap_System_Overview.zip
-rw-r--r--@ 1 xgwu staff 205K Nov 6 19:22 es_heap_Top_Components.zip
drwxr-xr-x@ 3 xgwu staff 96B Nov 6 16:15 workspace
Package these files up, download them to a local machine, and open them with the MAT GUI to analyze.
When opening the dump file in MAT you can choose to open one of the pre-generated reports, for example Leak Suspects.
The Leak Suspects report made it obvious at a glance that the 20-plus GB was mostly held by a handful of bulk thread instances, each retaining close to 1.5GB.
Opening the dominator_tree panel and sorting by Retained Heap showed several bulk threads each retaining a very large amount of memory.
Expanding the reference chain of one such thread shows how these threads retain so much memory (the key links were circled in red in the original screenshot).
The chain reads as follows:
- The bulk thread's thread-local map holds a log4j MutableLogEvent object.
- The MutableLogEvent references a log4j ParameterizedMessage object.
- The ParameterizedMessage references a BulkShardRequest object.
- The BulkShardRequest references more than 40,000 BulkItemRequest objects.
Read this way, it appears that a log4j log event holds a strong reference to a large bulk request object, preventing it from ever being garbage collected: a memory leak.
Recalling the document missing bulk exceptions recorded in the ES logs, I suspected the leak was being produced while logging those exceptions.
Reproducing the problem
To verify the hypothesis, I started a single-node 5.3.2 test cluster on my development machine and used the bulk API to run batched updates, deliberately giving one update request a doc_id that does not exist. To make testing easier I added processors: 1 to the ES configuration file elasticsearch.yml; this setting feeds into the cluster's thread_pool sizing and shrinks the bulk thread pool to a single thread, which makes every kind of verification faster and more convenient.
After starting the cluster and sending the bulk request (a sketch of such a request follows below), I took a dump right away and repeated the earlier analysis: the problem reproduced. I then wondered whether other bulk exceptions, for example documents that do not match the mapping, would trigger the same issue; testing showed they do. Testing with different bulk sizes showed that the amount of unreclaimable memory tracks the size of the last bulk that threw an exception. At this point it was all but certain that the leak was tied to log4j's exception-message logging.
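For reference, here is a minimal sketch of that kind of reproduction against a local 5.x test node, written with the 5.x Java transport client. The index, type, and field names are made up for illustration, and note that, per the TransportShardBulkAction code examined below, these failures are logged at debug level, so that logger needs debug enabled for the leaking log event to be created at all:

import java.net.InetAddress;
import java.util.Collections;

import org.elasticsearch.action.bulk.BulkRequestBuilder;
import org.elasticsearch.action.bulk.BulkResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public class BulkLeakRepro {
    public static void main(String[] args) throws Exception {
        // Connect to the local single-node test cluster (default transport port 9300).
        try (TransportClient client = new PreBuiltTransportClient(Settings.EMPTY)
                .addTransportAddress(new InetSocketTransportAddress(
                        InetAddress.getByName("127.0.0.1"), 9300))) {

            // A large batch of updates against doc ids that do not exist: the index
            // is auto-created on first write, then every item fails with
            // DocumentMissingException, and logging those failures is what pins the
            // whole BulkShardRequest via the thread-local log event.
            BulkRequestBuilder bulk = client.prepareBulk();
            for (int i = 0; i < 40_000; i++) {
                bulk.add(client.prepareUpdate("test-index", "type", "missing_" + i)
                        .setDoc(Collections.<String, Object>singletonMap("field", "value")));
            }
            BulkResponse response = bulk.get();
            System.out.println("bulk failures: " + response.hasFailures());
            // Now take a heap dump of the ES node and walk the bulk thread's
            // thread locals in MAT, as described above.
        }
    }
}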
To find out whether the problem was specific to 5.3.2 or had been fixed since, I ran the same test on the then-latest 5.6.3: the problem was still there, so this looked like a deep bug that had not yet been discovered.
Reading the source for the root cause
The direction of the investigation was now roughly clear, but without the root cause there was no way to know how to fix or avoid the problem, so the only option was to dig into the source.
At line 209 of TransportShardBulkAction I found the code that emits the exception seen in the ES logs:
if (isConflictException(failure)) {
logger.trace((Supplier<?>) () -> new ParameterizedMessage("{} failed to execute bulk item ({}) {}",
request.shardId(), docWriteRequest.opType().getLowercase(), request), failure);
} else {
logger.debug((Supplier<?>) () -> new ParameterizedMessage("{} failed to execute bulk item ({}) {}",
request.shardId(), docWriteRequest.opType().getLowercase(), request), failure);
}
Here request is passed in as an argument when the ParameterizedMessage is instantiated. That request is a BulkShardRequest holding the batch of bulk item requests to be written to one shard, so the more requests in a batch, the more memory the object retains. The real question, though, is why the reference is not released once logger.debug() has returned.
Careful comparison with the dominator tree in MAT showed why the ParameterizedMessage cannot be released: it is referenced by a MutableLogEvent, and that MutableLogEvent is stored as a thread local. ES's bulk thread pool is fixed-size: its threads are created up front and are never destroyed and recreated. Because the MutableLogEvent objects are thread locals, they live as long as their threads do, and each keeps referencing the last ParameterizedMessage it handled. So when ES logs a bulk exception for a large request, the entire request object stays strongly referenced by a thread-local variable and cannot be freed, leaking a great deal of memory.
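The retention mechanism itself is easy to demonstrate without ES or log4j. Below is a minimal, self-contained sketch in plain JDK code (all names invented for illustration) of the same pattern: a fixed-size pool whose worker thread caches the last "event" it handled in a ThreadLocal, which is exactly the role the reusable log-event slot plays here:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalRetention {
    // Stands in for the reusable per-thread log event.
    private static final ThreadLocal<Object> LAST_EVENT = new ThreadLocal<>();

    public static void main(String[] args) throws InterruptedException {
        ExecutorService bulkPool = Executors.newFixedThreadPool(1); // like ES's fixed bulk pool

        bulkPool.submit(() -> {
            byte[] hugePayload = new byte[64 * 1024 * 1024]; // stands in for a big bulk request
            LAST_EVENT.set(hugePayload); // the "reused" slot keeps a strong reference
        });

        Thread.sleep(1000);
        System.gc();
        // The task finished long ago, but the pooled thread is merely parked:
        // its threadLocals map still points at hugePayload, so those 64 MB are
        // unreclaimable until the slot is overwritten or the thread dies.
        System.out.println("used bytes: " +
                (Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory()));

        bulkPool.shutdown(); // once the thread exits, the reference finally disappears;
                             // ES's bulk threads, by contrast, live as long as the process
    }
}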
Digging further into the log4j source, the MutableLogEvent turns out to be created as a thread local in org.apache.logging.log4j.core.impl.ReusableLogEventFactory:
public class ReusableLogEventFactory implements LogEventFactory {
private static final ThreadNameCachingStrategy THREAD_NAME_CACHING_STRATEGY = ThreadNameCachingStrategy.create();
private static final Clock CLOCK = ClockFactory.getClock();
private static ThreadLocal<MutableLogEvent> mutableLogEventThreadLocal = new ThreadLocal<>();
And org.apache.logging.log4j.core.config.LoggerConfig picks which LogEventFactory to use based on the constant ENABLE_THREADLOCALS:
if (LOG_EVENT_FACTORY == null) {
LOG_EVENT_FACTORY = Constants.ENABLE_THREADLOCALS
? new ReusableLogEventFactory()
: new DefaultLogEventFactory();
}
Digging deeper still, org.apache.logging.log4j.util.Constants shows that log4j checks at startup whether it is running in a web application; if not, it reads the constant from the system property log4j2.enable.threadlocals, which defaults to true when unset:
public static final boolean ENABLE_THREADLOCALS = !IS_WEB_APP && PropertiesUtil.getProperties().getBooleanProperty(
"log4j2.enable.threadlocals", true);
Because ES is not a web application, log4j selects ReusableLogEventFactory and therefore creates its MutableLogEvent objects as thread locals, which in the particular scenario of ES logging bulk exceptions produces a very visible memory leak.
One more question: why does log4j create log events as thread locals in the first place? The log4j website explains this under Garbage-free Steady State Logging: to reduce the number of objects created over and over while logging, easing GC pressure and improving performance, log4j reuses variables through thread locals in many places. But thread-local fields holding non-JDK classes can cause memory leaks, particularly in web applications, which is why log4j inspects the runtime environment at startup and disables these thread-local variables for web applications:
ThreadLocal fields holding non-JDK classes can cause memory leaks in web applications when the application server's thread pool continues to reference these fields after the web application is undeployed. To avoid causing memory leaks, Log4j will not use these ThreadLocals when it detects that it is used in a web application (when the javax.servlet.Servlet class is in the classpath, or when system property log4j2.is.webapp is set to "true").
That documentation also points to a workaround for ES: in ES's JVM configuration file jvm.options, add the log4j system property -Dlog4j2.enable.threadlocals=false to disable the thread locals, as shown below. Testing confirmed that this option effectively avoids the memory leak.
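In jvm.options the flag simply goes on a line of its own, alongside the other JVM system properties; a minimal example:

## workaround for the log4j thread-local leak described above: stop log4j
## from caching reusable log events (and whatever they still reference)
## in per-thread ThreadLocals
-Dlog4j2.enable.threadlocals=false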
The problem has also been filed as a GitHub issue: Memory leak upon partial TransportShardBulkAction failure
Closing thoughts
ES truly is a complex system, built from a great many modules and third-party components, and it supports use cases you would never think of, but edge cases can set off problems that are hard to track down. A complete monitoring system and an experienced support team are vital for keeping teams productive when developing on ES and for keeping the business stable!
Community Daily No. 94 (2017-11-08)
Part1 http://t.cn/R5eAIJz
Part2 http://t.cn/RtCo3Sw
Part3 http://t.cn/Rt0avHj
2. Siddontang's Elasticsearch study notes (on GitHub; not very up to date, for reference only)
http://t.cn/Rl0kKfd
3. Backing up, restoring, and migrating Elasticsearch data (a 2015 article)
http://t.cn/RL3YX6g
Editor: 江水
Archive: https://elasticsearch.cn/article/360
Subscribe: https://tinyletter.com/elastic-daily
Elastic X-Pack discount applications now open for startups!
If you work at a startup that uses Elastic products such as elasticsearch, kibana, or logstash to solve business problems, and you are interested in X-Pack, you can now apply for startup discount pricing. It really is a good deal, so don't miss it!
A startup is defined as a company with:
1. No more than 50 employees
2. Annual revenue under 5 million (RMB)
3. Registered capital under 25 million (RMB).
To apply:
Visit http://elastictech.cn and click the "创业公司优惠申请" (startup discount application) link in the top-right corner to fill in your details.