Elasticsearch Java API - Client Connections (TransportClient, PreBuiltXPackTransportClient) (Part 1)
Elasticsearch Java API client connections
There are three clients: the TransportClient, the NodeClient, and the XPackTransportClient.
- TransportClient: acts as an external visitor that sends requests to the ES cluster; from the cluster's point of view it is an outside party.
- NodeClient: joins the ES cluster as a node; it is part of ES, and the other nodes are aware of it.
- XPackTransportClient: for clusters with the x-pack plugin installed.
Important: the client version should match the server version.
The TransportClient is intended to be replaced by the Java High Level REST Client, which executes HTTP requests instead of serialized Java requests. The TransportClient will be deprecated in an upcoming Elasticsearch release, and the Java High Level REST Client is recommended instead.
That warning is a bit awkward, but the TransportClient still works fine on 5.x releases; the REST client probably just makes compatibility easier to manage.
Elasticsearch Java REST API manual
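For reference, here is a minimal sketch of connecting with the low-level REST client mentioned above. This is a sketch only: it assumes the 5.6.x low-level REST client artifact from the org.elasticsearch.client group is on the classpath, and localhost:9200 is an illustrative endpoint.
import org.apache.http.HttpHost;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

// Build a client against an HTTP endpoint (port 9200, not the 9300 transport port)
RestClient restClient = RestClient.builder(
        new HttpHost("localhost", 9200, "http")).build();
// Issue a simple GET / to verify connectivity
Response response = restClient.performRequest("GET", "/");
System.out.println(response.getStatusLine());
restClient.close();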
Maven Repository
The Elasticsearch Java API packages are published to Maven Central.
Add the following to your pom.xml:
The transport version should match your Elasticsearch version.
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>transport</artifactId>
<version>5.6.3</version>
</dependency>
Transport Client
Without setting a cluster name
// on startup
// Add at least one address here; if "client.transport.sniff" = true, a single address is enough, because sniffing discovers the rest
TransportClient client = new PreBuiltTransportClient(Settings.EMPTY)
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("host1"), 9300))
        .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("host2"), 9300));
// on shutdown, close the client
client.close();
Setting the cluster name
Settings settings = Settings.builder()
        .put("cluster.name", "myClusterName").build(); // the target cluster's name
TransportClient client = new PreBuiltTransportClient(settings);
// Add transport addresses and do something with the client...
Enabling sniffing
Settings settings = Settings.builder()
        .put("client.transport.sniff", true).build(); // sniff the cluster state and add the other nodes' IPs to the local client list
TransportClient client = new PreBuiltTransportClient(settings);
Other settings
client.transport.ignore_cluster_name // set to true to skip cluster-name validation of connected nodes
client.transport.ping_timeout // timeout for a node's ping response, defaults to 5s
client.transport.nodes_sampler_interval // how often to sample/ping the nodes, defaults to 5s
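A minimal sketch that combines these settings when building the client (the values shown are just the defaults listed above, not tuning advice):
Settings settings = Settings.builder()
        .put("cluster.name", "myClusterName")
        .put("client.transport.sniff", true)
        .put("client.transport.ignore_cluster_name", false) // keep cluster-name validation
        .put("client.transport.ping_timeout", "5s") // the default
        .put("client.transport.nodes_sampler_interval", "5s") // the default
        .build();
TransportClient client = new PreBuiltTransportClient(settings);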
To recap, there are two kinds of ES client: the TransportClient and the NodeClient. The difference: the TransportClient is an external visitor that talks to the ES cluster over the transport protocol (TCP on port 9300, not HTTP); to the cluster it is an outside party. The NodeClient, as the name suggests, joins the cluster as a node: it is part of ES, and the other nodes are aware of it, whereas the cluster knows nothing about a TransportClient. A NodeClient communicates more efficiently, but because it is part of the cluster, its failures can also hurt the cluster. A NodeClient can be configured in elasticsearch.yml not to act as a data node, so no data is allocated to it.
Whether to use a node client is a matter of taste.
Example
package name.quanke.es.study;

import name.quanke.es.study.util.Utils;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.transport.client.PreBuiltTransportClient;
import org.junit.After;
import org.junit.Before;

import java.net.InetAddress;

/**
 * Initialization of the Elasticsearch 5.5.1 client and ElasticsearchTemplate.
 * Acts as an external visitor requesting the ES cluster; to the cluster it is an outside party.
 * Created by http://quanke.name on 2017/11/10.
 */
public class ElasticsearchClient {

    protected TransportClient client;

    @Before
    public void setUp() throws Exception {
        Settings esSettings = Settings.builder()
                .put("cluster.name", "utan-es") // the target cluster's name
                .put("client.transport.sniff", true) // sniff the cluster and add the other nodes' IPs to the local client list
                .build();

        /**
         * This connection assumes x-pack is NOT installed; if it is, see {@link ElasticsearchXPackClient}.
         * 1. The Java client communicates over TCP on port 9300.
         * 2. The HTTP client communicates over HTTP on port 9200.
         */
        client = new PreBuiltTransportClient(esSettings)
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("192.168.1.10"), 9300));

        System.out.println("ElasticsearchClient connected");
    }

    @After
    public void tearDown() throws Exception {
        if (client != null) {
            client.close();
        }
    }

    protected void println(SearchResponse searchResponse) {
        Utils.println(searchResponse);
    }
}
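As a usage sketch, a hypothetical subclass can issue a query through the inherited client. The index name "house" is illustrative, and the extra imports (org.elasticsearch.index.query.QueryBuilders, org.junit.Test) are assumed:
public class ElasticsearchClientTest extends ElasticsearchClient {

    @Test
    public void testMatchAll() {
        // match_all query against a hypothetical index, printed via the helper above
        SearchResponse response = client.prepareSearch("house")
                .setQuery(QueryBuilders.matchAllQuery())
                .setSize(10)
                .get();
        println(response);
    }
}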
This example is on Git as ElasticsearchClient.java.
All examples have been pushed to Git.
XPackTransportClient
If the Elasticsearch server has the x-pack plugin installed, a PreBuiltXPackTransportClient instance is required to access it.
If the project is managed with Maven, add the following to pom.xml.
Be sure to add https://artifacts.elastic.co/maven as a repository, because this artifact is not published to Maven Central; if you run your own Maven repository, configure a proxy for it.
<project ...>
<repositories>
<!-- add the elasticsearch repo -->
<repository>
<id>elasticsearch-releases</id>
<url>https://artifacts.elastic.co/maven</url>
<releases>
<enabled>true</enabled>
</releases>
<snapshots>
<enabled>false</enabled>
</snapshots>
</repository>
...
</repositories>
...
<dependencies>
<!-- add the x-pack jar as a dependency -->
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>x-pack-transport</artifactId>
<version>5.6.3</version>
</dependency>
...
</dependencies>
...
</project>
Example
/**
 * Elasticsearch XPack Client
 * Created by http://quanke.name on 2017/11/10.
 */
public class ElasticsearchXPackClient {

    protected TransportClient client;

    @Before
    public void setUp() throws Exception {
        /**
         * Connect this way when the ES cluster has the x-pack plugin installed.
         * 1. The Java client communicates over TCP on port 9300.
         * 2. The HTTP client communicates over HTTP on port 9200.
         */
        Settings settings = Settings.builder()
                .put("xpack.security.user", "elastic:utan100")
                .put("cluster.name", "utan-es")
                .build();
        client = new PreBuiltXPackTransportClient(settings)
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("192.168.1.10"), 9300));

        // final CredentialsProvider credentialsProvider = new BasicCredentialsProvider();
        // credentialsProvider.setCredentials(AuthScope.ANY,
        //         new UsernamePasswordCredentials("elastic", "utan100"));

        System.out.println("ElasticsearchXPackClient started");
    }

    @Test
    public void testClientConnection() throws Exception {
        System.out.println("--------------------------");
    }

    @After
    public void tearDown() throws Exception {
        if (client != null) {
            client.close();
        }
    }

    protected void println(SearchResponse searchResponse) {
        Utils.println(searchResponse);
    }
}
This example is on Git as ElasticsearchXPackClient.java.
All examples have been pushed to Git.
For more, see the spring-boot-starter-es open-source project.
If you have any questions, follow the WeChat public account and leave me a message.
Introducing Elastic Stack 6.0.0
https://www.elastic.co/cn/blog/elastic-stack-6-0-0-released
Introducing 6.0.0.
Not much more needs to be said. You should download it and try it right away, or experience it first-hand on Elastic Cloud, your favorite hosted Elasticsearch and Kibana platform.
If you haven't kept up with our release cadence over the past few months, today's announcement may come as a surprise. Today marks the culmination of thousands of pull requests and the work of hundreds of committers: two alpha releases, two betas, two release candidates, and finally the general availability (GA) release. This milestone would not have been possible without teams all across Elastic, and thanks also go to the Pioneer Program users for their comments and feedback.
Today we are releasing not only the entire Elastic Stack but also Elastic Cloud Enterprise 1.1, which adds 6.0 support and offline installation, plus a series of user-experience improvements aimed at simplifying cluster provisioning, management, and monitoring. And as if GA releases of several products on the same day weren't enough... there is also APM, still in alpha, which we invite everyone to test on 6.0.0.
With so many highlights in one release, where to begin? Whether you write them up in detail or just follow the links, happy reading... and more importantly... happy searching, analyzing, and visualizing.
Elasticsearch
A brand-new zero-downtime upgrade experience, sequence IDs, improved handling of sparse data, faster queries, distributed watch execution, and more. See the details for a feature summary.
Kibana
A "Dashboard Only" mode, a full-screen mode, export of saved searches to .csv, alert creation through the UI in X-Pack Gold and above, a migration assistant in X-Pack Basic, and improved contrast and keyboard shortcuts that make the product easier to use. For the future of data interaction, see this post.
Logstash
Multiple self-contained pipelines can now run in a single Logstash instance, along with new UI components: the pipeline viewer in X-Pack Basic and Logstash pipeline management in X-Pack Gold. For details, click here.
Beats
Beats <3 containers and Beats <3 modules (with improved dashboards for those modules). Combine that with a new command and configuration layout, and more efficient storage in Metricbeat. Also, introducing Auditbeat. The details are here.
ES-Hadoop
First-class support for Spark Structured Streaming has landed in 6.0, and the connector mapping code has been rewritten to better support multiple mappings. Support for reading and writing the new join field has also been added. Users can now also perform update operations with non-inline script types. Details here.
Get it now!
上海普翔 (Shanghai Puxiang) Is Hiring an Elastic Support Engineer
Position: Elastic Support Engineer
Location: Shanghai
Salary: 18k ~ 25k
About the company: 上海普翔 (Shanghai Puxiang) is Elastic's current partner in China, responsible for X-Pack and ECE sales as well as technical consulting and other Elastic-related business. See http://elastictech.cn for details.
Responsibilities:
1. Work with Elastic to develop paying customers in China, visit customers, and present X-Pack, ECE, and other products.
2. Take part in paying customers' Elastic deployments and help them solve the problems they run into in practice.
3. Take part in promoting Elastic products in China, for example by recording tutorial videos and live streams.
Requirements:
1. Bachelor's degree or above and 3+ years of work experience.
2. Familiar with the Elastic products (Elasticsearch, Kibana, Logstash, Beats), aware of common optimizations, and able to solve the problems raised by community users and customers.
3. Strong learning and research skills: when facing a new product or feature, able to master it in a very short time and explain and demonstrate it to customers in your own words.
4. Good communication and presentation skills; good at listening to customers' problems and quickly pinpointing the key to solving them.
We are an official Elastic partner, so we get plenty of first-hand information and material. If you have confidence in Elastic's growth in China, don't miss this opportunity!
Send your resume to: weibinway@elastictech.cn
Community Daily, Issue 101 (2017-11-15)
https://www.elastic.co/blog/el ... eased
2. Search engine selection for VTS data processing
http://t.cn/RjfQx01
3. How to monitor and tune Golang applications; read on to find out
http://t.cn/RlA4biz
4. Elasticsearch performance analysis 123
http://t.cn/RjfEjN3
5. Waiting for you | Elastic Meetup Guangzhou
https://elasticsearch.cn/article/364
Editor: wt
Archive: https://elasticsearch.cn/article/377
Subscribe: https://tinyletter.com/elastic-daily
Spark Elasticsearch exception: PowerSet: Too many elements to create a power set 40
org.elasticsearch.hadoop.EsHadoopIllegalArgumentException: Too many elements to create a power set 54
at org.elasticsearch.hadoop.util.Assert.isTrue(Assert.java:50)
at org.elasticsearch.hadoop.rest.ShardSorter$PowerSet.(ShardSorter.java:218)
at org.elasticsearch.hadoop.rest.ShardSorter.powerList(ShardSorter.java:202)
at org.elasticsearch.hadoop.rest.ShardSorter.checkCombo(ShardSorter.java:89)
at org.elasticsearch.hadoop.rest.ShardSorter.find(ShardSorter.java:85)
at org.elasticsearch.hadoop.rest.RestRepository.doGetReadTargetShards(RestRepository.java:352)
at org.elasticsearch.hadoop.rest.RestRepository.getReadTargetShards(RestRepository.java:295)
at org.elasticsearch.hadoop.rest.RestService.findPartitions(RestService.java:253)
at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions$lzycompute(AbstractEsRDD.scala:61)
at org.elasticsearch.spark.rdd.AbstractEsRDD.esPartitions(AbstractEsRDD.scala:60)
at org.elasticsearch.spark.rdd.AbstractEsRDD.getPartitions(AbstractEsRDD.scala:27)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
Explanation:
Read failure when index/alias spread among 32 or more nodes
https://github.com/elastic/ela ... s/737
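Until you can move to an elasticsearch-hadoop release that contains the fix, one possible workaround (an assumption based on the issue above, not a confirmed cure) is to read from a single concrete index rather than an alias spread across many nodes, so the connector has far fewer shard combinations to sort. A minimal Java sketch; the node address and the index/type name logs-2017.11.14/doc are illustrative:
import java.util.Map;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.elasticsearch.spark.rdd.api.java.JavaEsSpark;

public class EsReadJob {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("es-read");
        conf.set("es.nodes", "10.0.0.1:9200"); // illustrative ES node
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Read one concrete index/type instead of a wide alias
        JavaPairRDD<String, Map<String, Object>> rdd =
                JavaEsSpark.esRDD(sc, "logs-2017.11.14/doc");
        System.out.println(rdd.count());
        sc.stop();
    }
}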
On the 100th Issue of the Community Daily: Believe in the Power of Community
I have long been a reader of 湾区日报, and later saw astaxie of the Golang China community running the GoCN daily news. For a community, a daily digest means sustained output; for an individual, it means sustained input. So I decided to do the same for the Elastic community, and on July 30, 2017 I published Elastic Daily Issue 1.
After talking with medcl that day, he suggested mobilizing the community, since that is the only way to keep a daily both good and long-lived. So we began recruiting daily editors in the community, many members responded quickly, and the Elastic Daily editorial team was formed. As of today we have 8 community editors: 7 are each responsible for one fixed day of the week, and one is responsible for reviewing articles and publishing them to the WeChat public account. They are:
- 江水
- 金桥
- bsll
- 至尊宝
- 叮咚光军
- laoyang360
- cyberdak
- 陶文
Thanks to the community editors for their dedication. Together we have accomplished something remarkable: 100 consecutive days of knowledge output. Anyone who reads and fully digests all 100 issues will surely raise their Elastic skills by more than one level.
Looking back, if I had done this alone, the daily would probably not have reached 30 issues. An individual's strength is limited, but the community's is boundless. Every day, seeing the articles the editors have hand-picked, I am amazed at just how many excellent Elastic articles there are!
Believe in the power of community, and look forward with us to Elastic Daily Issue 200, 300, even 1000!
Community Daily, Issue 100 (2017-11-14)
http://t.cn/RjyNLge
2. A detailed analysis of the Elasticsearch master-election process.
http://t.cn/RjyNPLT
3. A hands-on guide to speeding up WordPress search with ES.
http://t.cn/RjbD1QK
4. Waiting for you | Elastic Meetup Guangzhou
https://elasticsearch.cn/article/364
Editor: 叮咚光军
Archive: https://elasticsearch.cn/article/374
Subscribe: https://tinyletter.com/elastic-daily
Migrating from ES 2.3 to 5.6 in Practice
config/elasticsearch.yml
# ======================== Elasticsearch Configuration =========================
#
# NOTE: Elasticsearch comes with reasonable defaults for most settings.
# Before you set out to tweak and tune the configuration, make sure you
# understand what are you trying to accomplish and the consequences.
#
# The primary way of configuring a node is via this file. This template lists
# the most important settings you may want to configure for a production cluster.
#
# Please consult the documentation for further information on configuration options:
# https://www.elastic.co/guide/e ... .html
#
# ---------------------------------- Cluster -----------------------------------
#
# Use a descriptive name for your cluster:
#
#cluster.name: my-application
cluster.name: es5_dev
#
# ------------------------------------ Node ------------------------------------
#
# Use a descriptive name for the node:
#
#node.name: node-1
node.name: es5-node03
#
# Add custom attributes to the node:
#
#node.attr.rack: r1
#
# ----------------------------------- Paths ------------------------------------
#
# Path to directory where to store the data (separate multiple locations by comma):
#
#path.data: /path/to/data
#
# Path to log files:
#
#path.logs: /path/to/logs
#
# ----------------------------------- Memory -----------------------------------
#
# Lock the memory on startup:
#
#bootstrap.memory_lock: true
bootstrap.memory_lock: true
#
# Make sure that the heap size is set to about half the memory available
# on the system and that the owner of the process is allowed to use this
# limit.
#
# Elasticsearch performs poorly when the system is swapping the memory.
#
# ---------------------------------- Network -----------------------------------
#
# Set the bind address to a specific IP (IPv4 or IPv6):
#
#network.host: 192.168.0.1
network.host: ["127.0.0.1","10.204.12.33"]
http.port: 9201
transport.tcp.port: 9301
#http.host: 127.0.0.1
#http.enabled: false
#
# Set a custom port for HTTP:
#
#http.port: 9200
#
# For more information, consult the network module documentation.
#
# --------------------------------- Discovery ----------------------------------
#
# Pass an initial list of hosts to perform discovery when new node is started:
# The default list of hosts is ["127.0.0.1", "[::1]"]
#
#discovery.zen.ping.unicast.hosts: ["host1", "host2"]
node.master: true
node.data: true
discovery.zen.minimum_master_nodes: 1
discovery.zen.ping.unicast.hosts:
- 10.204.12.31:9301
- 10.204.12.32:9301
- 10.204.12.33:9301
#
# Prevent the "split brain" by configuring the majority of nodes (total number of master-eligible nodes / 2 + 1):
#
#discovery.zen.minimum_master_nodes: 3
#
# For more information, consult the zen discovery module documentation.
#
# ---------------------------------- Gateway -----------------------------------
#
# Block initial recovery after a full cluster restart until N nodes are started:
#
#gateway.recover_after_nodes: 3
#
# For more information, consult the gateway module documentation.
#
# ---------------------------------- Various -----------------------------------
#
# Require explicit names when deleting indices:
#
#action.destructive_requires_name: true
indices.requests.cache.size: 5%
config/jvm.options
## JVM configuration
################################################################
## IMPORTANT: JVM heap size
################################################################
##
## You should always set the min and max JVM heap
## size to the same value. For example, to set
## the heap to 4 GB, set:
##
## -Xms4g
## -Xmx4g
##
## See https://www.elastic.co/guide/e ... .html
## for more information
##
################################################################
# Xms represents the initial size of total heap space
# Xmx represents the maximum size of total heap space
-Xms2g
-Xmx2g
################################################################
## Expert settings
################################################################
##
## All settings below this section are considered
## expert settings. Don't tamper with them unless
## you understand what you are doing
##
################################################################
## GC configuration
-XX:+UseConcMarkSweepGC
-XX:CMSInitiatingOccupancyFraction=75
-XX:+UseCMSInitiatingOccupancyOnly
## optimizations
# pre-touch memory pages used by the JVM during initialization
-XX:+AlwaysPreTouch
## basic
# force the server VM (remove on 32-bit client JVMs)
-server
# explicitly set the stack size (reduce to 320k on 32-bit client JVMs)
-Xss1m
# set to headless, just in case
-Djava.awt.headless=true
# ensure UTF-8 encoding by default (e.g. filenames)
-Dfile.encoding=UTF-8
# use our provided JNA always versus the system one
-Djna.nosys=true
# use old-style file permissions on JDK9
-Djdk.io.permissionsUseCanonicalPath=true
# flags to configure Netty
-Dio.netty.noUnsafe=true
-Dio.netty.noKeySetOptimization=true
-Dio.netty.recycler.maxCapacityPerThread=0
# log4j 2
-Dlog4j.shutdownHookEnabled=false
-Dlog4j2.disable.jmx=true
-Dlog4j.skipJansi=true
## heap dumps
# generate a heap dump when an allocation from the Java heap fails
# heap dumps are created in the working directory of the JVM
-XX:+HeapDumpOnOutOfMemoryError
# specify an alternative path for heap dumps
# ensure the directory exists and has sufficient space
#-XX:HeapDumpPath=${heap.dump.path}
## GC logging
#-XX:+PrintGCDetails
#-XX:+PrintGCTimeStamps
#-XX:+PrintGCDateStamps
#-XX:+PrintClassHistogram
#-XX:+PrintTenuringDistribution
#-XX:+PrintGCApplicationStoppedTime
# log GC status to a file with time stamps
# ensure the directory exists
#-Xloggc:${loggc}
# By default, the GC log file will not rotate.
# By uncommenting the lines below, the GC log file
# will be rotated every 128MB at most 32 times.
#-XX:+UseGCLogFileRotation
#-XX:NumberOfGCLogFiles=32
#-XX:GCLogFileSize=128M
# Elasticsearch 5.0.0 will throw an exception on unquoted field names in JSON.
# If documents were already indexed with unquoted fields in a previous version
# of Elasticsearch, some operations may throw errors.
#
# WARNING: This option will be removed in Elasticsearch 6.0.0 and is provided
# only for migration purposes.
#-Delasticsearch.json.allow_unquoted_field_names=true
Install the IK analyzer
bin/elasticsearch-plugin install https://github.com/medcl/elast ... 1.zip
./bin/elasticsearch-plugin install https://github.com/medcl/elast ... 3.zip
Configure the IK remote extension dictionary for hot-word updates in elasticsearch-5.6.3/config/analysis-ik/IKAnalyzer.cfg.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
    <comment>IK Analyzer extension configuration</comment>
    <!-- configure your own extension dictionary here -->
    <entry key="ext_dict"></entry>
    <!-- configure your own extension stopword dictionary here -->
    <entry key="ext_stopwords"></entry>
    <!-- configure a remote extension dictionary here -->
    <entry key="remote_ext_dict">http://distribute.search.leju. ...</entry>
    <!-- configure a remote extension stopword dictionary here -->
    <!-- <entry key="remote_ext_stopwords">words_location</entry> -->
</properties>
Install the pinyin analyzer
cd elasticsearch-5.5.1/plugins
wget https://github.com/medcl/elast ... 5.5.1
unzip v5.5.1
When packaging and deploying to the other nodes, clean the data directory first.
For cluster monitoring you can use the head Chrome extension.
Data migration
The migration tool is elasticbak, which I wrote myself and have updated with a 5.6.3 driver. GitHub: https://github.com/jiashiwen/elasticbak.
Data backup
java -jar elasticbak-2.3.3.jar \
--exp \
--cluster lejuesdev \
--host 10.204.12.31 \
--filesize 1000 \
--backupdir ./esbackupset \
--backupindexes "*" \
--threads 4
Because of field-type changes between versions, the indices have to be rebuilt by hand. Here is an example; the main change is that the 2.x string type becomes text in 5.x. In 2.x, the index parameter specified both whether a field was indexed ("index": "no") and whether it went through an analyzer ("index": "not_analyzed"). In 5.x, index only specifies whether the field is indexed at all; if you need a whole field indexed without analysis, use the keyword type instead.
curl -XPUT "http://10.204.12.31:9201/house_geo" -H 'Content-Type: application/json' -d'
{
"mappings": {
"house": {
"dynamic": "strict",
"_all": {
"enabled": false
},
"properties": {
"_category": {
"type": "keyword",
"store": true
},
"_content": {
"type": "text",
"store": true,
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
},
"_deleted": {
"type": "boolean",
"store": true
},
"_doccreatetime": {
"type": "date",
"store": true,
"format": "strict_date_optional_time||epoch_millis||yyyy/MM/dd HH:mm:ss||yyyy/MM/dd"
},
"_docupdatetime": {
"type": "date",
"store": true,
"format": "strict_date_optional_time||epoch_millis||yyyy/MM/dd HH:mm:ss||yyyy/MM/dd"
},
"_flags": {
"type": "text",
"store": true,
"analyzer": "whitespace"
},
"_hits": {
"type": "text"
},
"_location": {
"type": "geo_point"
},
"_multi": {
"properties": {
"_location": {
"type": "geo_point"
}
}
},
"_origin": {
"type": "object",
"enabled": false
},
"_scope": {
"type": "keyword",
"store": true
},
"_tags": {
"type": "text",
"boost": 10,
"store": true,
"term_vector": "with_positions_offsets",
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
},
"_title": {
"type": "text",
"store": true,
"analyzer": "ik_max_word",
"search_analyzer": "ik_smart"
},
"_uniqid": {
"type": "keyword",
"store": true
},
"_uniqsign": {
"type": "keyword",
"store": true
},
"_url": {
"type": "text",
"index": false,
"store": true
},
"location": {
"type": "geo_point"
}
}
}
},
"settings": {
"index": {
"number_of_shards": "3",
"requests": {
"cache": {
"enable": "true"
}
},
"analysis": {
"filter": {
"my_synonym": {
"type": "synonym",
"synonyms_path": "analysis-ik/custom/synonym.dic"
}
},
"analyzer": {
"searchanalyzer": {
"filter": "my_synonym",
"type": "custom",
"tokenizer": "ik_smart"
},
"indexanalyzer": {
"filter": "my_synonym",
"type": "custom",
"tokenizer": "ik_max_word"
}
}
},
"number_of_replicas": "1"
}
}
}'
Import the index data with the new elasticbak:
java -jar elasticbak-5.6.3.jar \
--imp \
--cluster es5_dev \
--host 10.204.12.31 \
--port 9301 \
--restoreindex house_geo \
--restoretype dataonly \
--backupset esbackupset/house_geo \
--threads 4
Community Daily, Issue 99 (2017-11-13)
http://t.cn/Rj2uLh9
2. A VS Code extension for Logstash config files; editing them is no longer a headache.
http://t.cn/Rj21ncE
3. Sentinl, an ELK alerting plugin; with recent releases it now rivals X-Pack's Reporter and Watcher.
http://t.cn/Rj216Ef
4. Waiting for you | Elastic Meetup Guangzhou
https://elasticsearch.cn/article/364
Editor: cyberdak
Archive: https://elasticsearch.cn/article/372
Subscribe: https://tinyletter.com/elastic-daily
Community Daily, Issue 98 (2017-11-12)
http://t.cn/RjPvlq1
2. Pushing Elasticsearch write speed to the limit
http://t.cn/RWs8yvS
3. The midnight battle! Visiting eight Alibaba technical experts for an early look at the key technologies behind Double 11 2017.
http://t.cn/RjPPzGc
4. Waiting for you | Elastic Meetup Guangzhou
https://elasticsearch.cn/article/364
Editor: 至尊宝
Archive: https://elasticsearch.cn/article/371
Subscribe: https://tinyletter.com/elastic-daily
Three Steps to Get Started with esrally for Elasticsearch Benchmarking
It has been almost two months since the previous esrally tutorial, and in that time people have kept asking about problems they ran into, above all the painfully slow downloads caused by the benchmark data being hosted on AWS overseas. To get everyone up and running quickly, I built a ready-to-use Docker image and pulled the 13GB of benchmark data onto storage inside China, shared via Baidu Netdisk. Just follow the few simple steps below and you can run esrally benchmarks smoothly.
Steps
Without further ado, here we go!
- Pull the image
docker pull rockybean/esrally
- Download the data files. Link: http://pan.baidu.com/s/1eSrjZgA  Password: aagl
- Enter the downloaded rally_track folder and run the following command to start a test
docker run -it -v $(PWD):/root/track rockybean/esrally esrally race --track-path=/root/track/logging --offline --pipeline=benchmark-only --target-hosts=192.168.1.105:9200
Done!
A few notes
About the data files
The rally_track folder contains esrally's bundled benchmark data, mainly:
- Geonames(geonames): for evaluating the performance of structured data.
- Geopoint(geopoint): for evaluating the performance of geo queries.
- Percolator(percolator): for evaluating the performance of percolation queries.
- PMC(pmc): for evaluating the performance of full text search.
- NYC taxis(nyc_taxis): for evaluating the performance for highly structured data.
- Nested(nested): for evaluating the performance for nested documents.
- Logging(logging): for evaluating the performance of (Web) server logs.
- noaa(noaa): for evaluating the performance of range fields.
Download only the datasets you need rather than all of them; just make sure each folder you use is downloaded completely.
The command explained
Docker part
docker run -it rockybean/esrally esrally is the esrally command being executed; -v $(PWD):/root/track mounts the rally_track folder into the container. $(PWD) resolves to the current directory, so cd into the rally_track directory first (writing the full path works just as well).
The esrally Docker image is fairly simple; see the GitHub project for details.
esrally part
The image loads data as custom tracks, hence the --track-path=/root/track/logging command-line parameter. Note that /root/track is the directory we mounted into the container above; replace logging with another dataset name to load other benchmark data.
The container only supports benchmarking a third-party ES cluster, i.e. --pipeline=benchmark-only mode. That should be the most common benchmarking need anyway.
Now go have fun!
Sense No Longer Works? Switch to Kibana
1. Elasticsearch 5.5.2 + Kibana 5.5.2
1. Download the Kibana package whose version matches your Elasticsearch. My development environment is currently 5.5.2, so the matching Kibana is also 5.5.2 (the latest 5.6 reports an incompatibility error and will not run).
2. Configure config/kibana.yml; the main settings are as follows
# The URL of the Elasticsearch instance to use for all your queries.
#elasticsearch.url: "http://localhost:9200"
elasticsearch.url: "https://192.168.1.1:9281/"
# If your Elasticsearch is protected with basic authentication, these settings provide
# the username and password that the Kibana server uses to perform maintenance on the Kibana
# index at startup. Your Kibana users still need to authenticate with Elasticsearch, which
# is proxied through the Kibana server.
#elasticsearch.username: "user"
#elasticsearch.password: "pass"
elasticsearch.username: "admin"
elasticsearch.password: "admin"
# Optional settings that provide the paths to the PEM-format SSL certificate and key files.
# These files validate that your Elasticsearch backend uses the same key files.
#elasticsearch.ssl.certificate: /path/to/your/client.crt
#elasticsearch.ssl.key: /path/to/your/client.key
elasticsearch.ssl.certificate: /home/develop/kibana-5.6.3-linux-x86_64/config/crts/eshttp.crt
elasticsearch.ssl.key: /home/develop/kibana-5.6.3-linux-x86_64/config/crts/eshttp.key
# To disregard the validity of SSL certificates, change this setting's value to 'none'.
#elasticsearch.ssl.verificationMode: full
elasticsearch.ssl.verificationMode: none
Each setting is explained clearly by the comments in the file, so I won't repeat them here. The two most important ones are elasticsearch.ssl.certificate and elasticsearch.ssl.key, which must match the server side exactly. Since the certificates are self-generated, elasticsearch.ssl.verificationMode needs to be changed to none.
After starting Kibana, access it at http://localhost:5601.
Community Daily, Issue 97 (2017-11-11)
- Why sense no longer works: see what the ES officials say http://t.cn/RlB3B62
- Quickly pinpointing shard-allocation problems with the allocation API http://t.cn/RlrzTsD
- ES 6.0's improvements to keep disks from filling up http://t.cn/RlrU3Nr
- Great news: the ES community site now supports a Markdown editor https://elasticsearch.cn/article/366
- Elastic acquires Swiftype, a leader in SaaS site search http://t.cn/Rl3a4P2
- Waiting for you | Elastic Meetup Guangzhou https://elasticsearch.cn/article/364
Elastic Is Hiring a Support Engineer, Based in Beijing
Support Engineer - Mandarin Speaking
Location: Beijing, China
Department: Support
Responsibilities
- Ensuring customer issues are resolved within our committed service level agreements.
- Maintain strong relationships with our customers for the delivery of support.
- Have a mindset of continuous improvement, in terms of efficiency of support processes and customer satisfaction.
Experience
- Demonstrable experience of support in technology businesses
- Experience working across multi-cultural and geographically distributed teams
Key Skills
- Strong verbal and written communication skills in both Mandarin and English.
- Customer orientated focus.
- Team player, ability to work in a fast-paced environment with a positive and adaptable approach.
- Knowledge of databases or search technologies a plus.
- Demonstrated strong technical understanding of software products.
Additional Information
- Competitive pay and benefits
- Stock options
- Catered lunches, snacks, and beverages in most offices
- An environment in which you can balance great work with a great life
- Passionate people building great products
- Employees with a wide variety of interests
- Distributed-first company with employees in over 30 countries, spread across 18 time zones, and speaking over 30 languages!
Elastic is an Equal Employment employer committed to the principles of equal employment opportunity and affirmative action for all applicants and employees. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, or disability status or any other basis protected by federal, state or local law, ordinance or regulation. Elastic also makes reasonable accommodations for disabled employees consistent with applicable law.
About Elastic
Elastic is the world's leading software provider for making structured and unstructured data usable in real time for search, logging, security, and analytics use cases. Founded in 2012 by the people behind the Elasticsearch, Kibana, Beats, and Logstash open source projects, Elastic's global community has more than 80,000 members across 45 countries, and since its initial release, Elastic's products have achieved more than 100 million cumulative downloads. Today thousands of organizations, including Cisco, eBay, Dell, Goldman Sachs, Groupon, HP, Microsoft, Netflix, The New York Times, Uber, Verizon, Yelp, and Wikipedia, use the Elastic Stack, X-Pack, and Elastic Cloud to power mission-critical systems that drive new revenue opportunities and massive cost savings. Elastic is backed by more than $104 million in funding from Benchmark Capital, Index Ventures, and NEA; has headquarters in Amsterdam, the Netherlands, and Mountain View, California; and has over 500 employees in more than 30 countries around the world.
Our Philosophy
We’re always on the search for amazing people, people who have deep passion for technology and are masters at their craft. We build highly sophisticated distributed systems and we don’t take our technology lightly. In Elasticsearch, you’ll have the opportunity to work in a vibrant young company next to some of the smartest and most highly skilled technologists the industry has to offer. We’re looking for great team players, yet we also promote independence and ownership. We’re hackers… but of the good kind. The kind that innovates and creates cutting edge products that eventually translates to a lot of happy, smiling faces.
LifeAtElastic
The Community Site Now Supports a Markdown Editor
To improve everyone's writing experience and boost enthusiasm for creating and sharing, after two days of hard work the Markdown editor is finally live. For now it only supports publishing articles; switch editors to select Markdown mode. Hopefully no one will use the editor as an excuse to post link-only articles anymore.
- Supports GitHub-flavored Markdown
- Supports this site's attachment feature
- Supports emoji symbols
- Supports automatic page navigation
- Older articles can be edited again: switch to Markdown mode, modify, and save
How to use it?
- Click 发起 (Post) and choose 文章 (Article)
- Toggle the green button to switch the editor to Markdown, then type Markdown-formatted content into the text box.
Online Markdown editing and preview tool: https://elasticsearch.cn/static/js/editor/markdown/
The following is a style test for reference; ignore its meaning.
----------- Common formats -----------------
# Heading 1
## Heading 2
### Heading 3
#### Heading 4
##### Heading 5
###### Heading 6
Top-level heading //write equals signs under the text
===
Heading //same as the top-level heading
---
`inline code`
_ Note: for a long code block, use three backticks: ` _
> This is the first level of quoting.
>
> > This is nested blockquote.
>
> Back to the first level.
* Red
* Green
* Blue
- Red
- Green
- Blue
+ Red
+ Green
+ Blue
1. This is the first
1. This is the second
1. This is the third
* * *
***
*****
- - -
---
[markdown-syntax](http://daringfireball.net/projects/markdown/syntax)
[id]: http://example.com/ "Optional Title Here"
This is [an example][id] reference-style link.
*content*
**content**
_content_
__content__
![an external image](https://static-www.elastic.co/assets/bltbfcd44f1256d8c88/blog-swifttype-thumb.jpg?q=845)
<http://elastic.co/>
<info@elastic.o>
    four spaces
    one tab
----------- Style preview -----------------
Heading 1
Heading 2
Heading 3
Heading 4
Heading 5
Heading 6
Top-level heading //write equals signs under the text
Heading //same as the top-level heading
inline code
This is the first level of quoting.
This is nested blockquote.
Back to the first level.
- Red
- Green
- Blue
- Red
- Green
- Blue
- Red
- Green
- Blue
- This is the first
- This is the second
- This is the third
This is an example reference-style link.
content content content content
    four spaces
    one tab
The README of https://github.com/infinitbyte/gopa
GOPA, A Spider Written in Go.
Goal
- Lightweight, low footprint; memory requirement should be < 100MB
- Easy to deploy, no runtime or dependency required
- Easy to use, no programming or scripting skills needed, out-of-the-box features
Screenshot
How to use
Setup
First of all, get it. Two options: download the pre-built package or compile it yourself.
Download Pre Built Package
Go to the Release or Snapshot page and download the right package for your platform.
Note: Darwin is for Mac
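A minimal download-and-unpack sketch; the release version and asset name below are hypothetical, so substitute the actual file for your platform from the Release page:
# hypothetical asset name; check the Release page for the real one
wget https://github.com/infinitbyte/gopa/releases/download/v0.10.0/gopa-darwin64.tar.gz
tar -xzf gopa-darwin64.tar.gz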
Compile The Package Manually
- Mac/Linux: run `make build` to build Gopa (see the sketch after this list).
- Windows: check out this wiki page - How to build GOPA on windows.
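A minimal build sketch for Mac/Linux, assuming git, make, and a working Go toolchain are installed, and that `make build` places the binary under `bin/` (as the run examples below suggest):
# clone the repository and build the binary
git clone https://github.com/infinitbyte/gopa.git
cd gopa
make build
./bin/gopa -h   # verify the build by printing the command-line options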
So far, we have:
- `gopa`, the main program, a single binary.
- `config/`, elasticsearch-related scripts etc.
- `gopa.yml`, the main configuration for gopa.
Optional Config
By default, Gopa works well except for indexing; if you want to use Elasticsearch for indexing, follow these steps:
- Create an index in Elasticsearch with the script `config/gopa-index-mapping.sh`
Example
curl -XPUT "http://localhost:9200/gopa-index" -H 'Content-Type: application/json' -d' { "mappings": { "doc": { "properties": { "host": { "type": "keyword", "ignore_above": 256 }, "snapshot": { "properties": { "bold": { "type": "text" }, "url": { "type": "keyword", "ignore_above": 256 }, "content_type": { "type": "keyword", "ignore_above": 256 }, "file": { "type": "keyword", "ignore_above": 256 }, "h1": { "type": "text" }, "h2": { "type": "text" }, "h3": { "type": "text" }, "h4": { "type": "text" }, "hash": { "type": "keyword", "ignore_above": 256 }, "id": { "type": "keyword", "ignore_above": 256 }, "images": { "properties": { "external": { "properties": { "label": { "type": "text" }, "url": { "type": "keyword", "ignore_above": 256 } } }, "internal": { "properties": { "label": { "type": "text" }, "url": { "type": "keyword", "ignore_above": 256 } } } } }, "italic": { "type": "text" }, "links": { "properties": { "external": { "properties": { "label": { "type": "text" }, "url": { "type": "keyword", "ignore_above": 256 } } }, "internal": { "properties": { "label": { "type": "text" }, "url": { "type": "keyword", "ignore_above": 256 } } } } }, "path": { "type": "keyword", "ignore_above": 256 }, "sim_hash": { "type": "keyword", "ignore_above": 256 }, "lang": { "type": "keyword", "ignore_above": 256 }, "size": { "type": "long" }, "text": { "type": "text" }, "title": { "type": "text", "fields": { "keyword": { "type": "keyword" } } }, "version": { "type": "long" } } }, "task": { "properties": { "breadth": { "type": "long" }, "created": { "type": "date" }, "depth": { "type": "long" }, "id": { "type": "keyword", "ignore_above": 256 }, "original_url": { "type": "keyword", "ignore_above": 256 }, "reference_url": { "type": "keyword", "ignore_above": 256 }, "schema": { "type": "keyword", "ignore_above": 256 }, "status": { "type": "integer" }, "updated": { "type": "date" }, "url": { "type": "keyword", "ignore_above": 256 } } } } } } }'
Note: the Elasticsearch version should be > v5.0
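To verify the index was created, you can fetch the mapping back (a sketch assuming the same localhost:9200 endpoint as the example above):
# should echo the mapping that was just PUT
curl -XGET "http://localhost:9200/gopa-index/_mapping?pretty"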
- Enable the index module in `gopa.yml` and update the elasticsearch settings:

- module: index
  enabled: true
  ui:
    enabled: true
  elasticsearch:
    endpoint: http://dev:9200
    index_prefix: gopa-
    username: elastic
    password: changeme
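Before restarting Gopa with this config, it may help to confirm that the configured endpoint is reachable with those credentials (a sketch reusing the sample endpoint and the elastic/changeme login above):
# expects Elasticsearch's banner JSON (name, cluster_name, version, ...) on success
curl -u elastic:changeme http://dev:9200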
Start
Gopa doesn't require any dependencies; simply run `./gopa` to start the program.
Gopa can be run as a daemon (note: only available on Linux and Mac):
Example
➜ gopa git:(master) ✗ ./bin/gopa --daemon
[gopa] 0.10.0_SNAPSHOT ///last commit: 99616a2, Fri Oct 20 14:04:54 2017 +0200, medcl, update version to 0.10.0 ///
[10-21 16:01:09] [INF] [instance.go:23] workspace: data/gopa/nodes/0
[gopa] started.
Also run `./gopa -h` to get the full list of command-line options.
Example
➜ gopa git:(master) ✗ ./bin/gopa -h
[gopa] 0.10.0_SNAPSHOT ///last commit: 99616a2, Fri Oct 20 14:04:54 2017 +0200, medcl, update version to 0.10.0 ///
Usage of ./bin/gopa:
  -config string
        the location of config file (default "gopa.yml")
  -cpuprofile string
        write cpu profile to this file
  -daemon
        run in background as daemon
  -debug
        run in debug mode, wi
  -log string
        the log level, options: trace,debug,info,warn,error (default "info")
  -log_path string
        the log path (default "log")
  -memprofile string
        write memory profile to this file
  -pidfile string
        pidfile path (only for daemon)
  -pprof string
        enable and setup pprof/expvar service, eg: localhost:6060, the endpoint will be: http://localhost:6060/debug/pprof/ and http://localhost:6060/debug/vars
Stop
It's safe to press ctrl+c to stop the currently running Gopa; Gopa will handle the rest, saving a checkpoint so you may restore the job later. The world is still in your hand.
If you are running Gopa as a daemon, you may stop it like this:
kill -QUIT `pgrep gopa`
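If pgrep might match more than one process, a pidfile gives a more precise target; a sketch using the documented -pidfile option (the path here is hypothetical):
# start the daemon and record its pid, then stop exactly that process
./bin/gopa --daemon -pidfile /tmp/gopa.pid
kill -QUIT `cat /tmp/gopa.pid`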
Configuration
UI
- Search Console
http://127.0.0.1:9001/
- Admin Console
http://127.0.0.1:9001/admin/
API
- TBD
Contributing
You are sincerely and warmly welcome to play with this project, from UI style to core features, or even just a piece of documentation. Let's make it better.
License
Released under the Apache License, Version 2.0.
Also XSS Test
alert('XSS test');