
Duplicate data when syncing MySQL to Elasticsearch with Logstash

Logstash | Author: wangdali | Posted on 2018-02-06 | Views: 6690

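When I use Logstash to sync MySQL data to Elasticsearch, the data ends up duplicated, and I don't know how to deduplicate it. The three fields attachments, tag, and types are all arrays.
Could someone please take a look?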

[Screenshot: 屏幕快照_2018-02-06_13.54_.31_.png]

The Logstash pipeline configuration is as follows:
input {
  jdbc {
    jdbc_driver_library => "/usr/share/logstash-5.6.2/config/mysql-connector-java-5.1.45.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    jdbc_connection_string => "jdbc:mysql://localhost:3306/test?useUnicode=true&characterEncoding=utf8&useSSL=false&autoReconnect=true&createDatabaseIfNotExist=true"
    jdbc_user => "root"
    jdbc_password => "root"
    jdbc_default_timezone => "Asia/Shanghai"
    jdbc_paging_enabled => true
    jdbc_page_size => 100000
    jdbc_fetch_size => 10000
    connection_retry_attempts => 3
    connection_retry_attempts_wait_time => 1
    jdbc_pool_timeout => 5
    lowercase_column_names => true
    record_last_run => true
    schedule => "* * * * *"
    use_column_value => true
    tracking_column => "id"
    statement_filepath => "/usr/share/logstash-5.6.2/config/knowledge_all.sql"
  }
}
filter {
  aggregate {
    task_id => "%{id}"
    code => "
      map['id'] = event.get('id')
      map['title'] = event.get('title')
      # Initialize each array once per task, then append the current row's values
      map['attachments'] ||= []
      map['attachments'] << {
        'id' => event.get('attachment_id'),
        'filename' => event.get('attachment_filename'),
        'path' => event.get('attachment_path')
      }
      map['types'] ||= []
      map['types'] << {
        'value' => event.get('type_value'),
        'label' => event.get('type_label')
      }
      map['tag'] ||= []
      map['tag'] << {
        'id' => event.get('tag_id'),
        'title' => event.get('tag_title')
      }
      event.cancel()
    "
    push_previous_map_as_event => true
  }
}

output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "test"
    document_type => "knowledge"
    document_id => "%{id}"
  }
}


knowledge_all.sql:
 
SELECT DISTINCT
  k.id,
  k.title,
  a.id AS attachment_id,
  a.filename AS attachment_filename,
  a.path AS attachment_path,
  t.id AS type_value,
  t.title AS type_label,
  ta.id AS tag_id,
  ta.title AS tag_title
FROM t_knowledge k
LEFT JOIN t_knowledge_attachment ka ON ka.knowledge_id = k.id
LEFT JOIN t_attachment a ON ka.attachment_id = a.id
LEFT JOIN t_knowledge_type_relate tr ON tr.knowledge_id = k.id
LEFT JOIN t_knowledge_type t ON t.id = tr.type_id
LEFT JOIN t_knowledge_tag_relate tar ON tar.knowledge_id = k.id
LEFT JOIN t_knowledge_tag ta ON ta.id = tar.tag_id
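
The query above joins three independent one-to-many relations on the same k.id, so each attachment is likely repeated once for every type/tag combination, and an aggregate code block that appends on every row would then fill the arrays with repeats. One possible fix is to append only values that are not already in the array. The sketch below mimics that logic on plain Ruby hashes; `add_unique`, `aggregate_rows`, and the sample rows are hypothetical and only for illustration.

```ruby
# Hypothetical helper: append item only if its key field is set and the
# same item is not already in the list.
def add_unique(list, item, key)
  return if item[key].nil?                 # LEFT JOIN found no match for this relation
  list << item unless list.include?(item)  # Hash#== compares contents
end

# Hypothetical stand-in for the aggregate filter: one map per id.
def aggregate_rows(rows)
  maps = {}
  rows.each do |row|
    map = (maps[row['id']] ||= {
      'id' => row['id'], 'title' => row['title'],
      'attachments' => [], 'types' => [], 'tag' => []
    })
    add_unique(map['attachments'],
               { 'id' => row['attachment_id'],
                 'filename' => row['attachment_filename'],
                 'path' => row['attachment_path'] }, 'id')
    add_unique(map['types'],
               { 'value' => row['type_value'], 'label' => row['type_label'] }, 'value')
    add_unique(map['tag'],
               { 'id' => row['tag_id'], 'title' => row['tag_title'] }, 'id')
  end
  maps.values
end

# Invented sample: one record with 1 attachment, 2 types, 2 tags,
# which the join returns as 2 x 2 = 4 rows.
rows = [7, 8].product([20, 21]).map do |type_id, tag_id|
  { 'id' => 1, 'title' => 'doc',
    'attachment_id' => 5, 'attachment_filename' => 'a.pdf',
    'attachment_path' => '/files/a.pdf',
    'type_value' => type_id, 'type_label' => "type-#{type_id}",
    'tag_id' => tag_id, 'tag_title' => "tag-#{tag_id}" }
end

docs = aggregate_rows(rows)
puts docs[0]['attachments'].size
```

Inside the filter itself, the same guard would presumably look like `map['attachments'] << att unless map['attachments'].include?(att)`. The aggregate filter documentation also asks for a single pipeline worker (`-w 1`), and `push_previous_map_as_event` expects all rows of one id to arrive consecutively, so adding `ORDER BY k.id` to the query is probably worth trying as well.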

 

jhonbo


Try adding GROUP BY k.id to the SQL statement. Your query itself most likely already returns multiple rows per record.

laoyang360 - Author of 《一本书讲透Elasticsearch》, Elastic Certified Engineer. [死磕Elasticsearch] Knowledge Planet: http://t.cn/RmwM3N9; WeChat official account: 铭毅天下; blog: https://elastic.blog.csdn.net


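Agreed with the reply above. Run the multi-table join directly in MySQL and check the result; it may well contain duplicated data.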
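
A small sketch of what such a check would probably show, using invented sample data rather than the real tables: when one record has several attachments, types, and tags, joining all three relations on the same id behaves like a cross product. Each row is a distinct combination, so DISTINCT cannot remove any of them.

```ruby
# Invented sample data: knowledge record 1 has 2 attachments, 2 types, 3 tags.
attachment_ids = [5, 6]
type_ids = [7, 8]
tag_ids = [20, 21, 22]

# Joining three independent one-to-many relations on the same key yields
# every combination of the three lists for that key.
joined_rows = attachment_ids.product(type_ids, tag_ids).map do |a, t, g|
  { 'id' => 1, 'attachment_id' => a, 'type_value' => t, 'tag_id' => g }
end

# Rows returned per knowledge id, and how often one attachment repeats.
rows_per_id = joined_rows.group_by { |r| r['id'] }.map { |id, rs| [id, rs.size] }.to_h
times_attachment_5 = joined_rows.count { |r| r['attachment_id'] == 5 }

puts joined_rows.size      # 12 rows for a single record
puts times_attachment_5    # attachment 5 appears 6 times
```

Against the real database, the equivalent check would be to count the rows the join returns per k.id and compare that with the number of attachments, types, and tags each record actually has.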

Fanfan


Are there any errors in the Logstash logs?
