A nested type contains a dense_vector field; computing similarity with cosineSimilarity throws a class cast exception
Elasticsearch | Author: cxycxy | Posted: 2023-03-16 | Views: 3952
Software version: 8.4.3
Runtime environment: CentOS 7
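Data structure: each document is one video record containing multiple persons, and each person has one face feature vector.
Expected result: given an input feature array, compare it against every face feature vector in the nested array by cosine similarity, and use the highest similarity as the document score.
The mapping is as follows: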
{
  "mappings": {
    "properties": {
      "videoId": {
        "type": "keyword"
      },
      "videoPersonList": {
        "type": "nested",
        "properties": {
          "personName": {
            "type": "keyword"
          },
          "picId": {
            "type": "keyword"
          },
          "featureVector": {
            "type": "dense_vector",
            "dims": 512
          }
        }
      }
    }
  }
}
My implementation:
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "script_score": {
        "script": {
          "params": {
            "queryVector": []
          },
          "source": "def maxFeatureSource=0.0;for (def personItem : params['_source']['videoPersonList']) {double tmp = cosineSimilarity(params.queryVector,personItem['featureVector']);if(maxFeatureSource<tmp){maxFeatureSource=tmp;}}return maxFeatureSource;"
        }
      }
    }
  }
}
The error:
{
  "error": {
    "root_cause": [
      {
        "type": "script_exception",
        "reason": "runtime error",
        "script_stack": [
          "java.base/java.lang.Class.cast(Class.java:3921)",
          "tmp = cosineSimilarity(params.queryVector,personItem['featureVector']);",
          " ^---- HERE"
        ],
        "script": "def maxFeatureSource=0.0;for (def personItem : params['_source']['videoPersonList']) {double tmp = cosineSimilarity(params.queryVector,personItem['featureVector']);if(maxFeatureSource<tmp){maxFeatureSource=tmp;}}return maxFeatureSource;",
        "lang": "painless",
        "position": {
          "offset": 145,
          "start": 93,
          "end": 164
        }
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "video",
        "node": "i1a9H3dRRhuAMPfnr7dgzA",
        "reason": {
          "type": "script_exception",
          "reason": "runtime error",
          "script_stack": [
            "java.base/java.lang.Class.cast(Class.java:3921)",
            "tmp = cosineSimilarity(params.queryVector,personItem['featureVector']);",
            " ^---- HERE"
          ],
          "script": "def maxFeatureSource=0.0;for (def personItem : params['_source']['videoPersonList']) {double tmp = cosineSimilarity(params.queryVector,personItem['featureVector']);if(maxFeatureSource<tmp){maxFeatureSource=tmp;}}return maxFeatureSource;",
          "lang": "painless",
          "position": {
            "offset": 145,
            "start": 93,
            "end": 164
          },
          "caused_by": {
            "type": "class_cast_exception",
            "reason": "Cannot cast java.util.ArrayList to java.lang.String"
          }
        }
      }
    ]
  },
  "status": 400
}
This approach fails with the error above. What is the correct way to implement it?
2 replies
cxycxy
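I worked around it by writing my own cosine similarity calculation inside the script instead of using the built-in function that ES provides.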
{
  "query": {
    "function_score": {
      "query": {
        "match_all": {}
      },
      "script_score": {
        "script": {
          "params": {
            "queryVector": []
          },
          "source": "def maxFeatureSource = 0.0;for (def personItem : params['_source']['videoPersonList']) {double dotProduct = 0.0;double normA = 0.0;double normB = 0.0;for (int i = 0; i < params.queryVector.size(); i++) {dotProduct += params.queryVector.get(i) * personItem['featureVector'].get(i);normA += Math.pow(params.queryVector.get(i), 2);normB += Math.pow(personItem['featureVector'].get(i), 2);}double tmp = dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));if (maxFeatureSource < tmp) {maxFeatureSource = tmp;}}return maxFeatureSource;"
        }
      }
    }
  }
}
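For readers who want to check the numbers outside Elasticsearch, the arithmetic of the hand-written script above can be mirrored in a few lines of Python. This is only a sketch: the function names are made up for illustration, the toy vectors have 3 dimensions instead of 512, and it assumes no vector is all zeros. Like the Painless loop, the running maximum starts at 0.0, so a video whose best match is negative still scores 0.0.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (assumes neither is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def max_person_similarity(query_vector, video_person_list):
    """Mirror of the Painless loop: best similarity over all persons, floored at 0.0."""
    best = 0.0
    for person in video_person_list:
        best = max(best, cosine_similarity(query_vector, person["featureVector"]))
    return best


# Toy document shaped like the mapping in the question (hypothetical values).
video = {
    "videoId": "v1",
    "videoPersonList": [
        {"personName": "a", "featureVector": [1.0, 0.0, 0.0]},
        {"personName": "b", "featureVector": [0.0, 1.0, 0.0]},
    ],
}

score = max_person_similarity([1.0, 0.0, 0.0], video["videoPersonList"])
```

Here the query vector is identical to the first person's vector, so that person determines the document score.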
mryu
{
  "query": {
    "nested": {
      "path": "content_embeddings",
      "score_mode": "max",
      "query": {
        "script_score": {
          "query": {
            "match_all": {}
          },
          "script": {
            "source": "cosineSimilarity(params.query_vector, 'content_embeddings.vector') + 1.0",
            "params": {
              "query_vector": [
                1,
                1,
                1
              ]
            }
          }
        }
      }
    }
  },
  "size": 10,
  "sort": [
    {
      "_score": {
        "order": "desc"
      }
    }
  ]
}
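The reply above is written against a different index (path content_embeddings, a 3-dimensional vector). The class_cast_exception in the question suggests why this shape works: the built-in cosineSimilarity expects the name of a dense_vector field as its second argument, not a list read from _source. Wrapping script_score in a nested query with score_mode max lets Elasticsearch score each person separately and keep the best one, and the + 1.0 keeps scores non-negative, which Elasticsearch requires. Adapted to the mapping in the question, the request body might be built as in the sketch below. The helper name and its defaults are hypothetical, and the body has not been run against 8.4.3.

```python
import json


def build_nested_max_cosine_query(query_vector, path="videoPersonList",
                                  vector_field="featureVector", size=10):
    """Build a search body that scores each video by its best-matching person.

    path and vector_field default to the names from the question's mapping.
    """
    source = (
        f"cosineSimilarity(params.query_vector, '{path}.{vector_field}') + 1.0"
    )
    return {
        "query": {
            "nested": {
                "path": path,
                # Keep the highest per-person score as the document score.
                "score_mode": "max",
                "query": {
                    "script_score": {
                        "query": {"match_all": {}},
                        "script": {
                            "source": source,
                            "params": {"query_vector": list(query_vector)},
                        },
                    }
                },
            }
        },
        "size": size,
    }


# Placeholder 512-dimensional query vector; a real face feature goes here.
body = build_nested_max_cosine_query([0.0] * 511 + [1.0])
request_json = json.dumps(body)
```

The resulting JSON would be sent to the _search endpoint of the video index. Because of the + 1.0 offset, the returned _score lies between 0 and 2; subtract 1.0 to recover the raw cosine similarity.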