Storage paths
How does ES store and retrieve data when path.data is configured with multiple paths?
Elasticsearch • rojay replied • 7 followers • 4 replies • 18332 views • 2018-08-29 09:14
Storing Elasticsearch indices on HDFS
Elasticsearch • ybtsdst replied • 5 followers • 4 replies • 10709 views • 2017-05-27 15:06
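I recently ran into the first question as well. None of the material I could find online gave a satisfactory answer, so I had no choice but to read the source code. I eventually worked out how it behaves, and I'm sharing it here.
How ES allocates shards across multiple disks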
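Assume a single machine with two disks, and that elasticsearch.yml sets path.data: /index/data,/data2/index/data
That is, two disks mapped to two paths. Suppose I now create the index hrecord1 with 2 primary shards. They are placed as follows:
shard1 is created first (my guess is that ES creates the higher-numbered shard first, but it makes little difference). When it is created, ES picks whichever of the two paths sits on the disk with more usable space and puts shard1 under that path.
When shard0 is created, ES adds up the usable space of the /index and /data2 disks and takes 5% of that total. Strictly speaking, the value used is the larger of this 5% figure and avgShardSizeInBytes, the current average shard size; for a fresh index the 5% figure is typically the larger one.
ES then subtracts that value from the usable space of the disk that already holds shard1 (once for each shard counted on that path in dataPathToShardCount) and compares the result with the usable space of the other disk (/data2, if shard1 landed on /index). shard0 goes to whichever of the two figures is larger.
In short, this 5% acts as space that ES reserves for the shard1 it has just created.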
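To make the arithmetic concrete, here is a minimal sketch with made-up numbers. The free-space figures (104 GB and 96 GB) and the class and method names are assumptions for illustration only, and it uses plain long arithmetic where the ES source uses BigInteger.

```java
class ReserveWorkedExample {
    static final long GB = 1024L * 1024 * 1024;

    // Estimated shard size: the larger of the average shard size
    // and 5% (one twentieth) of the total usable space on the node.
    static long estimatedShardSize(long avgShardSizeInBytes, long totalFreeBytes) {
        return Math.max(avgShardSizeInBytes, totalFreeBytes / 20);
    }

    public static void main(String[] args) {
        long indexFree = 104 * GB; // hypothetical usable space on /index
        long data2Free = 96 * GB;  // hypothetical usable space on /data2

        // shard1 was created first and went to /index, the disk with more usable space.
        // For shard0, one estimated shard size is deducted from /index before comparing.
        long reserve = estimatedShardSize(0L, indexFree + data2Free);
        long indexAdjusted = indexFree - reserve;

        System.out.println("reserve (GB): " + reserve / GB);
        System.out.println("/index after deduction (GB): " + indexAdjusted / GB);
        System.out.println("shard0 goes to: " + (indexAdjusted > data2Free ? "/index" : "/data2"));
    }
}
```

With these numbers the reserve is 10 GB, /index drops to 94 GB for the comparison, and shard0 ends up on /data2. If the gap between the two disks were wider than the reserve (say 120 GB against 80 GB), both shards would land on the same disk.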
If I've got anything wrong, please point it out; happy to discuss!
The main code is in ShardPath.java:
[code]public static ShardPath selectNewPathForShard(NodeEnvironment env, ShardId shardId, IndexSettings indexSettings,
                                              long avgShardSizeInBytes, Map<Path,Integer> dataPathToShardCount) throws IOException {
    final Path dataPath;
    final Path statePath;
    if (indexSettings.hasCustomDataPath()) {
        // Custom data path: data goes there, shard state stays on the first node path
        dataPath = env.resolveCustomLocation(indexSettings, shardId);
        statePath = env.nodePaths()[0].resolve(shardId);
    } else {
        // Sum the usable space across all configured data paths
        BigInteger totFreeSpace = BigInteger.ZERO;
        for (NodeEnvironment.NodePath nodePath : env.nodePaths()) {
            totFreeSpace = totFreeSpace.add(BigInteger.valueOf(nodePath.fileStore.getUsableSpace()));
        }
        // TODO: this is a hack!! We should instead keep track of incoming (relocated) shards since we know
        // how large they will be once they're done copying, instead of a silly guess for such cases:
        // Very rough heuristic of how much disk space we expect the shard will use over its lifetime, the max of current average
        // shard size across the cluster and 5% of the total available free space on this node:
        BigInteger estShardSizeInBytes = BigInteger.valueOf(avgShardSizeInBytes).max(totFreeSpace.divide(BigInteger.valueOf(20)));
        // TODO - do we need something more extensible? Yet, this does the job for now...
        final NodeEnvironment.NodePath[] paths = env.nodePaths();
        NodeEnvironment.NodePath bestPath = null;
        BigInteger maxUsableBytes = BigInteger.valueOf(Long.MIN_VALUE);
        for (NodeEnvironment.NodePath nodePath : paths) {
            FileStore fileStore = nodePath.fileStore;
            BigInteger usableBytes = BigInteger.valueOf(fileStore.getUsableSpace());
            assert usableBytes.compareTo(BigInteger.ZERO) >= 0;
            // Deduct estimated reserved bytes from usable space:
            Integer count = dataPathToShardCount.get(nodePath.path);
            if (count != null) {
                usableBytes = usableBytes.subtract(estShardSizeInBytes.multiply(BigInteger.valueOf(count)));
            }
            // Keep the path with the most usable space after the deduction
            if (bestPath == null || usableBytes.compareTo(maxUsableBytes) > 0) {
                maxUsableBytes = usableBytes;
                bestPath = nodePath;
            }
        }
        statePath = bestPath.resolve(shardId);
        dataPath = statePath;
    }
    return new ShardPath(indexSettings.hasCustomDataPath(), dataPath, statePath, shardId);
}[/code]
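The method above depends on ES internals (NodeEnvironment, FileStore), so it cannot be run on its own. The following is a standalone sketch of the same selection loop for experimenting with different disk sizes. It is not the ES implementation: the class name, the method name, and the use of plain maps keyed by path string (in place of NodePath and FileStore) are assumptions made for illustration.

```java
import java.math.BigInteger;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

class PathSelectionSketch {
    // Picks the data path with the most usable space after deducting one
    // estimated shard size for every shard already counted on that path.
    static String selectPath(Map<String, Long> usableBytesByPath,
                             long avgShardSizeInBytes,
                             Map<String, Integer> shardCountByPath) {
        BigInteger totFreeSpace = BigInteger.ZERO;
        for (long usable : usableBytesByPath.values()) {
            totFreeSpace = totFreeSpace.add(BigInteger.valueOf(usable));
        }
        // max(average shard size, 5% of total usable space), as in the quoted method
        BigInteger estShardSize = BigInteger.valueOf(avgShardSizeInBytes)
                .max(totFreeSpace.divide(BigInteger.valueOf(20)));

        String bestPath = null;
        BigInteger maxUsable = null;
        for (Map.Entry<String, Long> entry : usableBytesByPath.entrySet()) {
            BigInteger usable = BigInteger.valueOf(entry.getValue());
            Integer count = shardCountByPath.get(entry.getKey());
            if (count != null) {
                usable = usable.subtract(estShardSize.multiply(BigInteger.valueOf(count)));
            }
            // Strict comparison: on a tie, the path seen first is kept
            if (bestPath == null || usable.compareTo(maxUsable) > 0) {
                maxUsable = usable;
                bestPath = entry.getKey();
            }
        }
        return bestPath;
    }

    public static void main(String[] args) {
        long gb = 1024L * 1024 * 1024;
        // Hypothetical usable space per data path; LinkedHashMap keeps the configured order
        Map<String, Long> free = new LinkedHashMap<>();
        free.put("/index/data", 104 * gb);
        free.put("/data2/index/data", 96 * gb);

        Map<String, Integer> counts = new HashMap<>();
        String first = selectPath(free, 0L, counts);
        counts.merge(first, 1, Integer::sum); // record the shard just placed
        String second = selectPath(free, 0L, counts);

        System.out.println("first shard  -> " + first);
        System.out.println("second shard -> " + second);
    }
}
```

The sketch keeps the free-space map fixed between the two calls, which approximates a freshly created, still-empty shard. On a real node the usable space is read from the file system each time.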