Mnesia, Erlang's Distributed Database: Implementation and Applications - PingCAP

Reader contribution 1289 2023-04-03

- linear hash

ETS, DETS, and Mnesia all use the linear hashing algorithm.
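As a rough illustration of linear hashing, here is a minimal Python sketch. It is not the ETS or DETS implementation; the class name `LinearHashTable`, the load threshold, and the initial bucket count are assumptions chosen for the example. The point it shows is that the table grows by splitting one bucket at a time, so no single insert has to rehash the whole table.

```python
class LinearHashTable:
    """Toy linear hash table: grows one bucket at a time instead of doubling."""

    def __init__(self, initial_buckets=4, max_load=0.75):
        self.n0 = initial_buckets      # bucket count before any split
        self.level = 0                 # number of completed doubling rounds
        self.split = 0                 # next bucket to split in this round
        self.max_load = max_load       # average items per bucket before a split
        self.buckets = [[] for _ in range(initial_buckets)]
        self.size = 0

    def _index(self, key):
        h = hash(key)
        i = h % (self.n0 << self.level)
        if i < self.split:             # this bucket was already split this round
            i = h % (self.n0 << (self.level + 1))
        return i

    def get(self, key, default=None):
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        return default

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for pos, (k, _) in enumerate(bucket):
            if k == key:               # overwrite an existing key
                bucket[pos] = (key, value)
                return
        bucket.append((key, value))
        self.size += 1
        if self.size / len(self.buckets) > self.max_load:
            self._split_one()

    def _split_one(self):
        """Split exactly one bucket; its items go to itself or to the new bucket."""
        old = self.buckets[self.split]
        self.buckets[self.split] = []
        self.buckets.append([])        # new bucket index: split + n0 * 2**level
        self.split += 1
        if self.split == self.n0 << self.level:
            self.level += 1            # round finished, table has doubled
            self.split = 0
        for k, v in old:
            self.buckets[self._index(k)].append((k, v))
```

A short usage example: after `t = LinearHashTable()` and a few thousand `t.put(k, v)` calls, `len(t.buckets)` has grown gradually, and it always equals `t.n0 * 2 ** t.level + t.split`.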

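The Redis dict implementation resembles linear hashing: it rehashes incrementally so that every operation stays O(1). In addition to rehashing one bucket on each operation, Redis also spends 1 ms out of every 100 ms on rehashing to speed the process up.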
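The incremental rehash idea can be sketched in Python as follows. This is a simplified model, not the Redis source: the class `IncrementalDict`, its resize trigger, and the choice to migrate exactly one bucket per call are assumptions made for illustration, and the timed background rehash is left out.

```python
class IncrementalDict:
    """Toy dict that migrates one old bucket per operation while resizing."""

    def __init__(self, size=4):
        self.old = None                          # table being drained, if any
        self.new = [[] for _ in range(size)]     # table that receives inserts
        self.cursor = 0                          # next old bucket to migrate
        self.count = 0

    def _step(self):
        """Move at most one bucket from the old table into the new one."""
        if self.old is None:
            return
        for k, v in self.old[self.cursor]:
            self.new[hash(k) % len(self.new)].append((k, v))
        self.old[self.cursor] = []
        self.cursor += 1
        if self.cursor == len(self.old):         # old table fully drained
            self.old = None

    def _start_resize(self):
        # Allocating the doubled bucket array is the one step that is NOT
        # spread over many operations.
        self.old, self.new = self.new, [[] for _ in range(2 * len(self.new))]
        self.cursor = 0

    def get(self, key, default=None):
        self._step()
        for table in (self.old, self.new):
            if table is None:
                continue
            for k, v in table[hash(key) % len(table)]:
                if k == key:
                    return v
        return default

    def put(self, key, value):
        self._step()
        for table in (self.old, self.new):       # update in place if present
            if table is None:
                continue
            bucket = table[hash(key) % len(table)]
            for pos, (k, _) in enumerate(bucket):
                if k == key:
                    bucket[pos] = (key, value)
                    return
        self.new[hash(key) % len(self.new)].append((key, value))
        self.count += 1
        if self.old is None and self.count > len(self.new):
            self._start_resize()
```

Note that the per-operation work is bounded, but `_start_resize` still allocates the whole new bucket array in one go.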

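Although rehashing is incremental, when the key space is very large and LRU eviction is in use at the same time, the malloc of the big buckets array alone can stall Redis for a while.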

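A case I once ran into: a production Redis deployment used automatic master/replica failover, and for a while it kept failing over for no apparent reason. Investigation showed that the key space held more than 10 million keys. At failover time there was heavy eviction, and the buckets array had to be reallocated at twice its size, that is, 10M * 24 * 2 = 480M of memory. Memory was permanently full and only kept in check by LRU replacement, so having to free a chunk that large at once made the Redis instance stop responding for several seconds, which in turn triggered the failover. Judging from this case and from memory utilization, try to keep the key space from growing too large when using Redis.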
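The figure quoted in this case can be reproduced with a few lines of Python. The variable names are made up for the example, and reading 24 as "bytes per slot" is an assumption; the text gives only the bare number.

```python
keys = 10_000_000        # "10M" keys, as stated in the text
bytes_per_slot = 24      # per-slot figure from the text (assumed to be bytes)
growth_factor = 2        # the new array is twice the size of the old one

new_alloc_bytes = keys * bytes_per_slot * growth_factor
print(new_alloc_bytes // 1_000_000, "MB")
```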

- ETS

Erlang's built-in database takes on 70 million QPS

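The ETS implementation is simple: it is just an in-memory dictionary. It uses a read-write lock, so read-only workloads reach very high TPS. I once tested a dictionary on my old T420 laptop and measured about 4 million reads/writes per second on a single core. Judging by that number, ETS reads are about as fast as reads from a global in-memory dictionary, which is very efficient. Write performance is inevitably limited by the global lock, and it gets worse as concurrency goes up. For ETS tables that are written frequently, consider splitting the data across several tables.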
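The table-splitting advice can be sketched as follows. Python dicts and locks stand in for ETS tables here, so this only shows the routing idea; `ShardedTable` and the use of CRC32 as the hash are assumptions for the example. In Erlang, the shard index could be computed with `erlang:phash2(Key, N)`.

```python
import threading
import zlib


class ShardedTable:
    """Spread keys over several independently locked tables."""

    def __init__(self, shards=8):
        self.tables = [{} for _ in range(shards)]
        self.locks = [threading.Lock() for _ in range(shards)]

    def _shard(self, key):
        # Stable hash of the key's repr, reduced to a shard index.
        return zlib.crc32(repr(key).encode()) % len(self.tables)

    def insert(self, key, value):
        i = self._shard(key)
        with self.locks[i]:            # writers only contend within one shard
            self.tables[i][key] = value

    def lookup(self, key, default=None):
        i = self._shard(key)
        with self.locks[i]:
            return self.tables[i].get(key, default)
```

With N shards, two writers block each other only when their keys hash to the same shard, so lock contention drops roughly in proportion to N for evenly distributed keys.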

- DETS

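DETS is the on-disk counterpart of ETS. A single table is limited to 2 GB. A cache can be configured, but it defaults to 0, which means that by default every read and write goes to disk.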

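As mentioned above, DETS stores its data using linear hashing. Hashing is not very disk-friendly and does not play well with file block caching. The cache only acts as a row-level index; there is no block-level index.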

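Overall, DETS still falls short of a complete storage engine and has little value when used on its own, so it is mostly used through Mnesia, the clustered database built on top of it.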

Since all operations performed by Dets are disk operations, it is important to realize that a single look-up operation involves a series of disk seek and read operations. For this reason, the Dets functions are much slower than the corresponding Ets functions, although Dets exports a similar interface.

Dets organizes data as a linear hash list and the hash list grows gracefully as more data is inserted into the table. Space management on the file is performed by what is called a buddy system. The current implementation keeps the entire buddy system in RAM, which implies that if the table gets heavily fragmented, quite some memory can be used up. The only way to defragment a table is to close it and then open it again with the repair option set to force.

- Mnesia

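Mnesia is a powerful distributed database implemented in pure Erlang on top of ETS and DETS. The size of a disc Mnesia table is bounded by the DETS limit, but fragmentation can be used; fragments are similar to partitioned tables.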

Replacing DETS with LevelDB (1/4 the startup time, 1/2 the conflicts, 1/3 the memory usage)

Mnesia Backend Plugin Framework and a LevelDB-based Plugin: Roland Karlsson, Malcolm Matalka

WhatsApp:

disc_copies tables

Partitioned islands and fragmented tables

All operations run async_dirty

Use key hashing to collapse all ops per key to a single process
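The last point, routing every operation on a key to one process, can be illustrated with a small Python sketch. Threads and queues stand in for Erlang processes and mailboxes; `KeyRouter`, `worker_for`, and the CRC32 hash are hypothetical names and choices for this example, not WhatsApp's code. Because each key is owned by exactly one worker, operations on that key are serialized without locks or transactions.

```python
import queue
import threading
import zlib


class KeyRouter:
    """Route all operations on a key to one worker, which applies them in order."""

    def __init__(self, workers=4):
        self.queues = [queue.Queue() for _ in range(workers)]
        self.stores = [{} for _ in range(workers)]   # one private store per worker
        self.threads = [
            threading.Thread(target=self._run, args=(i,), daemon=True)
            for i in range(workers)
        ]
        for t in self.threads:
            t.start()

    def worker_for(self, key):
        return zlib.crc32(repr(key).encode()) % len(self.queues)

    def _run(self, i):
        while True:
            op = self.queues[i].get()
            if op is None:                           # shutdown signal
                break
            key, fn = op
            # Only this worker touches stores[i], so no lock is needed.
            self.stores[i][key] = fn(self.stores[i].get(key))

    def submit(self, key, fn):
        """Queue fn(old_value) -> new_value for the worker that owns key."""
        self.queues[self.worker_for(key)].put((key, fn))

    def close(self):
        for q in self.queues:
            q.put(None)
        for t in self.threads:
            t.join()
```

For example, many callers can `submit("a", lambda old: (old or 0) + 1)` concurrently; after `close()`, the counter for `"a"` equals the number of submissions because a single worker applied them one by one.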

First of all, mnesia has no 2 gigabyte limit. It is limited on a 32bit architecture, but hardly any are present anymore for real work. And on 64bit, you are not limited to 2 gigabyte. I have seen databases on the order of several hundred gigabytes. The only problem is the initial start-up time for those.

Mnesia is built to handle:

Very low latency K/V lookup, not necessarily linearizable.

Proper transactions with linearizable changes (C in the CAP theorem). These are allowed to run at a much worse latency as they are expected to be relatively rare.

On-line schema change

Survival even if nodes fail in a cluster (where cluster is smallish, say 10-50 machines at most)

The design is such that you avoid a separate process since data is in the Erlang system already. You have QLC for datalog-like queries. And you have the ability to store any Erlang term.

Mnesia fares well if the above is what you need. Its limits are:

You can't get a machine with more than 2 terabytes of memory. And loading 2 teras from scratch is going to be slow.

Since it is a CP system and not an AP system, the loss of nodes requires manual intervention. You may not need transactions as well. You might also want to be able to seamlessly add more nodes to the system and so on. For this, Riak is a better choice.

It uses optimistic locking which gives trouble if many processes try to access the same row in a transaction.
