Latest Citrusleaf News
Aug 27, 2012
My new clients at Aerospike have a range of minor news to announce: a company and product name change (they used to be Citrusleaf); some new people and funding; some unspecified future technical plans, in connection with the acqui-hire of AlchemyDB guy Russ Sullivan; and a community edition (Aerospike, née Citrusleaf, is otherwise closed-source). Mainly, however, they want to call your attention to the fact that they've been selling a fast, reliable key-value store, with a number of production references, and want to suggest that other organizations should perhaps buy it as well.

Generally, the Aerospike product story is as I described in two posts last year. At the highest level, Aerospike has a key-value data model; secondary indexes and so on are still futures. Aerospike is clustered, of course. Two hardware/storage choices are encouraged: spinning disk, with all your data kept in RAM, or solid-state disk.

Aerospike's three core marketing claims are performance, consistent performance, and uninterrupted operation. The performance claims are supported by a variety of blazing internal benchmarks. The consistent-performance claims are along the lines of sub-millisecond latency, with 99.9% of responses coming back within 5 milliseconds, and with even a node outage only borking performance for some tens of milliseconds. Uninterrupted operation is a core Aerospike design goal, and the company says that to date no Aerospike production cluster has ever gone down.

Aerospike technical details start with the expected. It is shared-nothing and log-structured. It has many more logical data partitions than physical ones (the default is 4000, but you can double that a few times if you want). It uses synchronous replication within a data center, and asynchronous replication for disaster recovery or other geographical distribution.

Further technical details include the following. Aerospike is divided into three layers: client, distribution, and data.
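To make the partitioning and replication points concrete, here is a minimal Python sketch of a partition map: many logical partitions spread over a handful of nodes, each partition with a master and one synchronous replica on a different node. The round-robin placement, the node names, the helper functions, and the 4096-partition count are all assumptions for illustration, not Aerospike's actual assignment scheme.

```python
from collections import Counter

# Assumption: the post cites a default of about 4000 partitions; 4096 is used
# here as a round power of two. The placement below is a hypothetical
# round-robin scheme, not Aerospike's real algorithm.
N_PARTITIONS = 4096


def build_partition_map(nodes, n_partitions=N_PARTITIONS, replication=2):
    """Return a list indexed by partition ID. Each entry is a tuple of node
    names: the master first, then replicas, all on distinct nodes."""
    if not 1 <= replication <= len(nodes):
        raise ValueError("replication factor must be between 1 and the node count")
    return [
        tuple(nodes[(pid + r) % len(nodes)] for r in range(replication))
        for pid in range(n_partitions)
    ]


def masters_per_node(pmap):
    """Count how many partitions each node serves as master."""
    return Counter(owners[0] for owners in pmap)


def owners_after_failure(pmap, failed_node):
    """Drop a failed node from every partition's owner list; the surviving
    replica keeps serving that partition."""
    return [tuple(n for n in owners if n != failed_node) for owners in pmap]


nodes = ["n1", "n2", "n3", "n4"]  # hypothetical node names
pmap = build_partition_map(nodes)
print(masters_per_node(pmap))  # each node masters 4096 / 4 = 1024 partitions
survivors = owners_after_failure(pmap, "n2")
print(all(len(owners) >= 1 for owners in survivors))  # True with replication=2
```

With two copies per partition, losing any single node still leaves every partition with a surviving copy. Having many small partitions rather than one per node is commonly done so that data can be rebalanced in fine-grained units; the post doesn't say whether that is Aerospike's own rationale.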
The client layer lives with the application; the distribution and data layers live with each other. The distribution layer does the main mapping, but every node of any kind has a full map of the partitions. Aerospike is written in C (hence no garbage collection). Aerospike finds data in two steps: keys are hashed to partition IDs, and each partition travels with an index that is used to find data within it. The index is a red-black tree rather than a b-tree. Those indexes carry expiry information and so on, so data is invalidated rather than deleted in place; actual deletion only occurs via a defrag/compaction operation.

For business metrics and so on, the following is edited from an email sent over by Aerospike marketing VP Monica Pal. (The original, naturally, had a lot more marketing-speak.)

Headcount – 30 and hiring.
Number of production customers – mid double digits, all paying.
Biggest database – 12TB and doubling. Most customers are at 1-4TB of unique data; most replicate at least 2x; many also replicate across data centers.
(This next point came after the email said, about three times, that it's OK for Aerospike clusters to be small because they can do so much work on each node.) We have hundreds of servers in our own test lab to exercise the clustered architecture.
Pricing – per terabyte and per data center, with unlimited nodes per cluster and an unlimited number of clusters; you pay only for unique data, not replicas. Most start at $50k.
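Returning to the technical details, the two-step lookup and the invalidate-then-compact behavior described above can be sketched in Python. This is a hypothetical toy: a plain dict stands in for the per-partition red-black tree, a Python list stands in for log-structured storage, and the SHA-1 hash, the TTL handling, and the class names Partition and Store are assumptions rather than Aerospike's API.

```python
import hashlib
import time

N_PARTITIONS = 4096  # assumption, as in the post's "default is 4000"


def partition_id(key: str, n_partitions: int = N_PARTITIONS) -> int:
    """Step 1: hash the key to a partition ID (SHA-1 chosen arbitrarily here)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "little") % n_partitions


class Partition:
    """Hypothetical partition: an append-only log plus an in-memory index.
    A dict stands in for the red-black tree the post mentions."""

    def __init__(self):
        self.log = []    # append-only list of (key, value) records
        self.index = {}  # key -> (log offset, expiry time or None)

    def put(self, key, value, ttl=None, now=None):
        now = time.time() if now is None else now
        self.log.append((key, value))  # never overwrite in place
        expiry = now + ttl if ttl is not None else None
        self.index[key] = (len(self.log) - 1, expiry)

    def get(self, key, now=None):
        now = time.time() if now is None else now
        entry = self.index.get(key)
        if entry is None:
            return None
        offset, expiry = entry
        if expiry is not None and now >= expiry:
            return None  # expired: treated as invalid, bytes still in the log
        return self.log[offset][1]

    def delete(self, key):
        # Invalidate only: drop the index entry; the log record stays put.
        self.index.pop(key, None)

    def defrag(self, now=None):
        """Compaction: copy only live, unexpired records into a fresh log."""
        now = time.time() if now is None else now
        new_log, new_index = [], {}
        for key, (offset, expiry) in self.index.items():
            if expiry is not None and now >= expiry:
                continue
            new_log.append(self.log[offset])
            new_index[key] = (len(new_log) - 1, expiry)
        self.log, self.index = new_log, new_index


class Store:
    """Hypothetical store routing each key to its partition."""

    def __init__(self, n_partitions=N_PARTITIONS):
        self.n_partitions = n_partitions
        self.partitions = [Partition() for _ in range(n_partitions)]

    def _partition(self, key):
        return self.partitions[partition_id(key, self.n_partitions)]

    def put(self, key, value, ttl=None, now=None):
        self._partition(key).put(key, value, ttl, now)

    def get(self, key, now=None):
        # Step 2: look the key up in its partition's index.
        return self._partition(key).get(key, now)

    def delete(self, key):
        self._partition(key).delete(key)
```

Note that delete() only removes the index entry; superseded and deleted records keep occupying log space until defrag() copies the live records into a new log.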