How FoundationDB works and why it works (2021)





The Sequencer:

- does not have a persistent/disk-backed state

- It is a singleton process

- it and only it does order, no logs do ordering

... if the singleton sequencer crashes, I do not see on this high level description how the system recovers, if the sequencer is the only one that knows write order but has no persistent write "log".

What am I missing?

This... does not appear to be something you run outside of a dedicated datacenter, AWS with its awful networking and slow/silently throttling storage would probably muck this thing up under any substantive scale?


What you are missing is that the "tlogs" (transaction logs) actually hold the durable, fault-tolerant write log. The sequencer is just a big fast in-memory data structure that checks if the many transactions coming into the system pass isolation checks (the I in ACID). That is, it accepts a transaction so long as the keys that the transaction read haven't been modified in the meantime.

The reason it can fail without a correctness issue is that it can just reject all transactions in flight for the clients to retry. This is something the clients need to be prepared to do anyway because of optimistic concurrency.

It can run fine on AWS. Upon a failure, the sequencer role is very fast to re-elect onto another machine in the cluster because there is no persistent state at all.
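To make the "accept a transaction so long as the keys it read haven't been modified" rule concrete, here is a minimal sketch of an optimistic conflict check. It is a toy, not FoundationDB's code: `ConflictChecker` and its methods are made-up names, and the whole state is an in-memory dict, which is also why losing it only costs a retry.

```python
class ConflictChecker:
    """Toy optimistic-concurrency check: remembers the last commit version per key."""

    def __init__(self):
        self.version = 0        # latest committed version
        self.last_write = {}    # key -> version of its most recent committed write

    def read_version(self):
        return self.version

    def try_commit(self, read_version, read_keys, write_keys):
        # Reject if any key this transaction read was written after it started.
        for key in read_keys:
            if self.last_write.get(key, 0) > read_version:
                return None     # conflict: the client is expected to retry
        self.version += 1
        for key in write_keys:
            self.last_write[key] = self.version
        return self.version


checker = ConflictChecker()
rv_a = checker.read_version()
rv_b = checker.read_version()
commit_a = checker.try_commit(rv_a, {"x"}, {"x"})   # commits first
commit_b = checker.try_commit(rv_b, {"x"}, {"y"})   # read "x" at a stale version
retry_b = checker.try_commit(checker.read_version(), {"x"}, {"y"})
```

Here `commit_b` is rejected because "x" changed after B took its read version, and the retry with a fresh read version goes through.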


It runs fine in AWS; Snowflake and many others run it there. The most recent FoundationDB paper goes into a lot more detail on their recovery protocol. It’s a lot more nuanced than you think, but it works extremely well.


are there dumber alternatives? sort of like ndb cluster(its kv interface, to be precise) but fully disk based, so that transaction limits are practically unreachable?


An obvious question you face when deploying something like FDB is how to write your app on top of it. With FDB it's like RocksDB. You get a transactional key/value store, but that's a very low level interface for apps to work with.

FDB provides "layers", such as the Record layer. It helps map data to keys and values. But a more sophisticated solution that I sometimes wish would take off is this library:

It's a small(ish) open source project that implements an ORM-like API but significantly cleaned up, and it can run on any K/V backend. There's an FDB plugin for it, so you can connect your app directly to an FDB cluster using it. And with that you get built-in indexing, derived data, triggers, you can do queries using the Java collections API, there's a CLI, there's an API for making GUIs and everything else you might need for a business CRUD app. It's got a paper of its own and is quite impressive.

There are a few big gaps vs an RDBMS though:

1. There's no query planner. You write your own plans by using functional maps/filters/folds etc in regular Java (or Kotlin or Python or any other language that can run on the JVM).

2. It's weak on analytics, because there's no access control and the ad-hoc query language is less convenient than SQL.

3. There's no network protocol other than FDB itself, which assumes low latency networks. So if there's a big distance between the user generating the queries and the servers, you have a problem and will need to introduce an app specific protocol (or move the code).
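For anyone wondering what "writing your own plan" looks like, here is a rough sketch in Python. It is not that library's API; the data, the `by_city` index and `londoners_over` are invented for illustration. The point is only that you pick the index and the order of the steps yourself instead of a planner doing it.

```python
from collections import defaultdict

people = {
    1: {"name": "Ada", "city": "London", "age": 36},
    2: {"name": "Bob", "city": "Paris", "age": 25},
    3: {"name": "Cy", "city": "London", "age": 19},
}

# Secondary index maintained by hand: city -> set of primary keys.
by_city = defaultdict(set)
for pk, row in people.items():
    by_city[row["city"]].add(pk)


def londoners_over(min_age):
    # The "plan": index lookup on city, then filter on age, then project the name.
    rows = (people[pk] for pk in sorted(by_city["London"]))
    return [row["name"] for row in rows if row["age"] >= min_age]


result = londoners_over(21)
```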

@dang hasn't appeared on HN before*. If you'd be willing to post it and then email a heads-up, we'll put the submission in the second-chance pool, so it will get a random placement on HN's front page.

* and the only previous related submission appears to be


"FDB can tolerate f failures with only f+1 replicas."

Wait a minute, I know that formula... Viewstamped Replication?? I need to read the foundation DB paper. (I mainly read CRDT stuff so hopefully it's understandable).


In general I'm really impressed by the FoundationDB folks.

The talk "Testing Distributed Systems w/ Deterministic Simulation" by Will Wilson blew my mind. TL;DR they spent the majority of their initial dev effort on building a simulation of the database, then when they were happy with that, plugged in real storage, time and networks at the end. Well worth a watch for anyone interested in distributed systems & reliability.


"FDB can tolerate f failures with only f+1 replicas." is too vague. What kind of failures and in which situations?

If "failure" is a netsplit, only single partition would allow writes, because they choose CP from CAP theorem.


Oh man I've just had a (friendly!) debate on this with some distsys folks on twitter.

General consensus (no pun intended!) is that the term availability is not really well defined, and the CAP theorem is not a useful way to think about things (see Martin Kleppmann's "the unhelpful CAP theorem" in DDIA).


Available = you can see one of the replicas, you are good to go. CAP is good for understanding what the limitations are when you have a partitioned network.

FoundationDB does not give you Availability though, only CP.


The problem is that the term available is overloaded. In CAP “(A)vailable” specifically means you can keep making db updates as long as you can talk to any db node (e.g. you and a db node have split off the internet together). In every other distributed systems context “available” means the system doesn’t stop working overall when failures happen. These are very different usages and it confuses a lot of people.


Re: Will’s talk, which I agree is awesome.

They recently turned that knowledge into a product, still early/rough but holy crap it feels like dark wizardry to use it.

Plus these folks are really top shelf humans to work with.


"f failures with f+1 replicas" is the standard for all non-byzantine fault tolerant systems out there. You will find it in Paxos, Raft, Viewstamped Replication, etc.

It makes sense if you think about it: these systems follow a leader/replica model, and naturally you only need one leader to make progress


Amending my own parent comment, since it won't let me edit: I was wrong about this being standard in Paxos/Raft etc. They actually require "f failures with 2f+1 replicas" (meaning that at a minimum a strict majority of replicas need to be available). I blame my morning brain.


Raft (at least) goes offline if more than half the replicas are gone, doesn't it? It won't accept writes, and it won't serve reads unless you've explicitly chosen to serve stale reads.


That's a behaviour of quorum systems - majority voting. It guarantees no inconsistent writes in the event of a network partition, where each half of the replicas is working fine and can talk among itself, but is getting no response from the other half.

But if you can reliably confirm that all but one nodes have "failed", for a suitably robust definition of failed, that's a different scenario. This means even though you can't communicate with a failed node in the normal way, you are able to get confirmation that the node cannot respond to normal messages to any other nodes or clients, and something (maybe controlling the node, or software on the node itself) guarantees to prevent those responses, until the node goes through a recovery and reintegration process.

Some ways this is done are using remote-controlled power, remote-controlled reboot, or reconfiguring the network switches to cut off the node. Just to ensure it can't come back and carry on responding as if nothing happened except a temporary delay. There's some subtlety to doing this robustly: Consider a response packet that got onto the network before the cut off event, but is delayed a long time inside the network due to a queue or fault.

After reliable "failure" confirmation, you can shrink the quorum size dynamically in response, even down to a single node, and then resume forward progress.


You're right, I misspoke in my original comment. See

What usually happens is that a leader won't confirm an operation as successful until such operation has been applied in a quorum of replicas (see: synchronous replication).

In theory, nothing prevents a leader from accepting new writes even if it can't reach a quorum, provided it never allows reading operations that haven't been replicated to a number of replicas.


Replica is ambiguous here: is it 1 leader and n replicas? Or is it just n replicas, one of which is assigned "leader"?

I thought "these systems follow a leader/replica model" would be the former, but "f failures with f+1 replicas" the latter.


It is the same for all CP systems in terms of CAP. During a partition, clients that have access to the leader can read and write. Clients that only have access to non-leader servers can only read data that is consistent up to the point when the non-leader lost its connection to the leader (i.e. old data, but still consistent).


It's a cluster size of n replicas, with one of the n being the (current) leader.

"f failures with f+1 replicas" means a cluster of n replicas can sustain n-1 failures: n=f+1, or f=n-1. If you want to be able to sustain f failures, you need a cluster size (n) of f+1.

When there is a failure, a non-failing node becomes the leader (or there's no leader change if the current leader isn't the one that failed). A cluster size of 1 has 1 leader, and can sustain 0 failures.
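The two formulas that came up in this subthread, written out as a small sketch. The function names are invented; the `majority_quorum` flag switches between the 2f+1 rule for majority-voting protocols and the f+1 rule, which assumes failures can be detected reliably.

```python
def replicas_needed(f, majority_quorum):
    """Smallest cluster size that tolerates f failed replicas."""
    # Majority voting needs a strict majority alive: n = 2f + 1.
    # If failures can be confirmed reliably, one survivor is enough: n = f + 1.
    return 2 * f + 1 if majority_quorum else f + 1


def failures_tolerated(n, majority_quorum):
    """Largest number of failed replicas a cluster of size n survives."""
    return (n - 1) // 2 if majority_quorum else n - 1
```

So tolerating 2 failures takes 5 replicas under majority voting but 3 under the f+1 rule, and a cluster of 1 tolerates 0 failures either way.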


Yep makes sense - "f failures with f+1 replicas" does indeed refer to the latter definition. Thanks!


Thanks for putting it in context! VSR is the only one I've read into by virtue of it having a really readable paper.


Great article.

Demystified a lot about FDB for me.

> ”Summary: FDB is probably the best k/v store for regional deployment out there.”

Why should someone use Memcache or Redis then?

Is it for the data types in Redis?


Things like Redis and Memcache are not serious data stores. Don't put any data in them that you really need back out later.


Redis has a few features outside of k/v, like a good pub-sub implementation, that make it very useful in addition to a good DX and mature libraries.

Memcache on the other hand is just solid and mature. It also has some inertia as a solid k/v cache. For example: NextCloud supports afaik both Redis and Memcache as caching engines but doesn't have FDB support.


FDB is more a framework to create your own distributed database creating what they call a "layer".


I believe the author means "the best transactional k/v store"


Because memcache and redis are in-memory. Writing to fdb will be complete once it is fsync'ed to disk.

Memcache is a cache. FDB is an ordered kv store.


You can configure redis to flush to disk on write operations though you lose on performance.


That is not comparable to writes completing after they are flushed to disk.


>appendfsync always: fsync every time new commands are appended to the AOF. Very very slow, very safe. Note that the commands are appended to the AOF after a batch of commands from multiple clients or a pipeline are executed, so it means a single write and a single fsync (before sending the replies).

It's very slow, but if you really want to wait for fsync before replying, it can do that.
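What "wait for fsync before replying" means in practice, sketched with plain file I/O. This is not how Redis implements its AOF; `append_durably` and the `toy.aof` file name are made up for the example. The acknowledgement is only returned after `os.fsync` has completed, which is where the latency cost comes from.

```python
import os
import tempfile


def append_durably(path, record):
    """Append one record and return only after it has been fsync'ed."""
    with open(path, "ab") as log:
        log.write(record + b"\n")
        log.flush()               # push Python's buffer to the OS
        os.fsync(log.fileno())    # ask the OS to push its cache to the device
    return True                   # only now would a server acknowledge the write


def read_log(path):
    with open(path, "rb") as log:
        return log.read().splitlines()


log_path = os.path.join(tempfile.mkdtemp(), "toy.aof")
append_durably(log_path, b"SET k1 v1")
append_durably(log_path, b"SET k2 v2")
records = read_log(log_path)
```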


I was unaware they could make that guarantee.

Thanks for the correction.


Redis can be configured to persist and fsync every operation.


Redis is not meant as a primary database and should not be used as such. FoundationDB is meant as a reliable source of truth.


Redis began as a caching database, but it has since evolved into a primary database. Many applications built today use Redis as a primary database.

Very much seems like an acceptable use now


Redis can't have a working set larger than memory. It has no mechanism to page data to disk. If your data set grows too large, you're hosed unless you add more hardware.




How do people deploy FDB to the cloud? Is it possible to deploy it without EBS to take advantage of cheaper VM temporary storage?


I wouldn’t do that for production. Regional failure is not impossible although it is rare. You will lose all of your data.


Yes I would think so. FDB is distributed by default and the cluster is very easy to set up. As long as you have a sufficient number of VMs in a cluster, the loss of a single VM or disk will not matter as you can spin up a new VM to join the cluster. On AWS, you can set up members of the cluster in different availability zones in the same region, so the outage of one zone will not impact your database.

I am running this set up in my dev (personal) environment on AWS.


In my company, we tested FDB for two years, then we wrote a new backend for the Warp 10 timeseries database... Performance is really impressive; we dropped the HBase backend when we released Warp 10 3.0. Note we can isolate customers easily on the same FDB cluster (tenants are not explained anywhere on the internet, it is quite a recent FDB feature).

more info:


If there's just one Sequencer, and every ReadVersion request to the Proxy eventually hits the Sequencer 1-1, how does the Sequencer not get crushed? Or is the scaling limit just "the number of ReadVersion requests a Sequencer machine can handle per second", which admittedly is a cheap request to respond to?


Requests to the sequencer are batched heavily. If the sequencer fails, the cluster goes through a recovery and will be unavailable for 2-3 seconds and then recover.
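A toy illustration of why batching takes the pressure off a single sequencer. None of this is FDB's actual code or wire protocol; `ToySequencer`, `ToyProxy` and the starting version of 100 are invented. The idea is just that one call to the sequencer can answer every client that queued up since the last call.

```python
class ToySequencer:
    def __init__(self):
        self.committed_version = 100
        self.calls = 0

    def get_read_version(self):
        self.calls += 1
        return self.committed_version


class ToyProxy:
    """Collects read-version requests and answers a whole batch with one sequencer call."""

    def __init__(self, sequencer):
        self.sequencer = sequencer
        self.waiting = []    # client ids waiting for a read version

    def request_read_version(self, client_id):
        self.waiting.append(client_id)

    def flush(self):
        # One round trip to the sequencer serves every queued client.
        version = self.sequencer.get_read_version()
        replies = {client_id: version for client_id in self.waiting}
        self.waiting = []
        return replies


sequencer = ToySequencer()
proxy = ToyProxy(sequencer)
for client_id in range(1000):
    proxy.request_read_version(client_id)
replies = proxy.flush()
```

In this sketch 1000 client requests cost the sequencer a single call, so its load scales with the number of batches rather than the number of clients.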


Good point about the batching! Any idea what kind of ReadVersion qps throughput you can get this way? And yeah, 2-3s unavailability seems fine.


Not used FDB but reading the article and considering the semantics, it shouldn't matter too much if the Sequencer "just" distributes recent read-versions to the proxy frequently enough(unless that proxy has received a recent read-version "recently").

Worst case if there is heavy contention on the same keys then resolvers will eventually fail more transaction writes but for read-only transactions most applications should be fine with a slightly "old" version.

(Yes, all this will start to break down if there is high key contention and many conflicts)


My understanding of ReadVersion is that the only point of calling it is to be able to read your own writes - so staleness wouldn't be good. There was another sibling comment that says the ReadVersion requests are batched up before hitting the sequencer, I could definitely believe that would work.


Yeah, read versions are just there to make what's known as "external consistency" work. That is, if transaction B starts after transaction A commits, then B will see the effects of A.

The reason external consistency is nice is that if you change the database, as soon as you get a commit signal, you can tell any other client "hey, go check the database and see what I did" and they will see the changes. No worries about whether the changes have sync'ed yet or anything like that.


Yeah that seems like an untenable design choice. Was quite interested until I read that. Max TPS? And MTTR when the sequencer inevitably shits itself?


Replied above


You can trivially scale fdb to tens of millions of tx/sec for write-heavy workloads without a hardcore cluster for transactions of reasonable complexity (though with careful design on my part and the part of others for collisions to be unlikely).

MTTR on failure is seconds. Really, there's no system I've used that is as robust and performant as fdb and I include s3 in that list - s3, for example, _routinely_ has operations with orders of magnitude latency variance and huge, correlated spikes.


This is the second article, after the "caddy" one, for which I am having trouble finding a use case.

Nginx exists, why do I need to learn caddy?

Redis exists, why do I need to learn FoundationDB?


I don't know enough to compare databases, but:

> Nginx exists, why do I need to learn caddy?

If you've learned to manage nginx, by all means use it, but for new users it's more like "caddy works with like 3 lines of configuration, including HTTPS, why would I learn nginx?"


I don't want to derail too much but this is interesting, because I recently had the opposite experience.

I hadn't spun up a webserver other than Kestrel for a long time, and was absolutely looking for the easiest solution for putting a reverse proxy in front of an API. No huge traffic requirements or low latency, seemed a perfect fit for caddy.

Then I googled to make sure that the necessary featureset was there and saw that rate limiting is a plugin that's marked WIP. What's more, there seemed to be a couple to choose from.

So I went through the certbot steps (very quick + straightforward) and wrote the short nginx config based on one page of getting started docs and was up and running.


Since you mention Kestrel I'll assume .NET so I suggest you take a look at yarp. It's fully programmable and "plugins" are just small pieces of middleware, a lot of which is available as a nuget package.

(I'm the author of ProxyKit that predated yarp)


Oh, very nice. We do use YARP actually on some of those other projects, and I was going to mention it. It's great.

This particular (small, internal) project was in Go, so I wanted to use the opportunity to forego any extra runtimes and just used nginx.


> Redis exists, why do I need to learn FoundationDB?

so you know that this question does not make sense.


From the article :

>non-sharded, strict serializable, fault tolerant, key-value store that supports point writes, reads and range reads.

k-v store, non-sharded, fault tolerant, reads and range reads. Redis has these.

>strict serializable

Redis's single threaded model does this (maybe, not entirely sure).

Please help me understand why this comparison is orthogonal, or whether it really replaces Redis in ways that I don't understand.


iCloud is built on FoundationDB. You couldn't build iCloud on Redis. Most people are not building iCloud though.


The short version: If you have precious data you want to be stored very reliably and consistently, choose FDB over Redis, (or choose a SQL database with a good reputation for reliable storage).

The two words that come to mind are "durable", which is the "D" in ACID, and the extensive testing of FDB's distributed robustness properties in a wide range of testing and fault scenarios.

If you want Redis to store durably, which means the data is reliably stored by the time the database replies to the client and won't magically disappear if there's a crash or power failure just after, you need to turn on its "fsync at every query" mode. This mode is very slow so durable storage is off by default in Redis. So by default Redis can lose the last 1 second of writes on power failure, kernel crash, or abrupt virtual machine termination.

In other words, FDB is designed for very reliable storage of every transactional write, and has been built and tested with that in mind. Whereas Redis is not; they consider losing recent data in some realistic scenarios to be an acceptable default, and even with "fsync on every query" mode turned on it does not have the same level of testing and focus on durable distributed storage as FDB.

Without the "D", things built on top which fill out the rest of ACID transactions and database indexing aren't as reliable either. For example "C", consistency (with foreign keys, indexes, etc), is impossible to maintain if some of your recent writes may be lost while others are not. I don't know what guarantees Redis offers in this area, but it does not seem to be the focus. Whereas FDB authors make it clear this sort of thing is a core focus of the product which it is architected for and also heavily tested.


Well NGINX and Docker exist why do I need Caddy at least.


If you don't have a use case that requires FoundationDB you def. do not have to learn it.


Well, for that the comparative features needs to be clearly laid out.

It solves a problem that others have not solved. Which problem is that? And how better does it work than whatever there was?


It's an analysis of a paper, not a feature checklist, so the article is for people who want to understand more about how FoundationDB works rather than trying to provide a "which system should I use" explanation.

Though "super reliable distributed database presenting a single logical shard to client code" is a class of system for which I don't think there's anything else even close out there, and I suspect generally if that's something you -need- then you'll already know that.


It's a highly scalable (as in powers iCloud services with billion users) distributed transactional KV store. It's owned by Apple and mostly developed by Apple and Snowflake.


FoundationDB replaces MySQL/PostgreSQL (if the tradeoffs are acceptable) or Cassandra. It is a reliable distributed store.

Unless you are running Redis only with nothing else, fdb and redis do not play in the same space.


> FoundationDB replaces Cassandra

I think they have different use cases: fdb when you want transactions, cassandra when you want throughput.


> FoundationDB replaces MySQL/PostgreSQL

Only in terms of transactions across multiple data centers. In every other way vertically scaled SQL performs better, especially for your dollar.


Performance is not the only metric. High availability (i.e. no single point of failure) with strong consistency are very important for some.


Is availability not an aspect of performance?

But of course, hence why i referenced multidc transactions.


> Is availability not an aspect of performance?

Not to my understanding. Can you elaborate?


Well, an unavailable service has trivially (perfectly) bad performance. I'm not sure how you'd conceive of performance to exclude availability, or what that conception would give us.


These technologies are primarily aimed at B2B markets and market them accordingly. You likely need none of them. Any meaningful technology is open source.


> Any meaningful technology is open source.

Clearly untrue, however FoundationDB is open source, with a permissive license.

So is much of the operational tooling for it:

It's more like if you are considering Cassandra, but would like transactions, take a look at FoundationDB first. Cassandra is coming with some tx support in v5 most likely(?).


Tangent about FoundationDB: this is a great video that explains how the team tested it

Spoiler: they even had their own custom power supplies used to test against power failures.


Fwiw this is standard procedure for anyone shipping a persistent storage product.


Only after FoundationDB made it standard.


FoundationDB's testing turns the rigorous up to 11 and I'm unaware of anybody else who's published a description of a testing approach that goes to quite the same extremes.

If that's just because I haven't noticed the others, I'd love to hear about them for comparison.


I found a dataloss bug in the first hour of testing FDB. I think their testing hype is a bit overhyped.

I also find it somewhat irritating that they won't take fixes or reports of problems with the storage engine because "we fixed this in redwood" when redwood is completely theoretical.


You've been able to use redwood since FDB 7.1 via ssd-redwood-1-experimental. Hardly theoretical.

Curious about that data loss bug. Do you have a link? Most bugs I've seen have to do with latency spikes and cluster unavailability. I haven't seen any around data loss after a transaction has committed.

Ah someone else posted that video. I'm still recovering from it, it turned my world upside down (in a good way!).


when do we choose FDB over Redis and vice versa?


If you need transactions and high availability, then use FDB. But if you also need low latency / high throughput, then you should consider RonDB.

Disclaimer: I am involved with RonDB


I suspect:

- memcached if you don't need to persist the data

- Redis if you don't know whether you need to use Redis or FoundationDB

- FoundationDB if you learn that Redis doesn't do what you need

I don't mean this in any kind of a derogatory way but I suspect that if you need to ask then you probably don't need FDB.

The principle of keeping tech stacks boring and using well established components is less exciting as an Engineer but is usually the best choice.


FDB is for permanent data storage, Redis is for temporary data. It does not matter that you can configure Redis to persist data to disk because the performance most Redis use cases need makes it less usable that way.


We have run foundationdb in production for roughly 10 years. It is solid, mostly trouble free (with one very important exception: you must NEVER allow any node on the cluster to exceed 90% full), robust and insanely fast (10M+ tx/sec). It is convenient, has a nice programming model, and the client includes the ability to inject random failures.

That said, I think most coders just can't deal with it. For reasons I won't go into, I came to fdb already fully aware of the compromises that software transactional memories have, and fdb roughly matches the semantics of those: retry on failure, a maximum transaction size, a maximum transaction time, and so on. For those who haven't used it, start here: ; especially the section on transactions.

These constraints are _very_ inconvenient for many kinds of applications so, ok, you'd like a wrapper library that handles them gracefully and hides the details (for example, counting the keys in a range).

This seems like it should be easy to do - after all, the expectation is that _application developers_ do it directly - but it isn't actually so in practice and introduces a layering violation into the data modeling if you have any part of your application doing direct key access. I recommend people try it. It can surely be done, but that layer is now as critical as the DB itself, and that has interesting risks.

At heart, the problem is, the limits are low enough that normal applications can and do run into them, and they are annoying. It would be really nice if the FDB team would build this next layer themselves with the same degree of testing but they themselves have not, and I think it's pretty clear that it turns out a small-transaction KV store is not enough to build complex layers in actuality.

Emphasis on the tested part - it's all well and good for fdb to be rock solid, but what needs to be there is that the actual interface used by 90% of applications is rock solid, and if you exceed basic small-size keys or time, that isn't really true.
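To show what the "count of range" wrapper mentioned above has to do, here is a rough sketch against a fake store. `ToyStore`, its `limit` and `count_range` are all hypothetical stand-ins, not the fdb client API: the store refuses any single transaction that asks for more than `limit` keys, so the count has to be stitched together from several small transactions.

```python
import bisect


class ToyStore:
    """In-memory ordered KV store; each 'transaction' may read at most `limit` keys."""

    def __init__(self, keys, limit):
        self.keys = sorted(keys)
        self.limit = limit
        self.transactions = 0

    def read_range(self, begin, end, max_keys):
        # One small transaction: keys in [begin, end), capped at max_keys.
        if max_keys > self.limit:
            raise ValueError("transaction too large")
        self.transactions += 1
        lo = bisect.bisect_left(self.keys, begin)
        hi = bisect.bisect_left(self.keys, end)
        return self.keys[lo:min(hi, lo + max_keys)]


def count_range(store, begin, end, chunk):
    """Count keys in [begin, end) using many small transactions."""
    total = 0
    cursor = begin
    while True:
        batch = store.read_range(cursor, end, chunk)
        total += len(batch)
        if len(batch) < chunk:
            return total
        cursor = batch[-1] + b"\x00"   # smallest key strictly after the last one seen


store = ToyStore([b"k%04d" % i for i in range(2500)], limit=1000)
total = count_range(store, b"k", b"l", chunk=1000)
```

The catch, and part of why this layer is riskier than it looks: the chunks are separate transactions, so against a live database the total is not a single consistent snapshot if other clients write to the range in between.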


How does FoundationDB compare to TiDB and CockroachDB?


CockroachDB is many cool things but not even remotely in the same category as fdb in terms of the transaction rates it can deal with per unit cost, and you wouldn’t be mapping complex data structures into cdb; it’s not what it is for.

They just aren't the same thing. It’s like comparing a binary tree to json. If you squint you can see how they could be similar but really aren’t.


I think that’s a good and really fair summary.

- If you’re a developer wanting to build an application, you should really use a well designed layer between yourself and FDB. A few are out there.

- If you’re a dev thinking you want to build a database from scratch you probably should just use FDB as the storage engine and work on the other parts. To start, at very least!

(One last thing that I think is a bit overlooked with FDB is how easy it is to build basically any data structure you can in memory in FDB instead. Not that it solves the transaction timeout stuff, etc. but if you want to build skip list, or a quad tree, or a vector store, or whatever else, you can quite easily use FDB to build a durable, distributed, transactional, multi-user version of the same. You don’t have to stick to just boring tables.)


I think I would say that for me the biggest issue is that what little there is in the way of well-written layers is all Java. Nothing against Java but I'd be looking for Go and Rust, not Java.

We did use fdb to backend more complex data structures (b+ and a kind of skiplist) and it's very cool. fdb basically presents the model of a software transactional memory and it's kind of wonderful, but it's not wonderful enough.

Another issue that I forgot to mention is that comprehensibility of keys is your own problem. Keys and values are just bytes - if you don't start day one with a shared envelope for all writers, you _will_ be in pain eventually. This can get kind of ugly if you somehow end up leaking keys that you can't identify.
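A sketch of what a "shared envelope" for keys could look like. The layout here (one version byte, a namespace, a NUL separator, a big-endian 64-bit id) is invented for illustration and is a hand-rolled scheme rather than FDB's own tuple encoding. The big-endian id matters because the store orders keys by raw bytes, so numeric order and key order then agree.

```python
import struct


def make_key(namespace, record_id):
    """Hypothetical envelope: 1-byte format version, namespace, NUL, big-endian id."""
    if "\x00" in namespace:
        raise ValueError("namespace must not contain NUL")
    return b"\x01" + namespace.encode() + b"\x00" + struct.pack(">Q", record_id)


def parse_key(key):
    """Recover (namespace, record_id), or fail loudly on an unknown envelope."""
    if key[:1] != b"\x01":
        raise ValueError("unknown envelope version")
    namespace, _, rest = key[1:].partition(b"\x00")
    (record_id,) = struct.unpack(">Q", rest)
    return namespace.decode(), record_id
```

With something like this in place from day one, every key in the cluster can at least be traced back to a writer and a format version, e.g. `parse_key(make_key("users", 42))` gives back `("users", 42)`.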


> you should really use a well designed layer between yourself and FDB. A few are out there.

Any recommendations? All I could find is - not sure how complete and up-to-date that list is.


Not to take away from your main point, and I appreciate it, but I am interested in one minor point you made: you wrote "and insanely fast (10M+ tx/sec)". When you say that, what does it mean, what's the context? Is it for the cluster or for one machine (what kind of cluster and networking, which machines), what size of transactions, is there an acknowledgement after each, are they truly transactional or batched in one go?


Medium size multi-key transactions of substantial read-dependency complexity (many dependent keys loaded as part of the tx) and moderate key size; each tx of modest total size on the set side; This is an own-AWS cluster which means crap networking vs. a purpose-built on-prem network, instances with NVME storage.

fdb transactions are real transactions. These aren't batches.


We have seriously looked at FoundationDB to replace our SQL-based storage for distributed writes. We decided not to proceed unless we are about to outgrow the existing deployment, a standard leader-follower setup on off-the-shelf hardware. The limiting factor for the latter would be the number of NVMe drives we could put into a single machine. It gives us a couple dozen TB of structured data (we don't store blobs in the database) before we have to worry.

fdb is best when your workload is pretty well-defined and will stay that way for a decade or so. That is not usually the case for new products, which evolve fast. The two most famous installations of fdb are iTunes and Snowflake metadata. When you rewrite a petabyte-size database in fdb, you transform continuous SRE/devops opex costs into developer capex investment. It comes with reduced risk of occasional data loss. For me it's mostly a financial decision, not really a technical one.


Were you planning on using the Record or Document layer if you went with it? Or maybe making your own layer?


We'd use the Record layer, but it was Java-only then. It would require us either to rewrite parts of our backend to Java or to implement some wrappers.


> transform continuous SRE/devops opex costs into developers capex investment

Would you mind expanding/educating me on this point? When I think of capex I think of “purchasing a thing that’s depreciated over a time window”. If you’d said “transform SRE/COGS costs into developer/R&D/opex costs” I would’ve understood, but eventually the thing leaves development and goes back into COGS.


Basically the SREs don't have anything to do with fdb for the most part. You add a node, quiesce a node, delete a node. Otherwise it's self-balancing and trouble-free from an SRE pov.

See my other message for the developer issues, though. IMHO fdb as it is today is too hard for most developers if their use case is anything beyond redis simple keys.



- developer time is approximately fungible with money

- project delivery is building a thing, that you own, and that has value, and that you will use to produce other value...

- ...which can therefore be entered on the balance sheet.

I've just left a company a little after it floated. In the run-up to the float, we were directed to maximise our capital time logged. That meant any kind of feature delivery. Bugfixes were opex.

I believe this was done to grow the balance sheet and maximise market cap.


I assume a couple of things here: 1) that SRE costs would be lower with fdb at scale due to its handling outages, i.e. auto-resharding; and 2) that a migration project from *sql to fdb will be finite (hence an investment I hastily called capex).

Would love to hear from anyone with experience in fdb whether these assumptions hold.



Is it correct to assume FDB is the perfect framework for creating a queue?


With FDB latency shoots up when a bunch of writers compete to update the same entry, because writers can have to retry many times before the write finally goes through, with a network round-trip each time. Personally I found this much harder to work with than a postgres queue with SKIP LOCKED for example.

There's this however: "QuiCK: A Queuing System in CloudKit": I suspect it really depends on what you expect from a queue, e.g. if you need strict FIFO or priorities, and how much effort you're willing to invest.
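A back-of-the-envelope model of why contention on a single entry hurts. It makes a deliberately crude assumption that is not measured FDB behaviour: writers proceed in lock-step rounds, all pending writers read the same version, exactly one commit wins per round, and everyone else retries. `attempts_under_contention` is a made-up name.

```python
def attempts_under_contention(writers):
    """Commit attempts when every writer read-modify-writes the same key.

    Assumes lock-step rounds: all pending writers read the same version,
    exactly one commit wins, and the rest must retry.
    """
    attempts = 0
    pending = writers
    while pending > 0:
        attempts += pending    # everyone still pending tries once this round
        pending -= 1           # exactly one of them commits
    return attempts
```

Under that model 10 competing writers need 55 attempts in total and 100 writers need 5050, each attempt costing a network round trip, so the work grows roughly with the square of the number of writers.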


FDB uses SQLite to store data, but FDB doesn’t expose SQL to the end user.


Newer versions are moving towards a custom btree storage engine called Redwood


That’s true, but FDB doesn’t use very much of SQLite, just a modified version of SQLite’s internal b-tree.