Hacker News — vinext + Cloudflare Workers

▲Just Use Postgres for Durable Workflows (dbos.dev)

140 points by KraftyOne 2 hours ago | 51 comments

buremba 54 minutes ago [-]

All you need is Postgres until you scale into TBs of data. We use Postgresql as a durable workflow engine, vector search, time-series data, BM25 search, OLTP/OLAP engine, and a queue. It's basically the only dependency we have for https://lobu.ai

The main benefit is centralizing all the data in one place so we don't need to worry about copying data between multiple systems. Once something becomes the bottleneck, you can eventually migrate to a purpose-specific tool to scale out. To be honest, LISTEN/NOTIFY is in my opinion the most fragile part of PG, but it's fine as a start until you scale out.

tibbon 5 minutes ago [-]

But when you hit that wall, it is hard to stop and convince people to use different patterns and systems. I've seen so many tables go from "it will only be a few thousand rows" to suddenly several TB and then people are looking confused when performance and db admin tasks get really difficult.

I'm working at a scale where almost every day I have to ask people "are you sure you need to treat that as relational data? It doesn't seem relational"

throwaway7783 43 minutes ago [-]

I'm in the same camp. Do you use any specific extensions? Especially for OLAP and time series (partitioned tables + related extensions work fine, but curious if you use anything else)

buremba 28 minutes ago [-]

The native extensions are fine, but I haven't had good experiences with any third-party extensions; so far I've tried Timescale, pg_lake, citus, and pgvectorscale. They look very appealing, but it's usually a trap, as you can't get the value without using the vendor's cloud offerings.

I think if you grow enough to look for these extensions, it's usually better to bet on purpose-specific tooling. For example, I use DuckDB/Iceberg combination extensively for columnar data and connect DuckDB to PG when I need it.

hmaxdml 47 minutes ago [-]

Listen/notify is poised to become much better in PG 18 and 19

stuartaxelowen 38 minutes ago [-]

Why’s that?

TkTech 17 minutes ago [-]

In pg19 https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit... will land, which significantly improves NOTIFY performance. Right now LISTEN/NOTIFY doesn't scale to very busy instances because a `NOTIFY` within a transaction takes a global lock.

ivanr 14 minutes ago [-]

More context: https://www.recall.ai/blog/postgres-listen-notify-does-not-s...

pphysch 44 minutes ago [-]

I don't see logs mentioned. I agree with most of those applications but would keep my OLAP stuff (metrics, logs, traces) in a separate store like VictoriaMetrics, both for capacity and read activity.

TkTech 3 minutes ago [-]

pg_timescale can take you pretty far for metrics and would be Good Enough for almost all users. Totally agree on raw, high-volume logs though.

buremba 33 minutes ago [-]

Yeah I have logs in Sentry, which also uses Postgresql.

llimllib 57 minutes ago [-]

Armin Ronacher's `absurd` is an implementation of durable workflows for postgres:

https://lucumr.pocoo.org/2025/11/3/absurd-workflows/

https://github.com/earendil-works/absurd

https://earendil-works.github.io/absurd/

I've not used it, but it's worth comparing to other options

vrm 6 minutes ago [-]

If you don't need a ton of throughput I think `absurd` (and our Rust derivative `durable`) are very nice options that keep the client side extremely simple. It's also lightweight enough that a coding agent can keep the entire thing in its head easily and just run queries to look up state as needed.

pragma_x 7 minutes ago [-]

I completely get the concept and agree - this is a great way to build this kind of durability in a workflow system.

That said, my gamer-brain wants to call this "Save-scumming at scale." Which is to say, a lot of people already know that this approach works, but maybe they haven't made the connection to abstract CS stuff.

Another strategy that can be used to build robustness is to build your workflow out of idempotent operations. That can be useful for situations where the workflow state is too large to back up. Instead, you just run the job from the top and it's a bunch of no-ops until you start making progress again.
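
A minimal sketch of the idempotent-operations idea, assuming a hypothetical three-step job. Every name here (`ensure_account`, `ensure_balance`, `ensure_welcome_sent`, `run_job`, and the in-memory `world` dict standing in for a real database or API) is illustrative and not taken from any library. Each step checks the target state before acting, so rerunning the job from the top after a crash is a series of no-ops until it reaches unfinished work.

```python
def ensure_account(world, log, name):
    """Create the account only if it is missing (safe to repeat)."""
    if name in world["accounts"]:
        return  # already done: no-op
    world["accounts"].add(name)
    log.append("create_account")


def ensure_balance(world, log, name, amount):
    """Set an absolute balance; setting is idempotent, incrementing is not."""
    if world["balances"].get(name) == amount:
        return
    world["balances"][name] = amount
    log.append("set_balance")


def ensure_welcome_sent(world, log, name, fail=False):
    """Send the welcome message once; `fail` simulates a crash before sending."""
    if name in world["welcomed"]:
        return
    if fail:
        raise RuntimeError("simulated crash before sending")
    world["welcomed"].add(name)
    log.append("send_welcome")


def run_job(world, log, name, fail_last=False):
    """The whole job is just the steps in order; no saved workflow state."""
    ensure_account(world, log, name)
    ensure_balance(world, log, name, 100)
    ensure_welcome_sent(world, log, name, fail=fail_last)


def demo():
    """Crash on the first run, then rerun from the top; return the work done."""
    world = {"accounts": set(), "balances": {}, "welcomed": set()}
    log = []
    try:
        run_job(world, log, "alice", fail_last=True)
    except RuntimeError:
        pass  # the process "died" here
    run_job(world, log, "alice")  # first two steps are no-ops this time
    return log
```

The trade-off is that every step needs a cheap way to ask "has this already happened?", which is easy for database rows and harder for external side effects such as emails or payments.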

stuartaxelowen 33 minutes ago [-]

My dream is, instead of separating data storage, state machines, valid state constraints, and the logic that transitions between valid states, we can actually unify these into some kernel of app state. Honestly, Postgres already has a lot of these capabilities, but I don’t see an obvious story on the app or product level, providing provably correct sets of states that apps can transition between, and which they can automatically expose to clients in informative ways (this user can like this post, but not edit). It looks colored Petri net shaped to me, but I don’t yet see a simple app state paradigm in the same way that the database has obvious successful boundaries.

halamadrid 9 minutes ago [-]

We work on disk log based architecture for workflows at Unmeshed (https://unmeshed.io/) which helps it to scale at a fraction of the cost of traditional workflow systems that are based on expensive databases.

Postgres is not cheap to run in the cloud at scale. We went for the cheapest infra, which is basically the disk storage.

throwaw12 2 hours ago [-]

Curious to know the experience of people using DBOS and Temporal.

I have used Temporal in the past and it works really well. My only problem with it was some limits on request payload or event sizes, which created some inconveniences for us when building solutions. It also enforces good engineering practices, but sometimes you don't want to write special logic so that, if your CSV file is larger than 2 MB, you upload it to S3, pass a link, then download it in the workflow.

What is your experience with DBOS? How does it compare to Temporal in terms of operational complexity, feature parity, and anything else?
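
The large-payload workaround mentioned in this comment (store the file elsewhere and pass a link) can be sketched roughly as follows. This is an illustration only: `BlobStore` is an in-memory stand-in for an object store such as S3, `pack`/`unpack` are hypothetical helper names, and the 2 MB default simply mirrors the figure quoted in the comment rather than any documented limit.

```python
import base64
import hashlib

MAX_INLINE_BYTES = 2 * 1024 * 1024  # assumed threshold, taken from the comment


class BlobStore:
    """In-memory stand-in for an object store (hypothetical, not a real client)."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # Content-addressed key, so storing the same bytes twice is harmless.
        key = hashlib.sha256(data).hexdigest()
        self._objects[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._objects[key]


def pack(data: bytes, store: BlobStore, limit: int = MAX_INLINE_BYTES) -> dict:
    """Return a small envelope: inline data if it fits, else a reference."""
    if len(data) <= limit:
        return {"kind": "inline", "data": base64.b64encode(data).decode("ascii")}
    return {"kind": "ref", "key": store.put(data), "size": len(data)}


def unpack(envelope: dict, store: BlobStore) -> bytes:
    """Inverse of pack: decode inline data or fetch the referenced object."""
    if envelope["kind"] == "inline":
        return base64.b64decode(envelope["data"])
    return store.get(envelope["key"])
```

Only the envelope travels through the workflow engine, so its size stays bounded regardless of how large the underlying file is.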

pants2 41 minutes ago [-]

I thought Temporal was overly complex, but as you said the best part is it does enforce good engineering practices.

Then I tried their Cloud offering and was appalled at their pricing. I burned through the $1,000 free credits before I even got something to production. Didn't want to bother with running a local Temporal, either.

Best solution is to just take inspiration from their architecture and then do it yourself in Postgres, IMO.

switchbak 2 hours ago [-]

They've just released an external storage approach to solve the large payload issue. I don't 100% love it (it's bolted on, not an intrinsic part), and it's an early release right now - but you can consider this effectively solved for now.

hilariously 1 hour ago [-]

That's good, because back in the day, if you were putting entire documents in a message queue, I would laugh people out the door. Putting something in object storage and linking to it is much more useful (though the distributed system part/backup current state part can be annoying!)

quard8 1 hour ago [-]

we're using dbos for ai gen workflows and processing video files. understanding how to migrate from celery took time, but for our case it was worth it.

temporal_thr123 1 hour ago [-]

I run a large on-prem temporal setup - throwaway acct as they will likely out me.

Temporal is, in my opinion, having run it in prod for over a year, poorly designed, slow, and ridiculously heavy infra-wise.

If you're doing anything non-trivial (say, 200+ events/workflow) and you need to run only a couple hundred of them concurrently all day, you're going to spend millions on infra, and it's still going to absolutely suck.

Try running their own benchmarks, the numbers are pathetic.

Their sales team is also absolutely appalling and desperate.

From a Developer standpoint, the SDK is quite nice though.

Don't get trapped into nexus, and if the sales team calls you, make sure legal is in the room.

temporal_thr123 59 minutes ago [-]

Since I'm in a ranting mode -- here's a good example: you're limited to _ONE_ IO per shard in the history service:

https://github.com/temporalio/temporal/blob/e22e6304b3c4a409...

Temporal does a crazy amount of database operations and all of these are behind that mutex.

Oh, and you can't change the shard count on existing clusters.

Great stuff.

dakiol 42 minutes ago [-]

Agree. I have worked in a codebase using Temporal, and it is pretty much a nightmare. I don't know about the infra side, but from the developer side, all the abstractions they bring to the table are poorly designed. Wouldn't recommend.

opiniateddev 1 hour ago [-]

Conductor OSS does this quite well https://docs.conductor-oss.org/devguide/ai/index.html

https://github.com/agentspan-ai/agentspan, which is essentially an agentic SDK layer for Conductor, can convert any of your LangGraph, OpenAI, Vercel, or ADK agents, making them durable and adding orchestration with no code changes.

sgt 2 hours ago [-]

Continuously amazed by what you can do with few tools, as long as Postgres is a part of your toolkit.

I recently developed a distributed queue and it works really great - benchmarks great too, with no race conditions or conflicts. I used SKIP LOCKED so that workers can compete safely.

You can also have multiple workers across nodes avoid conflicts by using session-wide mutexes, i.e. pg advisory locks.
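
A rough sketch of the two techniques mentioned in this comment, not the commenter's actual code. It assumes a hypothetical `jobs` table with `id`, `status`, and `payload` columns, and a DB-API cursor that uses `%s` placeholders (as psycopg does). `FOR UPDATE SKIP LOCKED` and `pg_try_advisory_lock(bigint)` are standard Postgres features; the function names are made up for illustration.

```python
import hashlib

# Claim one pending job. Rows already locked by another worker are skipped
# instead of waited on, so competing workers never block each other.
CLAIM_SQL = """
UPDATE jobs
SET status = 'running'
WHERE id = (
    SELECT id FROM jobs
    WHERE status = 'pending'
    ORDER BY id
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""

# Session-level advisory lock; returns true or false immediately.
TRY_LOCK_SQL = "SELECT pg_try_advisory_lock(%s)"


def claim_next_job(cur):
    """Run CLAIM_SQL on a DB-API cursor; return (id, payload) or None.

    The caller is responsible for committing the surrounding transaction.
    """
    cur.execute(CLAIM_SQL)
    return cur.fetchone()


def advisory_key(name: str) -> int:
    """Map a lock name to a signed 64-bit integer, since the lock takes a bigint."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def try_session_lock(cur, name: str) -> bool:
    """Try to take the named session-wide lock without blocking."""
    cur.execute(TRY_LOCK_SQL, (advisory_key(name),))
    return bool(cur.fetchone()[0])
```

A worker loop would call `claim_next_job` inside a transaction, process the job, mark it done, and commit. `try_session_lock` suits singleton tasks (for example, one scheduler per cluster), since the lock is released when the session ends.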

bootsmann 27 minutes ago [-]

Advisory locks are preferred for this anyways because holding a lot of SELECT FOR UPDATE doesn’t scale too well.

Edit: Actually I checked this again and apparently the advice has now changed to the inverse.

rafael-lua 9 minutes ago [-]

The "everything can be done in Postgres" crowd is crazy. It is like a religion at this point.

vrm 2 hours ago [-]

Since DBOS doesn't support Rust, we implemented a very minimal Rust version of this at https://github.com/tensorzero/durable. It has been quite stable and extensible but of course you need to be very careful with the SQL implementations. Hope this is interesting to readers here.

munk-a 1 hour ago [-]

We have a durable queue built into postgres to handle some complex notification-ish logic. It's worked excellently, and while there are services various cloud providers would love to sell us to do that, it's extremely cheap to run.

For that particular usage, the volume we process and business criticality make it a good choice for inventing here - but for other durable processes we just use off the shelf tools since the cost of maintenance would quickly outstrip the value.

Postgres is a great tool to use and far more powerful than most people give it credit for - but there's always the balance of in-house maintenance vs. paying rent for someone else's solution.

PunchyHamster 58 minutes ago [-]

What's "maintenance" here? If the app is also using PostgreSQL, it should just be the initial effort of writing/importing code to run it, no?

munk-a 50 minutes ago [-]

You pay for everything you build - the more complexity you put into it the more that costs over time. Dependencies need to be updated, language/framework upgrades usually break something, new features/requirements introduce additional complexity and code to manage. Software just costs money every day - not a lot, our industry is much lower margin than, say, stamping sheets of metal into tools - but it still has operational costs beyond just the money to operate the hardware we run our products on.

PunchyHamster 47 minutes ago [-]

I know that. This looks like some lib you update once a year/every new CVE, and it is being compared to a lib from a cloud vendor that you also update once a year/every new CVE, which is why I asked what it cost YOU in this particular case.

grahac 39 minutes ago [-]

Isn't this just Oban from Elixir? :)

switchbak 2 hours ago [-]

Having inherited a few of these - you tend to home-grow an ad-hoc version of many of the existing OSS tools, but with less of the patterns baked in.

Not sure where the NIH ends and where you're actually better off with a supported orchestration approach. I suppose if you expect your program to be around a while (or need advanced features), maybe think about using something a bit more battle tested?

pirsquare 1 hour ago [-]

I feel it's way too hand-wavy on consistency and correctness. My opinion as someone who's implemented marketing workflows that break all the time (and tons of painful lessons).

A strong correctness guarantee is something that should not be undermined. It is even more important than availability.

The examples on the website are simple but heavily undermine the importance of correctness. Anyone who implements similar pseudo-code directly will eventually suffer from data correctness issues in crashes.

  @DBOS.workflow()
  def checkout_workflow(items: Items):
      order = create_order()
      reserve_inventory(order, items)
      payment_status = process_payment(order, items)

      if payment_status == 'paid':
          fulfill_order(order)
      else:
          undo_reserve_inventory(order, items)
          cancel_order(order)
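
To make the crash concern concrete, here is a hedged sketch of one common approach: record each step's result in a table keyed by workflow ID and step name, and replay the recorded result on rerun instead of executing the step again. It is not DBOS's implementation. It uses `sqlite3` as a stand-in for Postgres so that it runs anywhere, and `open_store`, `run_step`, and `checkout` are hypothetical names.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Create the checkpoint table (sqlite3 here, Postgres in a real setup)."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS step_results ("
        " workflow_id TEXT, step TEXT, result TEXT,"
        " PRIMARY KEY (workflow_id, step))"
    )
    return conn


def run_step(conn, workflow_id, step, fn):
    """Return the recorded result for this step, or run fn and record it."""
    row = conn.execute(
        "SELECT result FROM step_results WHERE workflow_id = ? AND step = ?",
        (workflow_id, step),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # replay: skip the side effect
    result = fn()
    conn.execute(
        "INSERT INTO step_results (workflow_id, step, result) VALUES (?, ?, ?)",
        (workflow_id, step, json.dumps(result)),
    )
    conn.commit()
    return result


def checkout(conn, workflow_id, charges, crash_after_payment=False):
    """Toy checkout; `charges` records every time the card would be charged."""
    order = run_step(conn, workflow_id, "create_order", lambda: {"id": workflow_id})

    def pay():
        charges.append(order["id"])  # stand-in for calling a payment provider
        return "paid"

    status = run_step(conn, workflow_id, "process_payment", pay)
    if crash_after_payment:
        raise RuntimeError("simulated crash")
    return run_step(conn, workflow_id, "fulfill_order", lambda: f"fulfilled:{status}")
```

Rerunning `checkout` with the same workflow ID after the simulated crash fulfills the order without charging a second time. A gap remains between performing a side effect and recording it: if the process dies inside `pay()` after the charge but before the `INSERT`, the step runs again, so external calls still need their own idempotency key.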

hmaxdml 58 minutes ago [-]

As you said, the example is simple and it might not be obvious to people without prod experience what the problems can be. Postgres can give you all the primitives you need to solve this at the application layer. Durable workflows on Postgres are an effective way to access these primitives.

magicseth 1 hour ago [-]

Convex has a workpool component that gives the ability to compose big complicated flows in an understandable way, and give you realtime updates on status of various pieces: https://www.convex.dev/components/workflow

senderista 2 hours ago [-]

Citing CockroachDB as an example of scaling Postgres made me spit out coffee. Was this LLM-written?

sorentwo 2 hours ago [-]

The efforts we've undergone to make Oban (and Pro) work with CRDB have been ridiculous. Feature detection all over because of a lack of common operators and functions that can't be used in indexes. The worst is the rampant "serialization_failure" errors that force continual transaction retries. Not how I'd suggest scaling Postgres.

That said, as a predecessor to dbos in building durable workflows just using Postgres, I concur with the overall sentiment.
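
For readers unfamiliar with the retry burden described in this comment, a minimal sketch of a transaction retry loop follows. SQLSTATE `40001` is the standard code for `serialization_failure`; everything else is an assumption for illustration. `TxError` is a hypothetical stand-in for a driver exception that exposes a `sqlstate` attribute, and `run_with_retry` is not taken from Oban or any other library.

```python
import random
import time

SERIALIZATION_FAILURE = "40001"  # SQLSTATE for serialization_failure


class TxError(Exception):
    """Hypothetical stand-in for a driver error that carries a SQLSTATE."""

    def __init__(self, sqlstate):
        super().__init__(sqlstate)
        self.sqlstate = sqlstate


def run_with_retry(tx, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Call tx() and retry it only when it fails with a serialization failure.

    tx must run the whole transaction from the start, because a failed
    serializable transaction cannot be resumed, only repeated.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return tx()
        except TxError as exc:
            retryable = exc.sqlstate == SERIALIZATION_FAILURE
            if not retryable or attempt == max_attempts:
                raise
            # Exponential backoff with jitter so workers do not retry in lockstep.
            sleep(base_delay * (2 ** (attempt - 1)) * random.uniform(0.5, 1.5))
```

The important design point is that the retried unit is the entire transaction body, which is why frequent serialization failures get expensive for queue-style workloads where many workers touch the same rows.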

bcooke 32 minutes ago [-]

Can you expand on why you chose to use CRDB with Oban? I have no opinion here, I’m genuinely curious as someone using Oban myself (with Postgres). I haven’t hit the point of really needing to scale it out yet and I’d rather avoid the traps others have figured out.

Reubend 31 minutes ago [-]

Yeah that seems off to me too. But I guess they meant that since CockroachDB is compatible with Pg, it would also serve the same purpose?

hbarka 1 hour ago [-]

How do you incorporate secrets in this kind of implementation? Stored in db?

KraftyOne 38 minutes ago [-]

Secrets are orthogonal to durable execution--what are your concerns about using them together?

mrits 13 minutes ago [-]

Unless you have a very specific use case, you wouldn't want to store in db or in any message you use in any workflow like this. Usually whatever does the actual work has a way to get the secret.

OutOfHere 49 minutes ago [-]

I am not convinced that using special software for "durable workflows" is necessary. If one has a stateful message queue or job task queue, e.g. RabbitMQ or Celery, one can use it. Irrespective, many jobs can be made idempotent. The most that you ought to residually need is a column in an existing table of your own database which keeps track of what remains to be done.

Given the above, it would seem that durable workflow software is pushed forward by those who have a surplus of VC money to spend. As for the vendors, there is no shortage of people trying to sell you things that you don't need.

elliot07 59 minutes ago [-]

How does this compare to Hatchet?

cpursley 1 hour ago [-]

PgFlow is pretty awesome for DAG workflows - it's built on pgmq (which does the heavy lifting, making it backend agnostic).

Typescript: https://www.pgflow.dev

Elixir: https://github.com/agoodway/pgflow/blob/main/docs/COMPARISON...

llmslave 1 hour ago [-]

Temporal is an insane piece of software, always surprised people don't know about it. You could replace almost your whole AWS stack with Temporal.

temporal_thr123 1 hour ago [-]

Sure, if you wanna run a 48 node cassandra cluster...

cpursley 1 hour ago [-]

I find it strange that some think in terms of AWS architecture as the default. You could replace nearly the entire AWS stack with an Elixir (Erlang) monolith + Postgres.

joshka 1 hour ago [-]

[flagged]
