Show HN: PgDog – Scale Postgres without changing the app

(github.com)

144 points | by levkk 6 hours ago

12 comments

  • saisrirampur 3 hours ago
    Great progress, guys! It’s impressive to see all the enhancements - more types, more aggregate functions, cross-node DML, resharding, reliability-focused connection pooling, and more. Very cool! These were really hard problems that took multiple years to build at Citus. Kudos to the shipping velocity.
  • codegeek 3 hours ago
    Stupid question, but does this shard the database as well, or do we shard manually and then set up the configuration accordingly?
    • levkk 3 hours ago
      It shards it as well. We handle schema sync, moving table data (in parallel), setting up logical replication, and application traffic cutover. The zero-downtime resharding is currently WIP, working on the PR as we speak: https://github.com/pgdogdev/pgdog/pull/784.
      • codegeek 3 hours ago
        Incredible. I am really interested in trying PgDog for our B2B SaaS product. Will do some testing.
  • cuu508 2 hours ago
    Some HTTP proxies can do retries -- if a connection to one backend fails, it is retried on a different backend. Can PgDog (or PgBouncer, or any other tool) do something similar -- if there's a "database server shutting down" error or a connection reset, retry it on another backend?
    • levkk 2 hours ago
      Not currently, but we can add this. One thing we have to be careful of is to not retry requests that are executing inside transactions, but otherwise this would be a great feature.
  • mijoharas 4 hours ago
    Happy PgDog user here. I can recommend it from a user perspective as a connection pooler to anyone checking this out. (We're also running tests on sharding and are positive about it, but we haven't run it in prod yet, so I can't 100% vouch for that part; it's where we're headed, though.)

    @Lev, how is the 2PC coming along? I think it was pretty new when I last checked, and I haven't looked into it much since then. Is it feeling pretty solid now?

    • levkk 4 hours ago
      It feels better now, but we still need to add crash protection - in case PgDog itself crashes, we need to restore in-progress 2PC transaction records from a durable medium. We will add this very soon.
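
      For the curious, a sketch of what that recovery could look like at the Postgres level, using the standard pg_prepared_xacts view (the GID shown is hypothetical):

          -- After a coordinator crash, prepared transactions linger on each shard.
          -- List the in-progress 2PC records left behind:
          SELECT gid, prepared, database FROM pg_prepared_xacts;

          -- Resolve each one based on the coordinator's durable log:
          COMMIT PREPARED 'pgdog_txn_42';   -- if the transaction had reached commit
          ROLLBACK PREPARED 'pgdog_txn_42'; -- if it never committed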
  • mosselman 1 hour ago
    I see the word 'replication' mentioned quite a few times. Is this managed by PgDog? Would I be able to replace other logical replication setups with PgDog to create a high-availability cluster?

    Do you have any write up on how to do this?

    • levkk 1 hour ago
      I'll need a bit more info about your use case to answer. We use logical replication to move data between shards, with the intention of creating new shards.

      This is managed by PgDog. We are building a lot of tooling here, and a lot of it is configurable and can be used separately. For example, we have a CLI and admin database commands to set up replication streams between databases, irrespective of their sharded status, so it can be used for other purposes as well, like moving tables or entire databases to new hardware. If you keep the stream(s) running, you can effectively keep up-to-date logical replicas.
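
      Under the hood these streams are standard logical replication; a minimal sketch of the primitives involved (table names and connection string are illustrative):

          -- On the source database: publish the tables to replicate.
          CREATE PUBLICATION shard_move FOR TABLE users, orders;

          -- On the target database: subscribe. This copies existing rows,
          -- then streams ongoing changes.
          CREATE SUBSCRIPTION shard_move_sub
              CONNECTION 'host=source-db dbname=app user=replicator'
              PUBLICATION shard_move;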

      We don't currently manage DDL replication (CREATE/ALTER/DROP) for logically replicated databases - this is a known limitation that we will address shortly. After all, we don't want users to pause schema migrations during resharding. I think once that piece is in, you'll be able to run pretty much any kind of long-lived logical replicas for any purpose, including HA.

      • mosselman 31 minutes ago
        Thanks for the explanation. My use case is replacing Patroni with PgDog. DDL replication would be required for this.

        Is there some way I can get updates about PgDog, especially when the replication support you mentioned is ready?

  • jackfischer 3 hours ago
    Congrats guys! Curious how reliable the read/write splitting is in practice, given replication lag. Do you need to run the underlying cluster with synchronous replication?
    • maherbeg 1 hour ago
      The way we solved it is by checking the LSN on the primary, and then waiting for the replica to catch up to that LSN before doing reads on the replica in various scenarios.
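
      In terms of standard Postgres functions, that check is roughly the following (the LSN literal is illustrative):

          -- On the primary, right after the write, capture the WAL position:
          SELECT pg_current_wal_lsn();  -- e.g. '0/3000148'

          -- On the replica, poll until it has replayed past that position:
          SELECT pg_wal_lsn_diff(pg_last_wal_replay_lsn(), '0/3000148') >= 0;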
    • levkk 3 hours ago
      Not really; replication lag is generally an accepted trade-off. Sync replication is rarely worth it, since you take a 30% performance hit on commits and add more single points of failure.

      We will add some replication lag-based routing soon. It will prioritize replicas with the lowest lag to maximize the chance of the query succeeding and remove replicas from the load balancer entirely if they have fallen far behind. Incidentally, removing query load helps them catch up, so this could be used as a "self-healing" mechanism.
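
      The lag signal itself is cheap to observe from the primary's standard stats view; roughly:

          -- Per-replica replay lag in bytes, as seen from the primary:
          SELECT application_name,
                 pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
          FROM pg_stat_replication;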

      • jackfischer 3 hours ago
        It sounds like this is one of the few places where the abstraction might leak, in that queries _might_ fail and the failure might effectively be silent?
        • levkk 3 hours ago
          It can be silent, but usually it's loud and confusing because people do something like this (Rails example):

              user = User.create(email: "test@test.com")
              SendWelcomeEmail.perform_later(user.id)
          
          And the job code fetches the row like so:

              user = User.find(id)
          
          This blows up because `find` throws an error if the record isn't there. Job queues typically use replicas for reads. This is a common gotcha: code that runs async expects the data to be there after creation.

          There can be others, of course, especially in fintech where you have an atomic ledger, but people are usually pretty conscious about this and send those types of queries to the primary.

          In general, though, I completely agree: this is leaky and an unsolved problem. You can have performance or accuracy, but not both, and most solutions skew towards performance and make applications handle the lack of accuracy.

  • octoclaw 4 hours ago
    The cross-shard aggregate rewriting is really nice. Transparently injecting count() for average calculations sounds straightforward, but there are so many edge cases once you add GROUP BY, HAVING, subqueries, etc.
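
    For anyone skimming: avg() can't be merged directly across shards, so the proxy presumably fetches per-shard sums and counts and combines them, roughly (table and column are illustrative):

        -- What the app sends:
        SELECT avg(price) FROM orders;

        -- What each shard effectively runs, so results can be combined:
        SELECT sum(price), count(price) FROM orders;
        -- final average = sum of sums / sum of counts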

    Curious about latency overhead for the common case. On a direct-to-shard read where no rewriting happens, what's the added latency from going through PgDog vs connecting to Postgres directly? Sub-millisecond?

    • levkk 4 hours ago
      Sub-ms typically, yeah. We measured the average latency between nodes in the same AZ (AWS availability zone) to be less than one ms, so you need to account for one extra hop plus processing time by PgDog, which is typically fast.

      That being said, if you don't currently use a connection pooler, you will notice some latency when adding one. It's usually table stakes for Postgres at scale, since you need one anyway, but it can be surprising. This especially affects "chatty" apps - the ones that send 10+ queries to service one API request - and it makes bugs like N+1 queries considerably worse.

      TLDR: not a free lunch, but generally acceptable at scale.

  • noleary 4 hours ago
    > If you build apps with a lot of traffic, you know the first thing to break is the database.

    Just out of curiosity, what kinds of high-traffic apps have been most interested in using PgDog? I see you guys have Coinbase and Ramp logos on your homepage -- seems like fintech is a fit?

    • levkk 3 hours ago
      We have all kinds; it's not specific to any particular sector. That's kind of the beauty of building for Postgres - everyone uses it in some capacity!

      My general advice is, once you see more than 100 connections on your database, you should consider adding a connection pooler. If your primary load exceeds 30% (CPU util), consider adding read replicas. This also applies if you want some kind of workload isolation between databases, e.g. slow/expensive analytics queries can be pushed to a replica. Vertically scaling primaries is also a fine choice, just keep that vertical limit in mind.
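
      For the connection rule of thumb, a quick way to check where you stand (a standard catalog query):

          -- Count client connections, grouped by state:
          SELECT state, count(*)
          FROM pg_stat_activity
          WHERE backend_type = 'client backend'
          GROUP BY state;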

      Once you're a couple instance types away from the largest machine your cloud provider has, start thinking about sharding.

      • mystifyingpoi 3 hours ago
        > If your primary load exceeds 30% (CPU util), consider adding read replicas.

        I'm not an expert, but isn't this excessive? In theory you could triple the load and still have slack. I'd actually try to scale down, not up.

        • CuriouslyC 2 hours ago
          Load is highly bursty. You can autoscale application services quickly, but scaling your database up is a slower thing.
  • array_loader 1 hour ago
    (apologies for new account - NDA applies to the specifics)

    Nice surprise to see this here today. I was working on a deployment just last week.

    Unfortunately for me, I found that it crashed when doing a very specific bulk load (COPY FORMAT BINARY with array columns inside a transaction). The process loads around 200MB of array columns (in the region of 10K rows) into a variety of tables. Very early in the COPY process, PgDog crashes with:

    "pgdog router error: failed to fill whole buffer"

    So it appears something is not quite right for my specific use case (COPY with array columns). I'm not familiar enough with Rust, but the "failed to fill whole buffer" message seemed to come from the Rust standard library (rather than PgDog's own code), based on what little I could find with searches.
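
    For the record, the failing shape is roughly this (table and column names are made up - the real schema is under NDA):

        CREATE TABLE readings (id bigint, samples double precision[]);

        BEGIN;
        -- ~10K rows, ~200MB total, streamed as binary COPY:
        COPY readings (id, samples) FROM STDIN WITH (FORMAT binary);
        COMMIT;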

    I was very disappointed, as it looked much simpler to get set up and running than Pgpool-II (which I have had to revert to as my backup plan - I'm finding it more difficult to configure, but it does cope with the COPY command without issues).

    I would have preferred to stick with PgDog.

    • levkk 1 hour ago
      I think we may have fixed this 3 weeks ago: https://github.com/pgdogdev/pgdog/pull/744

      Might be worth another try. If not, a GitHub issue with more specifics would be great, and we'll take a look. Also, if binary encoding isn't working out, try using text - it's more compatible between Postgres versions:

          [general]
          resharding_copy_format = "text"
  • I_am_tiberius 3 hours ago
    I really hope to use the sharding feature one day.
  • cpursley 4 hours ago
    Looks great - I'd love to include it in https://postgresisenough.dev (just put in a PR: https://github.com/agoodway/postgresisenough?tab=readme-ov-f...)
    • pbreit 3 hours ago
      How well does PG work with 10-20 million (financial) records per day? Basic stuff: a few writes per record, some reads, generating some analytics, etc.
      • cpursley 1 hour ago
        The entire point of just using Postgres went right over your head…
    • nebezb 4 hours ago
      While the lift to add it to your database is low, I don’t think you’re at a point where you can outsource the work.

      But all the better if they do!

    • verdverm 4 hours ago
      Why don't you just do it yourself if you maintain a curated resource list?
      • cpursley 3 hours ago
        Wanted to give them a chance to write it up as they like.