FracturedJson

(github.com)

331 points | by PretzelFisch 5 hours ago

22 comments

simonw 3 hours ago
It looks like there are two maintained implementations of this at the moment - one in C# https://github.com/j-brooke/FracturedJson/wiki/.NET-Library and another in TypeScript/JavaScript https://github.com/j-brooke/FracturedJsonJs. They each have their own test suite.
There's an older pure Python version but it's no longer maintained - the author of that recently replaced it with a Python library wrapping the C# code.
This looks to me like the perfect opportunity for a language-independent conformance suite - a set of tests defined as data files that can be shared across multiple implementations.
This would not only guarantee that the existing C# and TypeScript implementations behaved exactly the same way, but would also make it much easier to build and then maintain more implementations across other languages.
Interestingly the now-deprecated Python library does actually use a data-driven test suite in the kind of shape I'm describing: https://github.com/masaccio/compact-json/tree/main/tests/dat...
That new Python library is https://pypi.org/project/fractured-json/ but it's a wrapper around the C# library and says "You must install a valid .NET runtime" - that makes it mostly a non-starter as a dependency for other Python projects because it breaks the ability to "pip install" them without a significant extra step.
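A minimal sketch of the kind of data-driven conformance suite described above. It assumes a hypothetical layout where each case is a JSON file holding `input`, `options` and `expected` fields, plus a hypothetical wrapper that exposes each implementation as a `(text, options) -> str` function. Neither the file layout nor the wrapper comes from FracturedJson or compact-json.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable

# Hypothetical interface: each implementation under test (C#, TypeScript,
# a port in another language, ...) is wrapped so it takes JSON text plus an
# options dict and returns the formatted text.
Formatter = Callable[[str, dict], str]


def load_cases(directory: Path) -> list[dict]:
    """Load every *.json case file (hypothetical fields: input/options/expected)."""
    cases = []
    for path in sorted(directory.glob("*.json")):
        case = json.loads(path.read_text(encoding="utf-8"))
        case["name"] = path.stem
        cases.append(case)
    return cases


def run_suite(cases: list[dict], formatter: Formatter) -> list[str]:
    """Return the names of the cases whose output differs from `expected`."""
    failures = []
    for case in cases:
        actual = formatter(case["input"], case.get("options", {}))
        if actual != case["expected"]:
            failures.append(case["name"])
    return failures


def stdlib_formatter(text: str, options: dict) -> str:
    # Stand-in "implementation" for the demo: Python's own pretty-printer.
    return json.dumps(json.loads(text), indent=options.get("indent", 2))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        case = {"input": '{"a":[1,2]}', "options": {"indent": None},
                "expected": '{"a": [1, 2]}'}
        Path(tmp, "001-inline-array.json").write_text(json.dumps(case), encoding="utf-8")
        print(run_suite(load_cases(Path(tmp)), stdlib_formatter))  # []
```

Because the cases are plain data, each language only needs a small runner like `run_suite`, and any discrepancy found in one port can be added as a new shared case file.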
- odyssey7 2 hours ago
  This is a good idea, though I don’t think it would guarantee program equivalence beyond the test cases.
  - simonw 2 hours ago
    Depends on how comprehensive the test suite is.
    And OK it's not equivalent to a formal proof, but passing 1,000+ tests that cover every aspect of the specification is pretty close from a practical perspective, especially for a visual formatting tool.
    - boxed 2 hours ago
      With mutation testing you can guarantee that all the behavior in the code is tested.
      - odyssey7 1 hour ago
        UC Berkeley: “Top-level functional equivalence requires that, for any possible set of inputs x, the two pieces of code produce the same output. … testing, or input-output (I/O) equivalence, is the default correctness metric used by the community. … It is infeasible to guarantee full top-level functional equivalence (i.e., equivalence for any value of x) with testing since this would require testing on a number of inputs so large as to be practically infinite.”
        https://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-...
        - esrauch 1 hour ago
          In practice, coverage-guided mutation fuzzers can see (white-box) where the branches are in the underlying code; with a differential fuzz test built on that approach, the fuzzer is generally able to generate test cases that exercise all branches.
          So I think that in computer-science theory, for arbitrary functions, it's not possible, but for the actual shape of behavior in question from this library, I think it's realistic that a decent corpus of 'real' examples plus differential fuzzing would give you more confidence than anyone has in nearly any program's correctness here on real Earth.
      - wizzwizz4 1 hour ago
        You can guarantee that all the cases in the code are tested. That doesn't necessarily mean that all the behaviour is tested. If two implementations use very different approaches, which happen to have different behaviour on the Mersenne primes (for deep mathematical reasons), but one of them special-cases byte values using a lookup table generated from the other, you wouldn't expect mutation testing to catch the discrepancy. Each implementation is still the local optimum as far as passing tests is concerned, and the mutation test harness wouldn't know that "disable the small integer cache" is the kind of mutation that shouldn't affect whether tests pass.
        There are only 8 32-bit Mersenne primes, 4 of which are byte-valued. Fuzzing might catch the bug, if it happened to hit one of the four other 32-bit Mersenne primes (which, in many fuzzers, is more likely than a uniform distribution would suggest), but I'm sure you can imagine situations where it wouldn't.
  - rafabulsing 2 hours ago
    Well yeah, but then any discrepancies that are found can be discussed (to decide which of the behaviors is the expected one) and then added as a test for all existing and future implementations.
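wizzwizz4's count can be checked with a short script. This sketch reads "32-bit Mersenne primes" as primes of the form 2^p - 1 that fit in an unsigned 32-bit integer, and uses the Lucas-Lehmer test, which decides primality of 2^p - 1 for an odd prime exponent p (the exponent 2, giving 3, is special-cased).

```python
def is_prime(n: int) -> bool:
    """Trial division; fine for the small exponents used here."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def lucas_lehmer(p: int) -> bool:
    """Return True if the Mersenne number 2**p - 1 is prime (p must be prime)."""
    if p == 2:
        return True  # 2**2 - 1 = 3 is prime; the recurrence below needs odd p
    m = (1 << p) - 1
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m
    return s == 0


def mersenne_primes_below(bits: int) -> list[int]:
    """All Mersenne primes 2**p - 1 that fit in an unsigned `bits`-bit integer."""
    return [(1 << p) - 1 for p in range(2, bits + 1)
            if is_prime(p) and lucas_lehmer(p)]


if __name__ == "__main__":
    primes32 = mersenne_primes_below(32)
    print(len(primes32), primes32)  # 8 [3, 7, 31, 127, 8191, 131071, 524287, 2147483647]
    print([m for m in primes32 if m <= 0xFF])  # [3, 7, 31, 127]
```

Exponent 11 is the classic reminder that a prime p does not guarantee a prime 2^p - 1 (2047 = 23 x 89), which is why the Lucas-Lehmer step is needed at all.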
kerblang 45 minutes ago
I made a silly Groovy script called "mommyjson" that doesn't try to preserve JSON formatting but just focuses on giving you the parentage (thus the name), including array indexes, object names, etc., all on the same line, so that when you find something, you know exactly where it is semantically. Not gonna claim that everybody should use it or that it cures insomnia, cancer & hangnails, but feel free to borrow it:
https://github.com/zaboople/bin/blob/master/mommyjson.groovy
(btw I would happily upvote a Python port, since Groovy is not so popular)
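Since a Python port came up, here is a minimal sketch of the same idea in Python. It is not a translation of the Groovy script, whose exact output format isn't shown here. It walks a parsed document and yields one line per leaf value, prefixed by the full path of object keys and array indexes. The `$["key"][0]` path syntax is an assumption borrowed from JSONPath-style bracket notation.

```python
import json
from typing import Any, Iterator


def leaf_paths(value: Any, path: str = "$") -> Iterator[tuple[str, Any]]:
    """Yield (path, leaf) pairs for every scalar in a parsed JSON value."""
    if isinstance(value, dict):
        if not value:
            yield path, value  # keep empty objects visible
        for key, child in value.items():
            # json.dumps quotes the key, so dots or brackets in keys stay unambiguous
            yield from leaf_paths(child, f"{path}[{json.dumps(key)}]")
    elif isinstance(value, list):
        if not value:
            yield path, value  # keep empty arrays visible
        for index, child in enumerate(value):
            yield from leaf_paths(child, f"{path}[{index}]")
    else:
        yield path, value


if __name__ == "__main__":
    doc = json.loads('{"users": [{"name": "ada", "tags": ["x"]}], "ok": true}')
    for path, leaf in leaf_paths(doc):
        print(f"{path} = {json.dumps(leaf)}")
```

Because every output line carries its full path, a plain `grep` over the result tells you exactly where a match lives in the document.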
kstenerud 2 hours ago
This is great! The more human-readable, the better!
I've also been working in the other direction, making JSON more machine-readable:
https://github.com/kstenerud/bonjson/
It has EXACTLY the same capabilities and limitations as JSON, so it works as a drop-in replacement that's 35x faster for a machine to read and write.
No extra types. No extra features. Anything JSON can do, it can do. Anything JSON can't do, it can't do.
- esrauch 47 minutes ago
  This is very interesting, though the limitations for 'security' reasons seem somewhat surprising to me compared to the claim "Anything JSON can do, it can do. Anything JSON can't do, it can't do.".
  Simplest example, "a\u0000b" is a perfectly valid and in-bounds JSON string that valid JSON data sets may have in it. Doesn't it end up falling short of 'Anything JSON can do, it can do" to refuse to serialize that string?
- kreco 1 hour ago
  Can you tell me what the context was that led you to create this?
  Unrelated JSON experience:
  I worked on a serializer that saves/loads JSON files as well as binary files (using a common interface).
  For my own use case I found JSON to be restrictive for no benefit (because I don't use it in a JavaScript ecosystem).
  So I changed the JSON format into something way more lax (optional commas, optional colons, optional quotes, multi-line strings, comments).
  I wish we would stop pretending JSON is a good human-readable format outside of where it makes sense, and that we had a standard alternative for those non-JSON-centric cases.
  I know a lot of formats already exist, but none has really taken off so far.
  - kstenerud 1 hour ago
    Basically, for better or worse JSON is here to stay. It exists in all standard libraries. Swift's codec system revolves around it (it only handles types that are compatible with JSON).
    It sucks, but we're stuck with JSON. So the idea here is to make it suck a little less by stopping all this insane text processing for data that never ever meets a human directly.
    The progression I envisage is:
    1. Dev reaches for JSON because it's easy and ubiquitous.
    2. Dev switches to BONJSON because it's more efficient and requires no changes to their code other than changing the codec library.
    3. Dev switches to a sane format after the complexity of their app reaches a certain level where a substantial code change is warranted.
    - kreco 45 minutes ago
      Thanks for the details!
- imiric 2 hours ago
  That's neat, but I'm much more intrigued by your Concise Encoding project[1]. I see that it only has a single Go reference implementation that hasn't been updated in 3 years. Is the project still relevant?
  Thanks for sharing your work!
  [1]: https://concise-encoding.org/
  - kstenerud 1 hour ago
    Thanks!
    I'm actually having second thoughts with Concise Encoding. It's gotten very big with all the features it has, which makes it less likely to be adopted (people don't like new things).
    I've been toying around with a less ambitious format called ORB: https://github.com/kstenerud/orb
    It's essentially an extension of BONJSON (so it can read BONJSON documents natively) that adds extra types and features.
    I'm still trying to decide what types will actually be of use in the real world... CE's graph type is cool, but if nobody uses it...
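esrauch's point that "a\u0000b" is ordinary JSON is easy to confirm with Python's standard `json` module: the escaped NUL decodes to a real U+0000 character and re-encodes to the same escape. This only demonstrates the JSON side; it says nothing about how BONJSON itself treats such strings.

```python
import json

# JSON text containing an escaped NUL character inside a string.
text = '"a\\u0000b"'

decoded = json.loads(text)
print(len(decoded), repr(decoded))  # 3 'a\x00b'

# The encoder escapes control characters, so the round trip is lossless.
encoded = json.dumps(decoded)
print(encoded == text)  # True
```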
polshaw 4 hours ago
Is there an option for it to read the contents from a pipe? That's by far my biggest use for the jq app.
- ruuda 1 hour ago
  RCL (https://github.com/ruuda/rcl) pretty-prints its output by default. Pipe to `rcl e` to pretty-print RCL (which has slightly lighter key-value syntax, good if you only want to inspect it), while `rcl je` produces json output.
  It doesn’t align tables like FracturedJson, but it does format values on a single line where possible. The pretty printer is based on the classic A Prettier Printer by Philip Wadler; the algorithm is quite elegant. Any value will be formatted wide if it fits the target width, otherwise tall.
- simonw 3 hours ago
  There's a C# CLI app in the repo: https://github.com/j-brooke/FracturedJson/blob/main/Fracture...
```
  Output is to standard out, or a file specified by the --outfile switch.  Input is from either standard in, or from a file if using the --file switch
```
  It looks like both the JavaScript version and the new Python C# wrapper have equivalent CLI tools as well.
- pimlottc 2 hours ago
  You can (usually) specify the input file name as “-” (single hyphen) to read from stdin.
- tuetuopay 3 hours ago
  This would be amazing chained with jq; that was my first thought as well.
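ruuda's "wide if it fits the target width, otherwise tall" rule can be illustrated with a much simpler greedy formatter than Wadler's. This is a sketch for illustration only, not RCL's or FracturedJson's implementation: each array or object is printed on one line if it fits within `width` at its starting column, and is otherwise broken one child per line. `read_source` follows the `-` means stdin convention pimlottc mentions; the function names are hypothetical.

```python
import json
import sys
from typing import Any, Optional


def format_value(value: Any, width: int = 60, indent: int = 0,
                 column: Optional[int] = None, step: int = 2) -> str:
    """Render `value` wide (one line) if it fits in `width`, otherwise tall.

    `indent` is where the closing bracket goes; `column` is where the value's
    first character lands (further right when it follows an object key).
    Trailing commas are ignored in the fit check, for simplicity.
    """
    column = indent if column is None else column
    flat = json.dumps(value, ensure_ascii=False)
    if not isinstance(value, (dict, list)) or not value or column + len(flat) <= width:
        return flat
    inner = indent + step
    pad = " " * inner
    if isinstance(value, list):
        lines = [pad + format_value(v, width, inner, inner, step) for v in value]
        open_, close = "[", "]"
    else:
        lines = []
        for key, v in value.items():
            prefix = json.dumps(key, ensure_ascii=False) + ": "
            child = format_value(v, width, inner, inner + len(prefix), step)
            lines.append(pad + prefix + child)
        open_, close = "{", "}"
    return open_ + "\n" + ",\n".join(lines) + "\n" + " " * indent + close


def read_source(path: str) -> str:
    """Read JSON text from a file, or from standard input when path is "-"."""
    if path == "-":
        return sys.stdin.read()
    with open(path, encoding="utf-8") as f:
        return f.read()
```

In a script, `print(format_value(json.loads(read_source(sys.argv[1]))))` would let it sit at the end of a pipe, for example after jq. Wadler's algorithm is more general than this; the sketch only captures the per-container fits-or-break decision.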
CamouflagedKiwi 46 minutes ago
I really like this, I think I'd find it useful fairly often and I like the idea of just making something that I use irregularly but not that rarely a bit better.
But then I found it's in C#. And apparently the CLI app isn't even published any more (apparently nobody wanted it? Surprises me but ok). Anyway, I don't think I want this enough to install .NET to get it, so that's that. But I'd have liked a version in Go or Rust or whatever.
- fcoury 25 minutes ago
  I really liked the idea, so I am porting it to Rust https://github.com/fcoury/fracturedjson-rs
- neonsunset 9 minutes ago
  [dead]
miguelbemartin 1 hour ago
Is JSON a format that needs improvement for human readability? I think there are much better ways to present data to users, and JSON is a format that should be used to transfer data from system to system.
- btown 52 minutes ago
  If you're reaching for a tool like this, it's because you don't have a well-defined schema and corresponding dedicated visualization; you're looking at some arbitrary internal or insufficiently-documented transfer-level data with nested structure, perhaps in the midst of a debug breakpoint, and need a quick and concise visualization without the ability (or time) to add substantial code into the running runtime. Especially if you're working on integration code with third parties, it's common to come across this situation daily.
- setr 32 minutes ago
  If you discard the human-readability component of it, JSON is an incredibly inefficient choice of encoding. Other than its ubiquity, you should only be using JSON because it’s both human and machine readable (and being human-readable is mainly valuable for debugging)
- CamouflagedKiwi 54 minutes ago
  I think yes? I fairly often find that I have something in JSON, which probably is from some system to system comms, and I'm trying to read it. Once it's not trivially small I often pipe it through jq or python -m json.tool or whatever, I like the idea of something that just does a better job of that.
__MatrixMan__ 1 hour ago
When I want something more readable than json I usually use nushell. The syntax is almost the same and you can just pipe through "from json" and "to json" to convert: https://gist.github.com/MatrixManAtYrService/9d25fddc15b2494...
What I like about fractured json is the middle ground between too-sparse pretty printing and too-compact non-pretty printing; nu doesn't give me that by default.
One thing that neither fractured json nor nushell gives me, which I'd like, is the ability to associate an annotation with a particular datum, convert to json, convert back to the first language, and have that comment still be attached to that datum. Of course the intermediate json would need to have some extra fields to carry the annotations, which would be fine.
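One way the "extra fields" could work, as a sketch under an assumed convention: annotated data is modelled in Python with a small wrapper type, encoded to JSON as a two-key object, and decoded back into the wrapper so the note stays attached to its datum. The `__value__`/`__note__` keys and the `Annotated` class are hypothetical, not part of nushell or FracturedJson.

```python
import json
from dataclasses import dataclass
from typing import Any

VALUE_KEY, NOTE_KEY = "__value__", "__note__"  # hypothetical reserved keys


@dataclass
class Annotated:
    value: Any
    note: str


def to_jsonable(data: Any) -> Any:
    """Replace Annotated wrappers with {"__value__": ..., "__note__": ...}."""
    if isinstance(data, Annotated):
        return {VALUE_KEY: to_jsonable(data.value), NOTE_KEY: data.note}
    if isinstance(data, dict):
        return {k: to_jsonable(v) for k, v in data.items()}
    if isinstance(data, list):
        return [to_jsonable(v) for v in data]
    return data


def from_jsonable(data: Any) -> Any:
    """Inverse of to_jsonable: rebuild Annotated wrappers from marker objects."""
    if isinstance(data, dict):
        if set(data) == {VALUE_KEY, NOTE_KEY}:
            return Annotated(from_jsonable(data[VALUE_KEY]), data[NOTE_KEY])
        return {k: from_jsonable(v) for k, v in data.items()}
    if isinstance(data, list):
        return [from_jsonable(v) for v in data]
    return data


if __name__ == "__main__":
    config = {"retries": Annotated(3, "tuned by hand"), "host": "db"}
    text = json.dumps(to_jsonable(config))
    print(text)
    print(from_jsonable(json.loads(text)) == config)  # True
```

The obvious caveat is collision: a genuine object with exactly those two keys would be misread as an annotation, so the marker keys need to be unlikely names.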
frizlab 4 hours ago
This is interesting. I’d very much like to see a code formatter do that kind of thing; currently formatters are pretty much inflexible, which sometimes makes it hard to get structure out of formatted code.
- thechao 3 hours ago
  I just built a C++ formatter that does this (owned by my employee, unfortunately). There are really only two formatting objects: tab-aligned tables and single-line rows. Both objects also support a right-floating, column/tab-aligned "//" comment.
  Both objects desugar to a sequence of segments (lines).
  The result is that you can freely mix expression/assignment blocks & statements. Things like switch-case blocks & macro tables are suddenly trivial to format in 2d.
  Because comments are handled as right floating, all comments nicely align.
  I vibe coded the base layer in an hour. I'm using it with autogenerated code, so the output layout is coded manually based on my input. The tricky bit would be "discovering" tables & blocks. I'd just use a combination of an LSP and direct observation of sequential statements.
  - marxisttemp 2 hours ago
    You built it, but your employee owns it? That sounds highly unusual.
    - jtbayly 1 hour ago
      Probably a single-letter typo. Makes complete sense if changed to “employer.”
    - pona-a 1 hour ago
      Auto corrected employer?
- barishnamazov 4 hours ago
  Right. In my previous work, I wrote a custom XML formatter to make it look table-like, which was our use case. Of course, the ideal solution would have been to move away from XML, but you can't run away from legacy.
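thechao's two formatting objects (tab-aligned tables and single-line rows, each with a right-floating `//` comment) can be sketched roughly as follows. This is an illustration under assumptions, not thechao's formatter, which isn't public: a table is a list of rows of cell strings, each column is padded to its widest cell, and every comment starts at a shared column two spaces past the longest row. A single-line row is just a table with one row.

```python
from typing import Optional


def format_table(rows: list[list[str]], comments: list[Optional[str]],
                 gap: int = 1) -> list[str]:
    """Pad every column to its widest cell, then right-float // comments.

    comments[i] belongs to rows[i]; None means that row has no comment.
    """
    if not rows:
        return []
    n_cols = max(len(row) for row in rows)
    # Width of each column = longest cell in that column (rows may be ragged).
    widths = [max((len(row[c]) for row in rows if c < len(row)), default=0)
              for c in range(n_cols)]
    bodies = []
    for row in rows:
        cells = [cell.ljust(widths[c]) for c, cell in enumerate(row)]
        bodies.append((" " * gap).join(cells).rstrip())
    # All comments start two spaces past the longest row, so they line up.
    comment_col = max(len(body) for body in bodies) + 2
    lines = []
    for body, comment in zip(bodies, comments):
        if comment:
            lines.append(body.ljust(comment_col) + "// " + comment)
        else:
            lines.append(body)
    return lines


if __name__ == "__main__":
    macro_table = [["X(RED,", "0xff0000)"],
                   ["X(GREEN,", "0x00ff00)"],
                   ["X(BLUE,", "0x0000ff)"]]
    for line in format_table(macro_table, ["primary", None, "last"]):
        print(line)
```

As thechao notes, the hard part is deciding which consecutive statements form a table; the layout itself is the easy half.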
barishnamazov 4 hours ago
This is pretty cool, but I hope it isn't used for human-readable config files. TOML/YAML are better options for that. Git diff also can be tricky with realignment, etc.
I can see this being useful in debug-mode APIs, where comments are somehow sent as well and rendered nicely. Especially useful for game dev JSON.
- actionfromafar 3 hours ago
  Yaml is the worst. Humans and LLMs alike get it wrong. I used to laugh at XML but Yaml made me look at XML wistfully.
  Yaml - just say Norway
  - airstrike 3 hours ago
    The Norway issue is a bit blown out of proportion seeing as the country should really be a string `"no"` rather than the `no` value
    - marxisttemp 2 hours ago
      YAML strings should really require delimiters rather than being context-dependent.
    - actionfromafar 2 hours ago
      Yeah, but it's a fun slogan. My real peeve is constantly getting the spaces wrong, with no tooling to compensate for its warts. If there were linters and test frameworks and unit tests etc. for YAML, I'd just sigh and move on. But the current situation is, for instance in ADO YAML: "So it's time to cut a release and time is short - we have a surprise for you! This will make some condition go true which triggers something not tested up till now, you will now randomly commit shit on the release branch until it builds again."
      Stuff that would have been structurally impossible in XML will happen in yaml. And I don't even like XML.
- silvestrov 3 hours ago
  Just say Norway to YAML.
  - merelysounds 3 hours ago
    This is a reference to YAML parsing the two-letter ISO country code for Norway:
```
    country: no
```
    As equivalent to a boolean falsy value:
```
    country: false
```
    It is a relatively common source of problems. One solution is to quote the value:
```
    country: "no"
```
    More context: https://www.bram.us/2022/01/11/yaml-the-norway-problem/
    - Y-bar 3 hours ago
      We stopped having this problem over ten years ago when spec 1.2 was implemented. Why are people still harping on about it?
      - Etheryte 2 hours ago
        Because there's a metric ton of software out there that was built once upon a time and then that bit was never updated. I've seen this issue out in the wild across more industries than I can count.
      - actionfromafar 3 hours ago
        Now add brackets and end-tags, I'll reconsider. ;)
        - Y-bar 2 hours ago
          Brackets work fine:
```
Roles: [editor, product_manager]
```
          End tags I'm not sure about, but three dashes are part of the spec, to separate documents:
```
something:
  setting: true
---
another:
  thing: false
```
      - quotemstr 1 hour ago
        Because once a technology develops a reputation for having a problem it's practically impossible to rehabilitate it.
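The difference Y-bar describes comes down to which unquoted scalars a loader resolves to booleans. The sketch below encodes the boolean regular expression from the YAML 1.1 "bool" type and the one from the YAML 1.2 core schema as a tiny resolver. It is not a YAML parser (real loaders also resolve nulls, numbers and more), and which rule a given library applies, and how faithfully, depends on that library.

```python
import re
from typing import Union

# Boolean patterns from the YAML 1.1 "bool" type and the YAML 1.2 core
# schema. They apply to plain (unquoted) scalars only; quoted scalars such
# as "no" are always strings.
YAML11_TRUE = re.compile(r"y|Y|yes|Yes|YES|true|True|TRUE|on|On|ON")
YAML11_FALSE = re.compile(r"n|N|no|No|NO|false|False|FALSE|off|Off|OFF")
YAML12_TRUE = re.compile(r"true|True|TRUE")
YAML12_FALSE = re.compile(r"false|False|FALSE")


def resolve_plain(scalar: str, version: str = "1.2") -> Union[bool, str]:
    """Resolve an unquoted scalar to a bool if it matches, else keep the string."""
    if version == "1.1":
        true_re, false_re = YAML11_TRUE, YAML11_FALSE
    else:
        true_re, false_re = YAML12_TRUE, YAML12_FALSE
    if true_re.fullmatch(scalar):
        return True
    if false_re.fullmatch(scalar):
        return False
    return scalar


if __name__ == "__main__":
    for version in ("1.1", "1.2"):
        print(version, repr(resolve_plain("no", version)))
```

Under the 1.1 rule `country: no` becomes `False`; under the 1.2 core schema it stays the string `"no"`, which is why old loaders, not the current spec, keep the Norway problem alive.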
kayhantolga 2 hours ago
These JSON files are actually readable, congrats. I’m wondering whether this could be handled via an additional attached file instead. For example, I could have mycomplexdata.json and an accompanying mycomplexdata.jsonfranc. When the file is opened in the IDE, the IDE would merge the two automatically.
That way, the original JSON file stays clean and isn’t polluted with extra data.
magius18 1 hour ago
All this work, and there's no mention of YAML in the repository. That's kind of funny to me.
- __MatrixMan__ 38 minutes ago
  The trouble with yaml is that it's too hard to keep track of how indented something is if its parent is off the screen. I have to keep a t-square on my desk and hang it from the top of my monitor whenever this comes up.
  That, and the fact that it has enough bells and whistles that there are YAML parser exploits out there.
whoamii 1 hour ago
Love the spirit, but the attack-plans example IMO looks worse with this formatting. I don’t love the horizontal scrolling through properties of an object.
- ramraj07 1 hour ago
  I don't know if you spend a fraction of your life scrolling vertically through megabyte-sized JSON files, but if something can reduce the height of the file, that's welcome. We don't need to read every single line from left to right; we just need to quickly browse through the entire file. If a line in this format is longer than fits on the screen, it's likely we don't need to know what's in the cut-off right corner anyway.
  - whoamii 18 minutes ago
    Gigabytes even (people do the silliest things). But ‘find’ gets me there 90% of the time, and at that point the amount of vertical scrolling isn’t really any different than in a 2kb file.
andix 1 hour ago
Let's implement this formatting in all code editors and replace YAML with it :)
tracker1 2 hours ago
Nice... I like using JSON to stdout for logging, this would be a nice formatting option when doing local dev to prettify it without full decomposition.
NooneAtAll3 23 minutes ago
I prefer JSON5
shiandow 3 hours ago
This looks very readable. The one example I didn't like is the expanded one, where it expanded all but one of the elements. I feel like that should be an all-or-nothing thing, but there are bound to be edge cases.
vitaelabitur 2 hours ago
I tokenized these and they seem to use around 20% fewer tokens than the original JSONs, which makes me think a schema like this might optimize latency and costs in constrained LLM decoding.
I know that LLMs are very familiar with JSON, and choosing uncommon schemas just to reduce tokens hurts semantic performance. But a schema that is sufficiently JSON-like probably won't disrupt the model's learned patterns much or introduce unintended bias.
- nurumaik 2 hours ago
  Minified JSON would use even fewer tokens.
  - vitaelabitur 1 hour ago
    Yeah, but I tried switching to minified JSON on a semantic labelling task and saw a ~5% accuracy drop.
    I suspect this happened because most of the pre-training corpus was pretty-printed JSON, so the LLM was forced off its likely path and also lost all the "visual cues" of nesting depth.
    This might happen here too, but maybe to a lesser extent. Anyways, I'll stop building castles in the air now and try it sometime.
    - memoriuaysj 36 minutes ago
      If you really care about structured output, switch to XML. Much better results, which is why all AI providers tend to use pseudo-XML in their system prompts and tool definitions.
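A quick way to see the size gap nurumaik and vitaelabitur are weighing, using character counts from Python's `json` module as a rough stand-in. Token counts depend on the tokenizer, so this measures characters and lines only, and the sample document is made up for illustration.

```python
import json

sample = {"points": [{"x": i, "y": i * i, "label": f"p{i}"} for i in range(20)]}

minified = json.dumps(sample, separators=(",", ":"))  # no optional whitespace
default = json.dumps(sample)                          # ", " and ": " separators
pretty = json.dumps(sample, indent=2)                 # one value per line

for name, text in [("minified", minified), ("default", default), ("pretty", pretty)]:
    print(f"{name:>8}: {len(text):5d} chars, {len(text.splitlines()):3d} lines")
```

All three strings parse to the same data; FracturedJson-style output would land somewhere between the one-line and fully expanded forms.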
damnitbuilds 4 hours ago
Nice.
And BTW, thanks for supporting comments - the reason given for keeping comments out of standard Json is silly ( "they would be used for parsing directives" ).
- Xymist 4 hours ago
  It's a pretty sensible policy, really. Corollary to Hyrum's Law - do not permit your API to have any behaviours, useful or otherwise, which someone might depend on but which aren't part of your design goals. For programmers in particular, who are sodding munchkins and cannot be trusted not to do something clever but unintended just because it solves a problem for them, that means aggressively hamstringing everything.
  A flathead screwdriver should bend like rubber if someone tries to use it as a prybar.
  - mystifyingpoi 3 hours ago
    > A flathead screwdriver should bend like rubber if someone tries to use it as a prybar.
    While I admire his design goals, people will just work around it in a pinch by adding a "comment" or "_comment" or "_comment_${random_uuid}", simply because they want to do the job they need.
    If your screwdriver bends like rubber when prying, damn it, I'll just put a screw next to it, so it thinks it is being used for driving screws and thus behaves correctly.
    - pixl97 2 hours ago
      And we wonder why people are calling for licensed professional software engineers.
  - nodja 3 hours ago
    On one hand, its frozen state has made JSON more ubiquitous. On the other hand, it forces everyone to move to something else and fragments progress. It would be much easier for people to move to a JSON 2.0 than to juggle hundreds of JSON + X standards. Everyone is just reinventing JSON with their own little twist, and I feel sad that we haven't standardized on a single solution that doesn't go super crazy like XML.
    I don't disagree with the choice, but seeing how things turned out, I can't help but look at the greener grass on the other side.
  - libria 3 hours ago
    > A flathead screwdriver should bend like rubber if someone tries to use it as a prybar.
    Better not let me near your JSON files then. I pound in wall anchors with the bottom of my drill if my hammer is not within arms reach.
  - speed_spread 2 hours ago
    JSON is used as config files and static resources all the time. These type of files really need comments. Preventing comments in JSON is punishing the wide majority to prevent a small minority from doing something stupid. But stupid gonna stupid, it's just condescending from Mister JSON to think he can do anything about it.
- patates 4 hours ago
  XML people were doing crazy things in the Java/.NET world and "<!--[if IE 6]>" was still a thing in HTML when JSON was being designed.
  I also would have wanted comments, but I see why Crockford must have been skeptical. He just didn't want JSON to be the next XML.
- frizlab 4 hours ago
  Unrelated: why spaces inside the parentheses? It’s not the first time I see this, but this is incorrect!
  - cromulent 3 hours ago
    JSON doesn't have parentheses, but it does have braces and brackets. The JSON spec specifically allows spaces.
    > Insignificant whitespace is allowed before or after any token.
    - frizlab 3 hours ago
      I was talking about the parent comment, which has spaces inside the parenthesis (I do prefer no spaces inside brackets and braces in my JSONs, but that’s another story).
DJBunnies 3 hours ago
While I wish JSON formally supported comments, it seems more sensible (compatible) to just nest them inside of a keyed list or object as strings.
```
  {
    "foo": "bar",
    "ans": 42,
    "comments": {
      "ans": "Douglas Adams"
    }
  }
```
- NooneAtAll3 23 minutes ago
  idk... "ans: 42 // an old reference from DA API" seems easier to read than wasting 4 lines of yours
  multiply that by a long file... it takes a toll
  ---
  also sometimes one field contains a lot of separate data (because it's straight up easier to deserialize into a single std::vector and then do stuff) - so you need comments between data points
- Etheryte 2 hours ago
  Works right up until you get an entity where the field `comments` is suddenly relevant and then you need to go change everything everywhere. Much better to use the right tool for the job, if you want JSONC, be explicit and use JSONC.
  - DJBunnies 2 hours ago
    Surely it could be suffixed or keyed with a less collision-prone name than this very simplistic example. I suppose JSONC and similar formats exist, but they are rarely used in the wild compared to plain JSON, and compatibility is important.
  - vunderba 1 hour ago
    Hadn't heard of JSONC, but I've always been a proponent of JSON5 for this reason.
    https://github.com/json5/json5
- ljm 2 hours ago
  Personally, I think if your JSON needs comments then it's probably for config or something the user is expected to edit themselves, and at that point you have better options than plain JSON and adding commentary to the actual payload.
  If it's purely for machine consumption then I suspect you might be describing a schema and there are also tools for that.
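Etheryte's suggestion to be explicit about comment support sidesteps the key-collision problem entirely. As a rough illustration of why that is cheap to support, here is a sketch of a preprocessor that removes `//` and `/* */` comments outside string literals before handing the text to Python's standard `json` module. It is not a full JSONC or JSON5 parser: trailing commas, unquoted keys and the other JSON5 extensions are not handled, and the function names are hypothetical.

```python
import json
from typing import Any


def strip_comments(text: str) -> str:
    """Remove // line comments and /* */ block comments outside JSON strings."""
    out = []
    i, n = 0, len(text)
    in_string = False
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:  # copy escaped char (e.g. \" or \\) verbatim
                out.append(text[i + 1])
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            end = text.find("\n", i)
            i = n if end == -1 else end  # resume at the newline, keeping it
            continue
        elif text.startswith("/*", i):
            end = text.find("*/", i + 2)
            if end == -1:
                raise ValueError("unterminated block comment")
            out.append(" ")  # keep tokens on either side separated
            i = end + 2
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def loads_with_comments(text: str) -> Any:
    """Parse JSON text that may contain // and /* */ comments."""
    return json.loads(strip_comments(text))


if __name__ == "__main__":
    print(loads_with_comments('{"ans": 42 // an old reference from DA API\n}'))
```

Tracking whether the scanner is inside a string is the essential part; a naive regex would also eat the `//` in a value like `"http://example.com"`.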
londons_explore 3 hours ago
Great. Now integrate this into every JSON library and tool so I get to see its output more often.
[-]
- tomtomtom777 3 hours ago
  I think integration into jq would be both powerful and sufficient.
  - hnlmorg 2 hours ago
    Powerful but not sufficient. There’s plenty of us who don’t use jq for various reasons.
    - jcims 1 hour ago
      LLMs have allowed me to start using jq for more than pretty printing JSON.