this post was submitted on 24 Jan 2026

58 points (92.6% liked)

The lost art of XML — mmagueta (marcosmagueta.com)

submitted 1 month ago by Kissaki@programming.dev to c/programming@programming.dev

There exists a peculiar amnesia in software engineering regarding XML. Mention it in most circles and you will receive knowing smiles, dismissive waves, the sort of patronizing acknowledgment reserved for technologies deemed passé. "Oh, XML," they say, as if the very syllables carry the weight of obsolescence. "We use JSON now. Much cleaner."

[–] calliope@retrolemmy.com 33 points 1 month ago* (last edited 1 month ago) (2 children)

There exists a peculiar amnesia in software engineering regarding XML

That’s for sure. But not in the way the author means.

There exists a pattern in software development where people who weren’t around when the debate was actually happening write another theory-based article rehashing old debates like they’re saying something new. Every ten years or so!

The amnesia is coming from inside the article.

[XML] was abandoned because JavaScript won. The browser won.

This comes across as remarkably naive to me. JavaScript and the browser didn’t “win” in this case.

JSON is just vastly simpler to read and reason about for every purpose other than configuration files that are being parsed by someone else. YAML is even more human-readable and easier to parse for most configuration uses… which is why people writing the configuration parser would rather use it than XML.

Libraries to parse XML were/are extremely complex, by definition. Schemas work great as long as you’re not constantly changing them! Which, unfortunately, happens a lot in projects that are earlier in development.

Switching to JSON for data reduced frustration during development by a massive amount. Since most development isn’t building on defined schemas, the supposed massive benefits of XML were nonexistent in practice.

Even for configuration, the amount of “boilerplate” in XML is atrocious and there are (slightly) better things to use. Twenty years ago everyone used XML for configuration in Java, which was one of the popular backend languages (this author foolishly complains about Java too). I still dread the massive XML configuration files of past Java. YAML is confusing in other ways, but XML is awful to work on and parse with any regularity.

I used XML extensively back when everyone writing asynchronous web requests was debating between using the two (in “AJAX”, the X stands for XML).

Once people started using JSON for data, they never went back to XML.

Syntax highlighting only works in your editor, and even then it doesn’t help that much if you have a lot of data (like configuration files for large applications). Browsers could even display JSON with syntax highlighting, for obvious reasons: JSON is vastly simpler and easier to parse.

[–] Kissaki@programming.dev 7 points 1 month ago* (last edited 1 month ago) (1 children)

Making XML schemas work was often a hassle. You have a schema ID, and sometimes you can open or load the schema through that URL. Other times, it serves only as an identifier and your tooling/IDE must support ID to local xsd file mappings that you configure.

Every time it didn't immediately work, you'd think: Man, why don't they publish the schema under that public URL.

[–] calliope@retrolemmy.com 4 points 1 month ago* (last edited 1 month ago)

This seriously sounds like a nightmare.

It’s giving me Eclipse IDE flashbacks where it seemed so complicated to configure I just hoped it didn’t break. There were a lot of those, actually.

[–] tyler@programming.dev 3 points 1 month ago

God, fucking camel and hibernate xml were the worst. And I was working with that not even 15 years ago!

[–] Ephera@lemmy.ml 25 points 1 month ago (8 children)

IMHO one of the fundamental problems with XML for data serialization is illustrated in the article:

(person (name "Alice") (age 30))
[is serialized as]
<person>
  <name>Alice</name>
  <age>30</age>
</person>
Or with attributes:
<person name="Alice" age="30" />

The same data can be portrayed in two different ways. Whenever you serialize or deserialize data, you need to decide whether to read/write values from/to child nodes or attributes.

That's because XML is a markup language. It's great for typing up documents, e.g. to describe a user interface. It was not designed for taking programmatic data and serializing that out.
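
A minimal Python sketch of that decision, using the standard library's `xml.etree.ElementTree`. The helper names `read_field` and `parse_person` are made up for illustration; the two documents are the ones quoted above. The reader has to check both places, because nothing in XML itself says where a value lives:

```python
import xml.etree.ElementTree as ET
from typing import Optional

ELEMENT_FORM = "<person><name>Alice</name><age>30</age></person>"
ATTRIBUTE_FORM = '<person name="Alice" age="30" />'


def read_field(person: ET.Element, field: str) -> Optional[str]:
    """Look for `field` as an attribute first, then as a child element."""
    if field in person.attrib:
        return person.attrib[field]
    child = person.find(field)
    return child.text if child is not None else None


def parse_person(xml_text: str) -> dict:
    """Hypothetical deserializer that accepts either representation."""
    person = ET.fromstring(xml_text)
    return {"name": read_field(person, "name"), "age": int(read_field(person, "age"))}


print(parse_person(ELEMENT_FORM))
print(parse_person(ATTRIBUTE_FORM))
```

Both calls produce the same dictionary, but only because the reader was written to tolerate both layouts; a writer still has to pick one.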

[–] Feyd@programming.dev 7 points 1 month ago (1 children)

JSON also has arrays. In XML the practice to approximate arrays is to put the index as an attribute. It's incredibly gross.

[–] Kissaki@programming.dev 4 points 1 month ago (1 children)

In XML the practice to approximate arrays is to put the index as an attribute. It’s incredibly gross.

I don't think I've seen that much if ever.

Typically, XML repeats tag names. Repeating keys are not possible in JSON, but are possible in XML.

<items>
  <item></item>
  <item></item>
  <item></item>
</items>
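
A small sketch of the difference, assuming Python's standard `xml.etree.ElementTree` and `json` modules (the element contents `a`, `b`, `c` are invented for the example). It shows only what these two parsers do, not what every XML or JSON tool does:

```python
import json
import xml.etree.ElementTree as ET

doc = ET.fromstring("<items><item>a</item><item>b</item><item>c</item></items>")
# findall returns the matching children in document order
values = [item.text for item in doc.findall("item")]

# Python's json module silently keeps only the last value of a repeated key
collapsed = json.loads('{"item": "a", "item": "b", "item": "c"}')

# The usual JSON counterpart of repeated elements is an array
as_array = json.loads('{"items": ["a", "b", "c"]}')

print(values, collapsed, as_array["items"])
```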

[–] Feyd@programming.dev 12 points 1 month ago* (last edited 1 month ago) (1 children)

That's correct, but the order of tags in XML is not meaningful, and if you parse then write that, it can change order according to the spec. Hence, what you put would be something like the following if it was intended to represent an array.

<items>
  <item index="1"></item>
  <item index="2"></item>
  <item index="3"></item>
</items>

[–] Kissaki@programming.dev 5 points 1 month ago* (last edited 1 month ago) (1 children)

https://www.w3.org/TR/2004/REC-xml-infoset-20040204/

[children] An ordered list of child information items, in document order.

Does this not cover it?

Do you mean if you were to follow XML standard but not XML information set standard?

[–] Feyd@programming.dev 3 points 1 month ago (2 children)

Information set isn't a description of XML documents, but a description of what you have that you can write to XML, or what you'd get when you parse XML.

This is the key part from the document you linked

The information set of an XML document is defined to be the one obtained by parsing it according to the rules of the specification whose version corresponds to that of the document.

This is also a great example of the complexity of the XML specifications. Most people do not fully understand them, which is a negative aspect for a tool.

As an aside, you can have an enforced order in XML, but you have to also use XSD so you can specify xsd:sequence, which adds complexity and precludes ordered arrays in arbitrary documents.

[–] aivoton@sopuli.xyz 5 points 1 month ago* (last edited 1 month ago) (1 children)

The same data can be portrayed in two different ways.

And why is that an issue? The specification decides which one you use and what you need. Some things you treat as attributes, and others as child elements.

JSON doesn't even have attributes.

[–] Ephera@lemmy.ml 7 points 1 month ago (2 children)

Alright, I haven't really looked into XML specifications so far. But I also have to say that needing a specification to consistently serialize and deserialize data isn't great either.

And yes, JSON not having attributes is what I'm saying is a good thing, at least for most data serialization use-cases, since programming languages do not typically have such attributes on their data type fields either.

[–] atzanteol@sh.itjust.works 5 points 1 month ago (4 children)

This is your confusion, not an issue with XML.

Attributes tend to be "metadata". You ever write HTML? It's not confusing.

[–] Feyd@programming.dev 13 points 1 month ago* (last edited 1 month ago) (1 children)

In HTML, which things are attributes and which things are tags are part of the spec. With XML that is being used for something arbitrary, someone is making the choice every time. They might have a different opinion than you do, or even the same opinion, but make different judgments on occasion. In JSON, there are fewer choices, so fewer chances for people to be surprised by other people's choices.

[–] epyon22@sh.itjust.works 17 points 1 month ago (1 children)

The fact that JSON serializes easily to basic data structures simplifies code so much. Most use cases don't need fully semantic data storage, and for much of it you have to write the same amount of documentation about the data structures anyway. I'll give XML one thing though: schemas are nice and easy there, but have a high barrier to entry in JSON.
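
For illustration, a short Python sketch of that mapping using the standard `json` module. The payload is invented for the example:

```python
import json

payload = '{"name": "Alice", "age": 30, "tags": ["admin", "dev"], "manager": null}'

# One call yields plain built-in types: dict, list, str, int, None
person = json.loads(payload)

# And one call turns them back into text
round_tripped = json.dumps(person)

print(type(person).__name__, person["tags"], person["manager"])
```

No mapping layer or schema is involved; whether that is enough depends on how much the two sides already agree on.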

[–] Kissaki@programming.dev 6 points 1 month ago (2 children)

Most use cases don’t need fully semantic data storage

If both sides have a shared data model it's a good base model without further needs. Anything else quickly becomes complicated because of the dynamic nature of JSON - at least if you want a robust or well-documented solution.

[–] SlurpingPus@lemmy.world 3 points 1 month ago (1 children)

If both sides have a shared data model

If the sides don't have a common understanding of the data structure, no format under the sun will help.

[–] Diplomjodler3@lemmy.world 17 points 1 month ago (1 children)

It's true, though, that JSON is just better for most applications.

[–] MonkderVierte@lemmy.zip 8 points 1 month ago (21 children)

Except config files. Please don't do config files in json.

[–] Feyd@programming.dev 16 points 1 month ago* (last edited 1 month ago)

Honestly, anyone pining for all the features of XML probably didn't live through the time when XML was used for everything. It was actually a fucking nightmare to account for the existence of all those features because the fact they existed meant someone could use them and feed them into your system. They were also the source of a lot of security flaws.

This article looks like it was written by someone who wasn't there, and they're calling the people telling them the truth liars because they think features they found on W3Schools look cool.
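
One concrete example of such a feature is entity definitions in a DTD. Below is a deliberately tiny and harmless Python sketch using the standard `xml.etree.ElementTree`; the entity names `a` and `b` and the document are made up. The parser expands the entities before the application ever sees the text, and nesting many more levels of the same pattern is the idea behind the well-known "billion laughs" denial-of-service input:

```python
import xml.etree.ElementTree as ET

# Two nested entity definitions; "b" expands to three copies of "a".
DOC = """<!DOCTYPE note [
  <!ENTITY a "ha">
  <!ENTITY b "&a;&a;&a;">
]>
<note>&b;</note>"""

note = ET.fromstring(DOC)
# The parser has already substituted the entities at this point
expanded = note.text

print(len(DOC), "bytes in,", repr(expanded), "out")
```

Anything that accepts XML from outside has to decide what to do with inputs like this, even if the application itself never uses entities.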

[–] AnitaAmandaHuginskis@lemmy.world 15 points 1 month ago* (last edited 1 month ago) (2 children)

I love XML, when it is properly utilized. Which, in most cases, it is not, unfortunately.

JSON > CSV though, I fucking hate CSV. I do not get the appeal. "It's easy to handle" -- NO, it is not. It's the "fuck you" of file "formats".

JSON is a reasonable middle ground, I'll give you that

[–] unique_hemp@discuss.tchncs.de 7 points 1 month ago (5 children)

CSV >>> JSON when dealing with large tabular data:

1. Can be parsed row by row
2. Does not repeat column names, whereas JSON repeats them in every record, which makes it more complicated (so slower) to parse

1 can be solved with JSONL, but 2 is unavoidable.
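
A rough Python sketch of both points, using only the standard library. The sample data and the function names `csv_rows` and `jsonl_rows` are invented for the example:

```python
import csv
import io
import json

CSV_TEXT = "id,name,age\n1,bob,44\n2,alice,7\n"
JSONL_TEXT = (
    '{"id": 1, "name": "bob", "age": 44}\n'
    '{"id": 2, "name": "alice", "age": 7}\n'
)


def csv_rows(stream):
    """Yield one dict per CSV row; the header line is read once."""
    yield from csv.DictReader(stream)


def jsonl_rows(stream):
    """Yield one dict per line; every line is a complete JSON document."""
    for line in stream:
        if line.strip():
            yield json.loads(line)


csv_result = list(csv_rows(io.StringIO(CSV_TEXT)))
jsonl_result = list(jsonl_rows(io.StringIO(JSONL_TEXT)))

print(csv_result[0])
print(jsonl_result[0])
```

Both readers work one row at a time, which covers point 1. Point 2 is visible in the input itself: the column names appear once in the CSV text and once per record in the JSONL text. The CSV values come back as strings, while the JSONL values keep their types.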

[–] entwine@programming.dev 3 points 1 month ago* (last edited 1 month ago) (2 children)

{
    "columns": ["id", "name", "age"],
    "rows": [
        [1, "bob", 44], [2, "alice", 7], ...
    ]
}

There ya go, problem solved without the unparseable ambiguity of CSV

Please stop using CSV.
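
A short Python sketch of reading that layout back into records. It assumes only the two rows shown above; the function name `records` is hypothetical:

```python
import json

TABLE = '{"columns": ["id", "name", "age"], "rows": [[1, "bob", 44], [2, "alice", 7]]}'


def records(table_json: str) -> list:
    """Pair each row with the shared column list to rebuild per-row dicts."""
    table = json.loads(table_json)
    columns = table["columns"]
    return [dict(zip(columns, row)) for row in table["rows"]]


print(records(TABLE))
```

The column names are stored once, as in CSV, while values keep their JSON types. Note that this single document still has to be parsed as a whole unless a streaming JSON parser is used.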

[–] thingsiplay@lemmy.ml 5 points 1 month ago (1 children)

Biggest problem is, CSV is not a standardized format like JSON. For very simple cases it could be used as a database like format. But it depends on the parser and that's not ideal.

[–] thingsiplay@lemmy.ml 13 points 1 month ago (1 children)

JSON is easier to parse, smaller and lighter on resources, and that is important on the web. Once you take into account all the features XML has, plus entities, it gets big, slow and complicated. Most data does not need to be a self-describing document when transferred over the web. Fundamentally, these are two different kinds of languages: XML is a general markup language for writing documents, while JSON is a generalized data structure with support for various data types supported by programming languages.

[–] lehenry@lemmy.world 8 points 1 month ago (5 children)

While I understand the criticism of XPath and XSL, the fact that we have proper tools to query and transform XML, instead of the messy way of getting specific information from JSON, is also one of the strong points of XML.
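
To give a flavour of such a query, here is a small Python sketch with the standard `xml.etree.ElementTree`, which supports only a limited subset of XPath (full XPath and XSLT need a third-party library such as lxml). The catalog document is invented for the example:

```python
import xml.etree.ElementTree as ET

CATALOG = ET.fromstring("""
<catalog>
  <book lang="en"><title>First</title><price>10</price></book>
  <book lang="de"><title>Zweites</title><price>12</price></book>
  <book lang="en"><title>Third</title><price>8</price></book>
</catalog>
""")

# Declarative query: titles of all books whose lang attribute is "en"
english_titles = [t.text for t in CATALOG.findall("./book[@lang='en']/title")]

# Descendant search: every price element anywhere below the root
prices = [int(p.text) for p in CATALOG.findall(".//price")]

print(english_titles, prices)
```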

[–] Ephera@lemmy.ml 5 points 1 month ago

There is JSONPath, at least: https://en.wikipedia.org/wiki/JSONPath

[–] deadbeef79000@lemmy.nz 4 points 1 month ago

XSLT and XPath are entirely underrated. They are seriously powerful tools.

While you can approximate XSLT with a heap of coffee and a JSON parser it's harder to keep it declarative.

[–] A_norny_mousse@feddit.org 5 points 1 month ago* (last edited 1 month ago) (3 children)

I never understood why people would say JSON is superior, and why XML seemed to be getting rarer, but the author explains it:

XML was not abandoned because it was inadequate; it was abandoned because JavaScript won.

I've been using it ever since I started using Linux because my favorite window manager uses it, and because of a long-running pet project that is almost just as old: first I used XML tools to parse web pages, later I switched to dedicated data providers that offered both XML and JSON formats, and stuck to what I knew.

I'm guessing that another reason devs - especially web devs - prefer JSON over XML is that the latter uses more bytes to transport the same amount of raw data. One XML file will be somewhat larger than one JSON file with the same content. That advantage is of course dwarfed by all the other media and helper scripts (nay, frameworks) that devs use to develop websites.
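
A toy Python comparison of the size difference, assuming an invented two-field record and element-style XML. It is a sketch, not a benchmark: an attribute-style encoding would be more compact, and compression narrows the gap.

```python
import json
import xml.etree.ElementTree as ET

record = {"name": "Alice", "age": 30}

# Compact JSON without optional whitespace
json_bytes = json.dumps(record, separators=(",", ":")).encode("utf-8")

# The same record with one child element per field
person = ET.Element("person")
for key, value in record.items():
    ET.SubElement(person, key).text = str(value)
xml_bytes = ET.tostring(person)

print(len(json_bytes), json_bytes)
print(len(xml_bytes), xml_bytes)
```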

BTW, XML is very readable with syntax highlighting and easily editable if your code editor has some very basic completion for it. And it has comments!

[–] pinball_wizard@lemmy.zip 5 points 1 month ago* (last edited 1 month ago) (2 children)

When you receive an XML document, you can verify its structure before you ever parse its content. This is not a luxury. This is basic engineering hygiene.

This is actually why my colleagues and I helped kill off XML.

XML APIs require extensive expertise to upgrade asynchronously (and this expertise is vanishingly rare). More typically all XML endpoints must be upgraded during the same unscheduled downtime.

JSON allows unexpected fields to be added and ignored until each participant can be upgraded, separately and asynchronously. It makes a massive difference in the resilience of the overall system.
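
A minimal Python sketch of such a tolerant consumer. The `OrderV1` type, its fields, and the newer payload are all hypothetical; the point is only that a not-yet-upgraded reader picks out the fields it knows and ignores the rest:

```python
import json
from dataclasses import dataclass, fields


@dataclass
class OrderV1:
    """What an older, not yet upgraded consumer knows about."""
    id: int
    total: float


def read_order(payload: str) -> OrderV1:
    data = json.loads(payload)
    known = {f.name for f in fields(OrderV1)}
    # Fields added by newer producers are dropped instead of causing an error
    return OrderV1(**{k: v for k, v in data.items() if k in known})


# A newer producer already sends two extra fields
V2_PAYLOAD = '{"id": 7, "total": 19.5, "currency": "EUR", "coupon": null}'
order = read_order(V2_PAYLOAD)
print(order)
```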

I really really liked XML when I first adopted it, because before that I was flinging binary data across the web, which was utterly awful.

But XML for the web is exactly where it belongs - buried and forgotten.

Also, it is worth noting that JSON can be validated to satisfy that engineering impulse. The serialize/deserialize step will catch basic flaws, and then the validator simply has to be designed to know which JSON fields it should actually care about. This gets much more resilient results than XML's brittle all-in-one schema specification system, which immediately becomes stale and isn't actually correct for every endpoint anyway.
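
A rough Python sketch of that two-step approach. The rule tables `BILLING_RULES` and `SHIPPING_RULES` and the `validate` function are hypothetical, standing in for whatever each endpoint actually needs:

```python
import json

# Hypothetical per-endpoint rules: field name -> accepted Python type(s)
BILLING_RULES = {"id": int, "total": (int, float)}
SHIPPING_RULES = {"id": int, "address": str}


def validate(payload: str, rules: dict) -> list:
    """Return a list of problems; an empty list means the payload is acceptable."""
    try:
        data = json.loads(payload)  # step 1: deserializing catches basic flaws
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level value must be an object"]
    problems = []
    for field, expected in rules.items():  # step 2: only the fields we care about
        if field not in data:
            problems.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            problems.append(f"wrong type for field: {field}")
    return problems


PAYLOAD = '{"id": 7, "total": 19.5, "note": "gift"}'
print(validate(PAYLOAD, BILLING_RULES))
print(validate(PAYLOAD, SHIPPING_RULES))
```

The same payload passes for one endpoint and fails for the other, and the extra `note` field bothers neither.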

The shared single schema typically described every requirement of every endpoint, not any single endpoint's actual needs. This resulted in needless brittleness, and is one reason we had such a strong push for "microservices". Microservices could each justify their own schema, and so be a bit less brittle.

That said, I would love a good standard declarative configuration JSON validator, as long as it supported custom configs at each endpoint.

[–] schnurrito@discuss.tchncs.de 4 points 1 month ago

XML is best suited for storing documents, JSON for transmitting application data over networks.

SVG is an example of an excellent use of XML, it doesn't mean we should use XML for transmitting data from a backend to a frontend.

[–] erebion@news.erebion.eu 3 points 1 month ago

XMPP shows pretty well that XML can do things that cannot be done easily without it. XMPP wouldn't work nearly as well with JSON. Namespaces are a super power.
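
A small Python sketch of what namespaces buy you, using the standard `xml.etree.ElementTree`. `jabber:client` is the namespace XMPP uses for client stanzas; the extension namespace `urn:example:ext`, the address, and the payload are made up for illustration:

```python
import xml.etree.ElementTree as ET

STANZA = """
<message xmlns="jabber:client" to="alice@example.org">
  <body>hello</body>
  <body xmlns="urn:example:ext">extension payload</body>
</message>
"""

# Prefixes here are local to this script; only the namespace URIs matter
NS = {"xmpp": "jabber:client", "ext": "urn:example:ext"}

message = ET.fromstring(STANZA)
core_body = message.find("xmpp:body", NS).text
ext_body = message.find("ext:body", NS).text

print(message.tag, core_body, ext_body)
```

Two elements with the same local name coexist without clashing, so an extension can add its own `body` and a client that does not know the extension can simply skip it.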

[–] phoenixz@lemmy.ca 3 points 1 month ago

I'm sure XML has its uses

I'm also sure that for 99% of the applications out there, XML is overkill and over complicated, making things slower and more error prone

Use JSON, and you'll be fine. If you really really need XML then you probably already know why

[–] entwine@programming.dev 3 points 1 month ago

I agree with everything this article said. A lot of software would work better if devs took the time to learn and appreciate XML. Many times I've found myself reinventing shit XML gives you for free.

...But at the same time, if I'm working on a developer-facing product of any kind, I know that choosing XML over JSON is going to turn a lot of people away.
