james_davis_nicoll: (Default)
[personal profile] james_davis_nicoll


10 works new to me: five fantasy, and five science fiction, of which at least three are series (if magazines count as series). I have not seen that high a fraction of SF in quite a while.

Books Received April 4 — April 10

Poll #34466 Books Received April 4 — April 10
Open to: Registered Users, detailed results viewable to: All, participants: 1


Which of these look interesting?


Demonology for Overachievers by Lily Anderson (September 2026)
0 (0.0%)

All Hail Chaos by Sarah Rees Brennan (May 2026)
0 (0.0%)

The Faith of Beasts by James S. A. Corey (April 2026)
0 (0.0%)

FIYAH Literary Magazine Issue 38 published by FIYAH Literary Magazine (April 2026)
0 (0.0%)

House Haunters by KC Jones (October 2026)
0 (0.0%)

The Last Contract of Isako by Fonda Lee (May 2026)
0 (0.0%)

A Wall Is Also a Road by Annalee Newitz (October 2026)
1 (100.0%)

There Are No Giant Crabs in This Novel: A Novel of Giant Crabs by Jason Pargin (November 2026)
1 (100.0%)

A Kiss of Crimson Ash by Anuja Varghese (May 2026)
0 (0.0%)

Teddy Bears Never Die by Cho Yeeun (May 2026)
0 (0.0%)

Some other option (see comments)
0 (0.0%)

Cats!
0 (0.0%)

[syndicated profile] jedisct1_feed

Posted by Frank Denis (Jedi/Sector One)

People love configurable software.

They say flexibility is always good. More flags, more knobs, more environment variables, more ways to make the software fit every possible use case.

But in practice, configuration flags are often just a polite way to ship uncertainty.

A feature is added, but no one is completely sure it should be enabled by default. So, it gets a flag.

A behavior is changed, but backward compatibility is scary. So, it gets a flag.

Two users want opposite things. So, both paths stay in the code forever, behind a flag. At first sight, this looks reasonable.

Of course, a flag can be useful. Experimental features need a way to be tested. Migrations sometimes need a temporary escape hatch. And some software is genuinely used in environments that are different enough to justify a couple of switches.

But temporary flags are rarely temporary. Once a flag exists, it starts attracting dependencies.

Documentation has to mention it. Support has to ask whether it is enabled. Bug reports have to include it. Tests need to cover both states. New features have to decide which side they are compatible with. And if the flag affects a file format, a protocol, or anything persisted, removing it later becomes painful. That is the real cost.

The code behind a flag is not one feature. It is two possible worlds that the maintainers now have to keep alive.

This gets worse when flags are not independent. One flag changes timeouts. Another one changes buffering. Another one changes concurrency. Individually, each sounds harmless. Together, they create a configuration space no one has actually tested.

Users then report that “it doesn’t work” on a setup that technically should be supported, but only with FAST_MODE=0, LEGACY_IO=1, the old parser, and a kernel old enough to vote.

Nobody designed that combination. It just happened. And now it is your problem.
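
A minimal sketch of that configuration-space math (FAST_MODE and LEGACY_IO are borrowed from the example above; the other two flags are invented for illustration):

```python
from itertools import product

# Four independent boolean flags; FAST_MODE and LEGACY_IO come from the
# example above, the other two are made up for illustration.
flags = {
    "FAST_MODE": (0, 1),
    "LEGACY_IO": (0, 1),
    "NEW_PARSER": (0, 1),
    "ASYNC_FLUSH": (0, 1),
}

# Every combination is a distinct world someone may be running.
configurations = list(product(*flags.values()))
print(len(configurations))  # 2**4 = 16

# A typical CI matrix exercises the defaults plus maybe one variant...
tested = {(0, 0, 0, 0), (1, 0, 1, 0)}

# ...leaving the rest of the space to be "tested" by users in production.
untested = len(configurations) - len(tested)
print(untested)  # 14
```

Each new boolean doubles the space: ten flags is already 1,024 configurations, far more than any test matrix will ever cover.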

A lot of software teams treat flags as free because adding a boolean looks cheap. It isn’t.

A boolean in the interface usually means a branching factor in maintenance.

This is especially obvious in open source. If a user asks for a niche feature, adding a flag feels like a compromise. The maintainer doesn’t have to bless the behavior as the new normal, and the user gets what they want.

But what actually happened is that the maintainer accepted long-term responsibility for behavior they may never use themselves.

The contributor will disappear. The flag will stay.

And five years later, some poor soul will ask why --compat-relaxed-fsync cannot be combined with the new backend on FreeBSD. Because software has memory.

The scary part is that flags often hide design problems.

If users regularly need a flag to disable a subsystem, maybe that subsystem is too eager.

If performance requires half a dozen tuning variables, maybe the defaults are bad or the architecture is brittle.

If a migration needs three generations of compatibility toggles, maybe the old behavior was never clearly isolated from the new one.

Flags can solve real problems. But they can also keep bad design alive by preventing the moment when someone has to say: this behavior was wrong, and it has to go.

Sure, removing options can upset people. But keeping everything forever quietly upsets the maintainers instead.

That cost is less visible, because it shows up as hesitation, slower releases, defensive coding, weird bugs, and documentation that reads like legal terms.

So, should software have no flags at all? Obviously not.

But flags should have the same status as debt: sometimes necessary, never free, and always suspicious.

Every new flag should come with an expiration story.

Why does it exist? Who needs it? What breaks if it goes away? When will that be acceptable? If nobody can answer these questions, the flag is probably not a feature.

It’s a fossil in progress.
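
One way to force that conversation is to make the expiration story machine-readable. A hypothetical sketch (the flag name is borrowed from above; the fields, owner, and date are invented):

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Flag:
    name: str
    why: str       # Why does it exist?
    who: str       # Who needs it?
    breaks: str    # What breaks if it goes away?
    expires: date  # When will removal be acceptable?

# Hypothetical registry: a flag must answer all four questions to land.
REGISTRY = [
    Flag(
        name="compat-relaxed-fsync",
        why="escape hatch for a durability behavior change",
        who="deployments on legacy network filesystems",
        breaks="write latency regressions on those setups",
        expires=date(2027, 1, 1),
    ),
]

def fossils(today: date) -> list[str]:
    """Flags whose expiration story has run out: delete, don't extend."""
    return [f.name for f in REGISTRY if today >= f.expires]
```

A CI check that fails whenever `fossils(date.today())` is non-empty turns every stale flag into a build break instead of a fossil.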

Photo cross-post

2026-04-11 02:46
andrewducker: (Default)
[personal profile] andrewducker


Sophia likes sharing the car boot with the dogs.
Original is here on Pixelfed.scot.

[syndicated profile] firedrake_feed

2021 historical mystery; ninth in Huber's Lady Darby series (post-Regency amateur detection). As the cholera epidemic is starting to fade in Edinburgh and Lady Darby waits for her child to be born, a wildly popular rogue's biography in the Jack Sheppard mould rakes up old scandals…

I Just Want Simple S3

2026-04-10 17:14
[syndicated profile] feld_feed

Posted by feld

I just want S3. My needs are pretty basic. I don't need to scale out. I don't need replication. I just need something that can do S3 and is reliable and not slow.

Minio is dead; they pulled the plug after axing the interface. They archived the repo so they can chase AI-industry dollars, because those folks have heavily utilized Minio. Good for them, but I had always written them off after pointing out a bug their tests weren't catching (they were mocking responses, not actually exercising the code) and they shrugged it off. Deletes were broken at one point.

Garage, being built in Rust, is new and interesting, but last I tried it (6 months ago?) it was also unnecessarily complex. It's very young, development was paused for a while (funding), and it was missing a couple of normal S3 features I wanted last I checked. Maybe it's better now. Still feels too heavy.

SeaweedFS is honestly super cool. I like their approach and how they added layers on top so they can do things like support WebDAV. I don't know what's wrong, but when I run it with a master and a volume node, it's slow. I switched to the new weed mini approach -- still slow. I'm storing a couple of GBs of normal files in here, nothing fancy, but even on my own LAN when I try to download a file it chugs and chugs -- it starts downloading at maybe a few hundred KB/s and eventually ramps up to ~10 Mbit/s. I don't know why. It's on my LAN; why isn't it instant?

CEPH is a monster. It's huge. We have it at work. Way more complex than I need, but if you really wanted to build something that can compete with Amazon's S3, you'd probably want CEPH. SeaweedFS is close to being capable as well...

Versity GW -- oh bless you, Reddit commenter, for bringing this one up in a thread about benchmarking S3 backend performance. Nobody seems to know it exists except... Sandia National Labs, Los Alamos National Lab, the military, bunch of universities...!

The Versity S3 Gateway currently supports generic POSIX file backend storage and Versity’s ScoutFS filesystem.

I don't know what the main use case is for Versity. It can proxy to other S3 backends so you don't have to expose auth or can provide a custom auth layer. Does it do read-through caching? Don't know, didn't check. Can you have some local buckets and some that are proxied? Don't know, didn't check.

But what it can do is just use the local filesystem for S3 storage, and that's good enough for me. And I get a web interface to manage it all; it can do anonymous/public read buckets and handles policies. Interestingly, it uses xattrs for storing the metadata of the objects.

So I drop this in, rclone my data over, do some testing, and I get lightning fast performance on my LAN like I expect -- line rate downloads! Finally, sanity is restored.

Now I just wait to replace this with a true ZFS native object storage that someone is working on...

[syndicated profile] feld_feed

Posted by feld

Imaginary Timeline

1984: West German intelligence sources claim that Iran’s production of a bomb "is entering its final stages." US Senator Alan Cranston claims Iran is seven years away from making a weapon. (Iranian nuclear specialists had no knowledge of how to enrich uranium and had no technology with which to do it.)

1987: Iraq targeted Iranian civilians with chemical weapons. Iranian Prime Minister Mir Hossein Mousavi said in a late December 1987 speech that Iran "is capable of manufacturing chemical weapons" and added that a "special section" had been set up for "offensive chemical weapons." Mousavi refrained from saying that Iran actually had chemical weapons, and he hinted that Iran was constrained by religious considerations. Rafighdoost recalls Khomeini asking rhetorically, "If we produce chemical weapons, what is the difference between me and Saddam?". The supreme leader was unmoved by the new danger presented by the Iraqi gas attacks on civilians. "It doesn’t matter whether it is on the battlefield or in cities; we are against this," he told Rafighdoost. "It is haram [forbidden] to produce such weapons. You are only allowed to produce protection." A few days after Mousavi’s speech, a report in the London daily the Independent referred to a Khomeini fatwa against chemical weapons. Former Iranian nuclear negotiator Seyed Hossein Mousavian, now a research scholar at Princeton University, confirmed [for this article] that Khomeini’s fatwa against chemical and nuclear weapons, which accounted for the prime minister’s extraordinary statement, was indeed conveyed in the meeting with Rafighdoost.

1992: Israeli parliamentarian Benjamin Netanyahu tells the Knesset that Iran is 3 to 5 years from being able to produce a nuclear weapon.

1995: The New York Times reports that US and Israeli officials fear "Iran is much closer to producing nuclear weapons than previously thought" – less than five years away. Netanyahu claims the time frame is three to five years.

1996: Israeli Prime Minister Shimon Peres claims Iran will have nuclear weapons in four years.

1998: Former Secretary of Defense Donald Rumsfeld claims Iran could build an ICBM capable of reaching the US within five years.

1999: An Israeli military official claims that Iran will have a nuclear weapon within five years.

2001: The Israeli Minister of Defence claims that Iran will be ready to launch a nuclear weapon in less than four years.

2002: The CIA warns that the danger of nuclear weapons from Iran is higher than during the Cold War, because its missile capability has grown more quickly than expected since 2000 – putting it on par with North Korea.

2003: A high-ranking Israeli military officer tells the Knesset that Iran will have the bomb by 2005 — 17 months away.

2004: Iran began publicizing Khamenei’s fatwa against nuclear weapons.

2006: A State Department official claims that Iran may be capable of building a nuclear weapon in 16 days.

2007: U.S. National Intelligence Estimate assessed that Iran had halted its nuclear weapons program in 2003.

2008: An Israeli general tells the Cabinet that Iran is "half-way" to enriching enough uranium to build a nuclear weapon and will have a working weapon no later than the end of 2010.

2009: U.S. President Barack Obama revealed the existence of an underground enrichment facility in Fordow, near Qom. Israeli Defense Minister Ehud Barak estimates that Iran is 6-18 months away from building an operative nuclear weapon.

2010: Israeli decision-makers believe that Iran is at most 1-3 years away from being able to assemble a nuclear weapon.

2011: An IAEA report indicates that Iran could build a nuclear weapon within months.

2012: Secretary of State John Kerry: "The fatwa issued by a cleric is an extremely powerful statement about intent," but then added, "It is our need to codify it."

2013: Israeli intelligence officials claim that Iran could have the bomb by 2015 or 2016. Joint Comprehensive Plan of Action (JCPOA, "Iran Nuclear Deal") talks begin.

2015: Iran Nuclear Deal is agreed to.

2018: President Trump withdraws from the Iran Nuclear Deal.

2018-2022: The Iran Watch project documents “violations” of the 2015 accord, which by then no longer even existed.

March 2025: At a Senate hearing, Gabbard said the intelligence community "continues to assess that Iran is not building a nuclear weapon and Supreme Leader Khamenei has not authorized the nuclear weapons program that he suspended in 2003."

June 2025: President Donald Trump said Iran was "weeks away" from having a nuclear bomb. The United States Air Force and Navy attacked three nuclear facilities in Iran as part of the Twelve-Day War, under the code name Operation Midnight Hammer.

July 2025: A Pentagon assessment found that Iran's nuclear program was likely set back around 2 years.

March 2026: Israel starts a war with Iran, drags the United States into it.


We're either gullible or Iranians are too stupid to build a nuclear weapon and it just keeps slipping through their fingers. Which one do you think it is?

[syndicated profile] acoup_feed

Posted by Bret Devereaux

This is the first part of a series looking at the structure of the Carthaginian army. Although Carthage has an (unfair!) reputation for being a country of “peaceful merchants who tended to avoid wars,”1 Carthage was, I will argue, without question the second greatest military power the Mediterranean produced – eclipsed only by Rome. If we do not realize this, it is merely because Carthage had the misfortune to fight Rome ‘in the first round,’ as it were.

Carthage is, in particular, the only military power that ever manages to seriously challenge Rome on an even footing, blow for blow, after the Romans completed the conquest of Italy. The Carthaginian military system pushes Rome to the very brink of defeat twice, in contrast to the Hellenistic great powers, the heirs of Alexander, none of which ever force the Romans to ‘dig deep’ into their forces. Put another way: the Romans put Alexander’s heirs to bed mobilizing against them less than a third of the military force it took for Rome to match Carthage. The Carthaginians inflicted more casualties on the Romans in a single day than all of the successor states (a label which does not include Epirus, so no Pyrrhus here; worth noting the Carthaginians beat him too) managed in pitched battle combined. And they did this more than once; I’d hazard they managed it about seven times.2

So in this series, we are going to lay out the structure of Carthage’s armies (alas, we have very little information as to the structure of their navy), because as we’ll see, the Carthaginian military system was quite complex, drawing soldiers from all over the western Mediterranean.

Now there is a bit of organizational trickiness here: Carthage drew forces from many different places at many different times. In practice, the Carthaginian military becomes visible to us as early as 480 (with the Battle of Himera) and seems to change significantly between this period and the army visible to us in the first book of Polybius, which fights the First Punic War (264-241) and the Mercenary War (241-237). Then the Carthaginian army undergoes another substantial shift visible to us, in terms of its composition, during the Barcid Conquest of Spain (237-218) such that the Carthaginian army that fights in the Second Punic War (218-201) looks very different again. And then Carthage loses its army, and so its military forces from 201 to the end of the Carthaginian state in 146 look different again.

My solution here is to structure this treatment around the largest Carthaginian mobilizations, which were those during the Second Punic War: Carthaginian numbers peaked in 215 with something on the order of 165,000 men under arms.3 We’ll work through the components of that force (operating, as it did, in multiple theaters) and for each component of it, we can then note how – as best we can tell – that specific component changed over time.

I should also note what I am not doing here: this is not a full rundown of Carthage’s military history or the Punic Wars; rather it is an outline of the components of Carthage’s land forces. I think a treatment of the Punic Wars on a similar level to our “Phalanx’s Twilight, Legion’s Triumph” series is probably worth doing, but would be a much larger and more involved series than this, because the Punic Wars are quite long conflicts with many twists and turns and often multiple simultaneous theaters. One day!

But first, as always, raising large armies of mercenaries, subject conscripts, vassal warlords and allies is expensive! If you too want to help me invade Italy with a multi-ethnic army of diverse origins in a doomed effort to stop the Roman Republic, you can help by supporting this project over at Patreon. If you want updates whenever a new post appears or want to hear my more bite-sized musings on history, security affairs and current events, you can follow me on Bluesky (@bretdevereaux.bsky.social). I am also active on Threads (bretdevereaux) and maintain a de minimis presence on Twitter (@bretdevereaux).

(Bibliography Note: Any bibliography for the lay reader looking to get to grips with Carthage likely has to begin with D. Hoyos, The Carthaginians (2010) which provides a solid foundation on understanding the Carthaginian state and society. A solid overview of Carthaginian military history is provided by J.R. Hall, Carthage at War: Punic Armies c. 814-146 (2023). For specific periods in Carthaginian military history, note J.F. Lazenby, The First Punic War: A Military History (1996), then D. Hoyos, Truceless War (2007) on the Mercenary War and D. Hoyos, Hannibal’s Dynasty (2003) on the Carthaginian conquest of Spain, before going back to J.F. Lazenby for Hannibal’s War (1978) on the Second Punic War. G. Daly, Cannae: The experience of battle in the Second Punic War (2002) has, among other things, one of the better run-downs of the composition of Hannibal’s army. On the Gauls in Carthaginian armies, note L. Baray, Les Celtes d’Hannibal (2019), alas not translated. On the Numidians, a key component of Carthage’s army, see W. Horsted, The Numidians, 300 BC – AD 300 (2021), while on the Spanish warriors who fought for Carthage, see F. Quesada Sanz, Armas de la Antigua Iberia: De Tartesos a Numancia (2010) now available in translation as F. Quesada Sanz, Weapons, Warriors & Battles of Ancient Iberia (2023), trans. E. Clowes and P.S. Harding-Vera. You can also find what little we know about Balearian slingers in the opening chapters of L. Keppie, Slingers and Sling Bullets in the Roman Civil Wars of the Late Republic, 90-31 BC (2023). Finally, one must note N. Pilkington, The Carthaginian Empire (2019), an often heterodox but equally sometimes persuasive reassessment of what we know of Carthage that is intensely skeptical of our literary source tradition and an essential read (for agreement and disagreement) if one is intending to get knee-deep in the scholarship.)

A Brief Chronology

First, before we get into the details, we should lay out the basic chronology of Carthaginian military history, because as we’re going to see, not only does Carthage draw upon a bunch of different sources of military manpower, those sources themselves change over time in their composition and role within the Carthaginian system.

Now we should start with some background here on the nature of Carthage and its control over its core territory in North Africa. Carthage was a Phoenician colony, founded in North Africa (in modern-day Tunisia). The population was thus likely a mix of local Libyan peoples, Phoenician settlers and even other maritime peoples (Aegeans, e.g. Greeks). The Carthaginians themselves maintained a clear ideology of being Phoenicians, using a Punic language, worshiping Punic gods and making a clear connection to their mother-city of Tyre; however, some modern DNA research has suggested the actual population of Phoenician colonies might have been more genetically diverse than we have generally supposed. Of course, not every resident of Carthage was likely to be a citizen, and certainly the impression we get is that some Phoenician ancestry was a requirement for full citizenship.

Via Wikipedia, a decent-if-not-perfect map of Greek and Phoenician colonization. It is worth noting when looking at this map that the Etruscans were organized into states, but not united, while the Thracians, Dacians and Illyrians were non-state peoples at this time.

Carthage was hardly the only such colony in North Africa (Utica, Thapsus, Leptis, Leptiminus, etc. were all such colonies), but there was also a substantial local Libyan population, and at least initially Carthage was subordinate to those peoples; we’re told that for quite some time in its first few centuries after its founding (mid-eighth century), Carthage paid tribute to the locals, a relationship that inverted quite dramatically as Carthage became stronger. Carthage seems to begin projecting power overseas seriously in the mid-to-late 500s, though we cannot always see this early process as well as we’d like. By c. 500, Carthage seems to control Sardinia and the western coast of Sicily. Some sign of Carthage’s expanding control in North Africa comes when they are able to block Dorieus (a Spartan prince) from creating a Greek colony in North Africa and then shortly thereafter also destroy his effort to found a colony in western Sicily, between 515 and 510 or so. Unfortunately, we’re not really well informed at all about the armies they used to do this.

Instead, Carthaginian armies first start to become really visible to us in the context of the running contest between Carthage and Syracuse for control over the rest of Sicily, which kicks off in the 480s. From the 480s to the 270s, Carthage fights a series of wars with the Greeks on Sicily, the latter generally organized around the largest and strongest Greek city there, Syracuse. There is a tendency for students to be surprised that Carthage – given its apparent power in the third century – is unable to overcome (or be overcome by) Syracuse, but it is worth remembering that Syracuse is a really big polis, on the same scale as Athens or Sparta. Recall that from 415 to 413, the Athenians throw the lion’s share of their military, at the height of their power, at Syracuse and lose effectively all of it for their trouble, so Syracuse – at least when well led and organized – is a fairly major power (in as much as any power other than the Achaemenids can be major) in this period.

In any case, the first Carthaginian-Greek war in Sicily begins in the 480s and ends with the Battle of Himera in 480. They’re then back at it from 409 to 405, then again from 398 to 396, then again from 383 to 381 (?), then again from 368 to 367, then again 345 to 341 and again from 311 to 306 and then finally from 278 to 276, Pyrrhus of Epirus shows up to campaign against Carthage on behalf of the Greeks. On the one hand, at any given time in these wars, territorial control often swings wildly between Carthage and Syracuse, but on the other hand zooming out, over the long-term relatively little changes and the whole thing resembles a stalemate: Carthage controls the west of the island, Syracuse the east and the settlements in the middle either manage in the fracture-zone between the powers or submit to one or the other.

Alongside the early phases of this running warfare on Sicily, Carthage is steadily subduing the area around it in North Africa, reducing the Libyan and Phoenician settlements in what is today Tunisia to semi-autonomous subjects. Those communities remained internally self-governing, but were in practice ruled by Carthage, and we’ll talk about that relationship in the next post in the series. We can’t see this process fully, but by c. 400 Carthage clearly seems to have control over most of its immediate surroundings. Carthage also began interacting quite early with the Numidians, the Berber peoples to the west (generally divided into two kingdoms, the Massaesylii and Massylii), sometimes recruiting them and sometimes fighting them. Certainly by the start of the third century, if not earlier, Carthage is the dominant power in this relationship.

The Carthaginians are also clearly active in trade in Spain, though it is unclear to what degree the Phoenician settlements there fall under Carthaginian political control and when.

Thus even by c. 480, Carthage is one of the major imperial powers in the western Mediterranean, though hardly the only ‘major player’ and remains so, steadily growing in size and influence over the next several centuries. By c. 300, the Carthaginians have secured control over western Sicily, Corsica and Sardinia, have some small footholds in Spain and most importantly have secured control over most of what is today Tunisia (what the Romans would just call ‘Africa’) and have a dominant if frequently shifting position relative to the Numidians.

That set the stage for the major wars of the third century. Carthage was in a strong position in Sicily after the end of their war with Agathocles (in 306), leading the Sicilian Greeks to appeal to Pyrrhus in the 270s. Pyrrhus, arriving in 278, was able to win significant victories and pin the Carthaginians back to their last major coastal base at Lilybaeum, but was unable to take it (being unable to break Carthaginian naval control) and was subsequently forced out in 276 once his support among the Sicilian Greeks ebbed, suffering a nasty naval defeat on his way out for his trouble.

That left Carthage in a dominant position in Sicily (but still facing a potent foe in Syracuse) when in 264 a group of mercenaries (the Mamertines) left over from Agathocles’ war, who had seized Messina, appealed – under pressure from Syracuse – to both Rome and Carthage for help. That led to a four(-ish) way war in which two of the sides (the Mamertines and Syracuse) rapidly found themselves rendered irrelevant. The result was the First Punic War (264-241) between Rome and Carthage, fundamentally a war for control over Sicily, although the Romans did invade North Africa (unsuccessfully) in 256.

Via Wikipedia, a rough map of Carthage’s territorial control at the beginning of the First Punic War, though I’d argue this probably overstates Carthaginian control in Spain somewhat (New Carthage isn’t even founded yet!)

Carthage loses the war, with Rome consolidating control over Sicily, only to be immediately beset by a new war, the Mercenary War (241-237), when a mutiny by Carthage’s unpaid mercenaries from the end of the First Punic War set off a general revolt of its subjects in North Africa. The Carthaginians win this war, particularly with the leadership of Hamilcar Barca, who is then too politically influential to be left in Carthage, so he is packed off with an army to go do stuff in Spain. The ‘stuff’ he does in Spain from 237 to his death in 228 is to subdue nearly the entire Mediterranean coast up to the Ebro River, with that task then completed by first his son-in-law, Hasdrubal the Fair and then Hamilcar’s eldest son Hannibal.

That sets the stage for ’round two’ with Rome, the Second Punic War (218-201), an absolutely massive war waged across Italy, Spain and Africa, which represents the peak military output of either Rome or Carthage (although the First Punic War, with its massive fleets, probably roughly matches it). Utterly defeated in 201, Carthage is shorn of its overseas empire and much of its more distant African holdings, essentially reduced to ‘merely’ controlling northern Tunisia. However, rapid Carthaginian economic recovery leads Rome to instigate a third war with Carthage, the Third Punic War (149-146). Unlike the previous two wars, this is not an even contest: Carthage by this point is much smaller and weaker a power than Rome. Determined Carthaginian resistance prolongs the war, but Rome is eventually able to seize the city and destroy the Carthaginian state in 146.

Via Wikipedia, a rough map of Carthaginian control at the start of the Second Punic War. This map substantially overstates Carthaginian control of the Spanish interior, however.

Now, one thing worth noting at the end of this brief, potted history is that for nearly all of this period, we have only Greek sources (Romans, writing in Latin, only really come in with the Punic Wars, and even then our earliest Roman source – Fabius Pictor – is lost, so we get him processed through a Greek – Polybius). One of the features of the history we do have of Carthage that I suspect results from this is that Carthage seems to lose a lot. But it is, at least until 264, a strange sort of losing: Carthage shows up in our sources losing major battles, but then one moves forward a few decades and Carthage’s empire is larger and more prosperous. And then Carthage loses another major battle and yet somehow, a few decades later, Carthage is even more powerful.

So either Carthage is the world champion at failing upwards or there is something going on with our sources. And it isn’t hard to really guess what: our key source for Carthaginian history before 264 is Diodorus Siculus, that is, ‘Diodorus the Sicilian,’ a Sicilian Greek writing in the first century B.C. who thus very obviously has a side in Carthage’s long wars with the Sicilian Greeks. Even if Diodorus is doing his best to give us a straight story, which battles are his sources likely to remember or commemorate most prominently: the Time They Really Walloped the Carthaginians or perhaps smaller engagements that they lost? Thus while we cannot know for certain, I find that I suspect Carthage’s battle-record pre-264 is likely rather better than our sources suggest.

Post-264, it seems worth noting that while Carthage loses more often than they win against the Romans, they still manage to deliver Rome some pretty stunning defeats. The notion that Carthaginians are ‘peaceful merchants’ or just ‘unmilitary’ thus seems to be almost entirely empty, a nearly pure product of later stereotypes about ‘unmanly easterners’ rather than a conclusion justified by the evidence. At the very least, by the time Rome was ready to fight Carthage, the Carthaginians very much knew how to throw a punch – indeed, they would punch Rome far harder than any other foe.

That still provides some three hundred years where Carthage is a meaningful military power where we can see their military activities, so as you might imagine, the shape of the Carthaginian army changes a lot over that period.

Component Parts

The next thing we ought to do, to get an overall sense of the system, then, is to lay out the scale of Carthaginian forces at the height of the Second Punic War, representing the largest land mobilization that Carthage ever produced. The size of the mobilization is staggering, as is the diversity of how it was raised: like most imperial powers, Carthage’s army was a diverse medley of soldiers drawn from basically everywhere that Carthaginian power reached. The way these soldiers were incorporated into Carthage’s armies was in turn a product of what their relationship to the Carthaginian state was – citizens, subjects, vassals, allies, mercenary employees.

Our sources, most particularly Polybius, provide us enough detail to get a pretty decent accounting of Carthage’s ‘peak’ mobilization, which comes in 215. Hannibal, of course, had a Carthaginian field army at that time in Italy – he had won the Battle of Cannae (216) the year before – but there were also Carthaginian armies in Spain, Sardinia and Africa, along with an active fleet. Carthage alone of the Mediterranean powers of the era seems to have been able to match Rome’s capacity for multi-theater warfare: whereas Hellenistic kingdoms could really only have one primary theater of war at a time, both Rome and Carthage could wage multiple parallel campaigns simultaneously and did so.

So let’s break down the evidence we have.

We can begin with Hannibal’s army in Italy, which Polybius tells us (3.114.5) consisted of 40,000 infantry and 10,000 cavalry for the Battle of Cannae (216). We can actually work backwards with just a little bit of guessing to break down this army into its unit composition: Hannibal crosses the Alps with 12,000 Africans, 8,000 Iberians, and 6,000 cavalry, taking some losses in the subsequent battles but also absorbing around 9,000 Gallic infantry and 5,000 Gallic cavalry. Figuring for attrition, the composition of Hannibal’s army at Cannae has to look at least something like around 10,000 African infantry, 6,000 Iberian infantry, around 8,000 mixed ‘lights’ (North African lonchophoroi, which means ‘javelin-men’, not ‘pikemen’ as it is sometimes mistranslated) and Balearic slingers and 16,000 Gallic infantry to make the total. Of the cavalry we might suspect around 5,000 of it was Gallic cavalry and the rest split roughly evenly between Numidian cavalry from Africa and Iberian cavalry (both of which we’re told Hannibal has).

We then need to modify that force for Hannibal’s losses at Cannae: he lost 4,000 Gauls, 1,500 Iberians and 200 cavalry, but was reinforced late in the year (Polyb. 3.117.6; Livy 23.13.7) with 4,000 more Numidian cavalry and 40 elephants. That leaves Hannibal in 215 with an army of roughly 50,000: 10,000 African infantry, 12,000 Gallic infantry, 4,500 Iberian infantry, 8,000 mixed ‘lights’ (lonchophoroi and Balearic slingers), around 5,000 Gallic cavalry and perhaps 10,000 other cavalry, of which we might guess that maybe 2/3rds were Numidian and 1/3rd Iberian.

At the same time in Italy there is a second Carthaginian army operating in Bruttium (modern Calabria; Hannibal is operating out of modern Apulia) under the command of Hanno with 17,000 infantry composed mostly of Roman socii that have defected to Hannibal, along with 1,200 cavalry, mostly Spanish and Numidian (Livy 24.15.2).

The thing is, Hannibal does not have Carthage’s largest army. One of the mistakes students make in assessing the Second Punic War is focusing – as most modern treatments do – almost entirely on Hannibal. But for Carthage, getting reinforcements to Hannibal is very hard – Rome at this point has a strong navy, so they can’t easily sail to Italy – but the war is also very active in Spain. Carthage had come to control the Mediterranean coast of Spain as a result of the conquests of Hamilcar Barca (we’ll discuss this more when we get to these guys in a couple of weeks) and Rome was seeking to tear that part of the empire away.

Carthage had three generals operating in Spain by 215 – Hasdrubal and Mago Barca (Hannibal’s brothers) and Hasdrubal Gisco. Livy reports the combined strength of all three at 60,000 (Livy 23.49) and once again with some careful tracking through Livy and Polybius we can basically break this force down to roughly 24,000 African infantry (a mix of Hannibal’s troops left behind and reinforcements brought by Mago), a touch less than 2,000 African cavalry, and the remainder – about 34,000 – mostly Iberian troops along with some small units of Gauls (300 from Liguria) and Balearic slingers (500). We can be fairly ‘rough’ with these numbers because we’re dealing with ‘paper strengths’ that are going to be off to some degree in any case – the point here is a rough approximation of an estimate, because our sources aren’t going to get better than that.

In addition, there was a Carthaginian army dispatched to Sardinia to try to retake it, a force Livy reports as being roughly the same size as the reinforcements Mago brought to Spain, which would mean 12,000 infantry and 1,500 cavalry, probably nearly all African (Livy 23.23.12).

Finally, Carthage maintained a force still in Africa. Hannibal had, at the war’s outset, transferred to Africa some 13,850 Iberian infantry, 870 Balearian slingers and 1,200 Iberian cavalry, while redeploying some 4,000 Metagonians (from what is today eastern Morocco) to Carthage as well.

Taking all of that together we can estimate very roughly (with some rounding) that Carthage has, under arms, in 215:

  • 50,000 African infantry
  • 17,000 Italian socii
  • 12,000 Gallic infantry
  • 52,000 Iberian infantry
  • 10,000 various ‘lights’ (including at least 1,370 Balearic slingers)
  • 21,000 cavalry of which probably roughly
    • 5,000 are Gallic cavalry
    • 5,000 are Iberian cavalry
    • 11,000 are African and Numidian cavalry (with the Numidians probably the larger share)

For a total of roughly 162,000 men under arms. Notably missing from this total are any Carthaginian citizen troops, but for reasons I’ll get to below, I do think there probably were some in North Africa. For comparison, the peak mobilizations of the major successor states (the Seleucid and Ptolemaic kingdoms) are probably around 80,000 men. Carthage is doubling that mobilization and very nearly matching Rome’s own maximum mobilization (around 185,000 men).4

Carthaginian Citizen Soldiers

Now you may have noticed something a little odd for the Carthaginian army implied by the figures above: there aren’t any Carthaginians in it. And that tends to be one of the core things that folks ‘know’ about Carthaginian armies, which is that these were ‘mercenary’ armies, where Carthaginians only served as officers. That is, after all, more or less directly what Polybius tells us and historians ancient and modern tend to take Polybius at his word. And while Polybius is being more than a little sneaky with his description of Carthaginian armies as mercenary in nature, the idea that Carthaginians didn’t serve in quantity in Carthaginian armies is at least half true, but with important geographical and chronological limitations.

Here, we are interested in the Carthaginian citizens themselves. And we begin with the first exception to the idea that Carthaginian citizens didn’t fight, the chronological one: Carthaginian citizen armies are actually very common everywhere (that is, both at home and abroad) in the fifth and fourth century. Diodorus (11.22.2) reports ‘Phoenicians’ in the Carthaginian army for the Battle of Himera (480), which are likely Carthaginian citizen soldiers; we hear of Carthaginian citizen soldiers in later Carthaginian expeditions to Sicily in 409 too. As late as 339, at the Battle of the Crimissus, the Carthaginian army includes, according to Diodorus, a Sacred Band of Carthaginian citizens several thousand strong (Diod. Sic. 16.80.4) which seems to be a picked force from a larger body of Carthaginian citizens, given that he describes its members as distinguished even among the citizens for valor, reputation and wealth.

Now in most treatments the next thing that will get said is that in the third century – when both the First (264-241) and Second (218-201) Punic Wars occur – the Carthaginians changed this policy and citizens stopped serving except as officers. But I think that perhaps misses what is really happening here and the reason has to do with the perspective of our sources: we have no Carthaginian sources or even North African sources. What we have are the reports primarily of Romans (who fought Carthage), Greeks on Sicily (who fought Carthage) and mainland Greeks like Polybius, who relied on the other two. My point is not necessarily that these sources are hostile to Carthage (though they are), but rather that their focus is directed. We are seeing Carthage like one would see a statue in a dark room lit entirely from one side: only half the statue will be illuminated.

Our sources are very interested in the armies that Carthage sends against Syracuse and Rome and almost entirely uninterested – or uninformed! – about the forces that Carthage might muster in other places. We only see Carthaginian North Africa clearly in brief snippets: when a Greek or Roman tries to invade it (310, 256, 204 and 149) or in the context of a major revolt like the Mercenary War (241-237) which draws our sources’ attention.

But what do we see whenever the action shifts to North Africa? Citizen soldiers in Carthage’s armies. While Diodorus inserts into his narrative a line about how the Carthaginians were unprepared for fighting when Agathocles (tyrant of Syracuse) lands his army in Africa in 310, they quickly manage to put together a citizen soldier army – Diodorus says some 40,000 soldiers, but Diodorus’ numbers here are often useless (Diod. Sic. 20.10.5-6). We don’t hear anything about citizen soldiers during Rome’s unsuccessful invasion in 256 (during the First Punic War), but when Carthage’s expeditionary army (returned from Sicily at the war’s end) revolts in 241, Carthage immediately raises a citizen army to put down the revolt and succeeds in doing so (Polyb. 1.73.1-2). Likewise, when P. Cornelius Scipio soon-to-be-Africanus lands in North Africa in 204, the Carthaginians raise citizen forces (alongside all of their other troops) to try to stop him and Carthaginian citizens formed a major part of Hannibal’s army at Zama (202; Polyb. 15.11.2-4), including both infantry and cavalry.

And of course, when Rome returned for the final act in the Third Punic War (149-146), Carthage – largely shorn of its empire – responded by mobilizing a citizen force to defend the city, alongside freed slaves (App. Pun. 93-5) and resisted fairly stoutly.

In short, with the exception of M. Atilius Regulus’ invasion of 256, every time Carthaginian Africa is ‘illuminated’ for us we see Carthaginian citizen forces. Now our sources often present these forces as basically ‘scratch’ forces, raised in a panic, but while the Carthaginians sometimes lose the battles that result, these armies are not a ‘rabble’ by any means. Carthaginian citizen forces were evidently sufficient to defeat their own mercenaries and the Libyan revolt in 241. At Zama (202), the Carthaginian citizens form the second rank of Hannibal’s army and while Polybius is quick to lean into stereotypes, calling them cowards (for not reinforcing the first battle line, composed of mercenary troops), in practice what he actually describes is that the Carthaginian citizen line is able to throw the Roman hastati back and is only forced to retreat by the advance of Scipio’s second line of principes (Polyb. 15.13.5-8).

My suspicion is thus that Carthaginian citizen soldiers may have never fully gone away, but rather they may have been confined largely to operations in North Africa. It makes a degree of sense that the Carthaginians might want to wage their imperial wars almost entirely with auxiliary troops recruited from their dependencies (or paid for as mercenaries), with Carthaginian citizens serving only as generals and officers, while reserving their citizen soldiers for operations closer to home. And there must have been more of such operations than we are aware of. Remember: Carthaginian armies really only become fully visible to us as they interact with Greek and Roman armies, but obviously Carthage must have accomplished the subjugation of much of North Africa, must have managed to subordinate (if not subdue) the Numidians, must have been able to hold that control through military strength (for our sources are very clear that Carthaginian control was often resented) and finally must have been able to also deter the Saharan, Berber and Libyan peoples on their borders.

In short, there is almost certainly quite a lot of Carthaginian campaigning in Africa which we can’t see clearly and it is possible that Carthaginian citizen soldiers continued to be active in these operations throughout. In that case, Carthage may well have kept its citizenry in some degree of readiness for war, which may explain why substantial bodies of Carthaginian citizen soldiers seem to be available and militarily effective so quickly when Carthage’s core territory in Africa is threatened. That said, short of some very convenient (and very unlikely) Punic inscriptions showing up, this remains merely a hypothesis; our sources offer no hint of this and indeed Polybius states the opposite, that the Carthaginian citizenry was broadly demilitarized.

Carthaginian Arms and Tactics

Of course, if Carthaginian citizens did sometimes fight, that raises a key question: how did Carthaginian citizens fight? With what arms and tactics?

The first answer is that our evidence is infuriatingly limited here. After all, Carthaginian citizen soldiers do most of their fighting visible to us relatively early, where our main sources are writers like Diodorus, who – because he is writing a universal history covering everything from the earliest mythology (he includes the Fall of Troy) down to his own day (mid-first century B.C.) – rarely gives a lot of details. Normally we might supplement this with visual evidence in artwork or equipment deposited in graves, but there is very, very little of this. That point has sometimes been taken to reflect Carthage’s ‘unmilitary’ character, but it is worth noting that prior to 146, we have similarly little archaeological or representational evidence of the Roman Republic’s armies and no one accuses the Romans of being ‘unmilitary’ in character.

What evidence we do have suggests that the Carthaginians largely fought as heavy infantrymen in a manner not too different from Greek hoplites. Now I want to caveat that immediately to say this doesn’t mean they fought as hoplites – it is certainly possible, but by no means certain, that the Carthaginians adopted weapons or tactics from the Greeks. The Levant had its own infantry traditions on which the Carthaginians might have drawn, which included heavy armor and large shields. At the same time, as noted, it seems like Phoenician colonies drew in a lot of Aegean (read: Greek) settlers, so it would hardly be shocking if the Carthaginians did adopt Greek armaments.

However, I want to pause for a moment to draw one point of important clarification: at no point did any Carthaginian, or any soldier in Carthaginian service that we know of, fight in a Macedonian-style pike phalanx. The idea that the Carthaginians adopted this style of fighting is based entirely on an old mistranslation of lonchophoroi as ‘pikemen’ when in fact the lonche is a light spear and these are light infantry javelin-men fighting in support of African heavy infantry. We’ll talk more about them next week.

We have a few small engravings (small engraved impression seals called ‘scarabs’) from Carthage and Phoenician settlements in Sardinia, which depict soldiers, and they show men with large, apparently circular shields and spears.5 Numidian royal monuments, which may be drawing on Carthaginian material culture (it would have been high status), feature large round shields as a design motif, and one intriguing monument, a statue base excavated in Rome, has been supposed by Ann Kuttner to possibly be a Numidian commission showing Numidian arms (or perhaps the captured arms of Carthaginians?); it shows a large round shield of the same type seen on their royal monuments, alongside tube-and-yoke cuirasses (two of which are set up as trophies) and plumed helmets of the pilos/konos type (a kind of Hellenistic Greek helmet).6 And our literary sources regularly describe the Carthaginians as forming heavy infantry battle lines (using the word φάλαγξ, phalanx, to describe them) and report Carthaginians as wearing helmets and armor, with large shields and spears.7

Via the British Museum (inv. 127214), a fifth century Phoenician scarab showing a warrior wearing a cuirass, greaves, a helmet, a large (round?) shield and carrying a spear, found in Sardinia. While the curator’s description assumes this warrior is Greek, Carthaginian seems far more likely given the find location, art-style and equipment.

On that basis, both Gregory Daly and Joshua Hall (both op. cit.) conclude that the Carthaginians must have fought rather a lot like Greek hoplites and I think this is both basically correct and probably the best we can do. By the Punic Wars, we have hints that Carthaginian troops (both citizen and subject from North Africa) may also be adopting Italic equipment, which I’ll get into more in the next post: by the end of the Second Punic War and certainly by the Third Punic War, Carthaginian soldiers may have looked actually quite ‘Roman’ in their kit.

All of that said, as is obvious from the forces Carthage arrayed for the Punic Wars, Carthaginian armies included far more than just citizen soldiers – indeed, many Carthaginian armies evidently included few if any Carthaginian citizens outside of the officer corps. So to better understand Carthage’s armies, we are going to have to branch out to think about their other forces, which we’ll begin to do next week.

Popup video

2026-04-10 22:30
[syndicated profile] jwz_blog_feed

Posted by jwz

Last year I wrote popup-video.js, a Google-surveillance-defeating YouTube player. You may have noticed it in action on the DNA Lounge calendar pages and galleries, but I've made some improvements recently.

  • The web page contains only a locally-hosted thumbnail. Nothing from YouTube is loaded until someone clicks play. This means no surveillance trackers on every one of your pages that has a video, and also the pages load dozens fewer megabytes.

  • When you click play, a fake-window pops up inside the page with a YouTube player.

  • Clicking anywhere makes it go away.

  • There's a "minimize" button that makes it drop back down into the place where the thumbnail was and continue playing.

  • If the inline thumb is nearly as wide as the page, it plays inline instead of doing the popup thing (this is often the case on mobile).

  • It works on playlists as well as single videos.

  • For single videos, it generates an ad-hoc playlist of all of the other YouTube videos linked on the current page, so the "previous" and "next" buttons show those.

  • It works on portrait videos and videos with weird aspect ratios.

Dear Lazyweb,

If you understand the ever-changing rules about auto-play, perhaps you can offer some guidance.

On desktop Safari, you have to click twice to get the first video to play: the YouTube player pops up but does not then auto-play, unless you have done "Website settings / Allow all auto-play" for jwz.org. This is -- what's the word -- fucked up, because the creation of the IFRAME and the sending of the "play" event all happen underneath the user's "click" event, so this should be considered interactive.

And on mobile Safari you always have to click twice.

(I don't remember what the situation is on Chrome or Firefox on account of not caring.)

Anyway, if you know how to make it play with one click instead of two, I would like to know how to do that. Your suggestion should take the form of, "Here's a modified version of your JS file that works". Your speculation is acknowledged and ignored.

Previously, previously, previously.

dsrtao: dsr as a LEGO minifig (Default)
[personal profile] dsrtao
Stop me if you've heard this one before: a nice enough woman in her mid-30s dies of cancer and goes to the afterlife. Whereupon she gets bored in Paradise and takes a job in Hell, starting a Hellp Desk for the more confused - or ornery - of the damned.

You might have heard it before because it originated as a series of very short videos: Hell's Belles. One actress plays multiple roles, including our protagonist Lily, her adopted daughter Sharkie, various named and unnamed demons, and a smorgasbord of unhappy souls who have generally reached the decision that Talking To The Manager is a good idea.

The book is a series of incidents and explorations around the afterlife, establishing the rules and the characters properly so that the events which transpire make proper story-telling sense. I would not have guessed this to be a first novel.

Contains spoilers for the video series or vice-versa; romantic and erotic and remarkably well-inclined to atheists for a book set in Hell and featuring Heaven, Valhalla, Paradise, Elysium... God makes an appearance but Lucifer gets better lines. I liked it.

jwz mixtape 258

2026-04-10 21:42
[syndicated profile] jwz_blog_feed

Posted by jwz

Please enjoy jwz mixtape 258.

Because of the recent unpleasantness, here's a mixtape of songs about nuclear war!

This idea popped into my head while I was standing at the bar last night and I thought, "Yeah, I probably have enough videos for that" and started writing down song names... 10 minutes later I had more than 2 hours worth.


Oh yeah, and I had to find an alternate upload of the Two Tribes video because the official one is marked "inappropriate for some users, sign in to confirm your age." "Based on community guidelines." That video played hourly on MTV from like 1984 through 1988. I hate you, Milkman Youtube.


VICTORY

2026-04-10 22:53
kaberett: Trans symbol with Swiss Army knife tools at other positions around the central circle. (Default)
[personal profile] kaberett

Behold my works: today I went to the leisure centre, and went into the leisure centre, and went into the gym, and poked around a bit, and retreated to the stairwell to hyperventilate... and then WENT TO A CHANGING SPACE and CHANGED MY CLOTHING and went back into the gym. And picked! things! up!!! and put them down again!!!!!

I have now Touched Barbell, appear to have accidentally skipped most of Phase 2 of Liftoff in favour of Barbells, Apparently, but honestly the biggest and most exciting bit of this is that I did go back into The Gym and I did push through the social anxiety of What If I'm Doing It Wrong.

It is an excellent time of year to be doing this; the cherries on the way from the gym to the bus stop are in full and exuberant flower.

[syndicated profile] schneier_no_tracking_feed

Posted by Bruce Schneier

Regulation is hard:

The South Pacific Regional Fisheries Management Organization (SPRFMO) oversees fishing across roughly 59 million square kilometers (22 million square miles) of the South Pacific high seas, trying to impose order on a region double the size of Africa, where distant-water fleets pursue species ranging from jack mackerel to jumbo flying squid. The latter dominated this year’s talks.

Fishing for jumbo flying squid (Dosidicus gigas) has expanded rapidly over the past two decades. The number of squid-jigging vessels operating in SPRFMO waters rose from 14 in 2000 to more than 500 last year, almost all of them flying the Chinese flag. Meanwhile, reported catches have fallen markedly, from more than 1 million metric tons in 2014 to about 600,000 metric tons in 2024. Scientists worry that fishing pressure is outpacing knowledge of the stock.

As usual, you can also use this squid post to talk about the security stories in the news that I haven’t covered.

Blog moderation policy.

[syndicated profile] scalziwhatever_feed

Posted by Athena Scalzi

Pets are more than just roommates we feed and scoop poop for, they’re often a source of emotional support and comfort in our complicated, lengthy lives. Author Eleanor Lerman explores the bond between furry friends and humans in her newest collection of short stories, King the Wonder Dog and Other Stories. Whether your cat is in your lap or on your keyboard, give them a pet as you read along in the Big Idea.

ELEANOR LERMAN:

Having just completed a book of poetry in which much of the work examined the concept of grief about a lost parent (and offered the idea that even Godzilla might be lonely for his mother), I was thinking about what I might write next when I saw a tv commercial that featured a group of older women. They were all beautifully dressed, had expensive haircuts that made gray hair seem like a lifestyle choice, and were laughing their way through a meal on the outdoor terrace of a restaurant. I won’t mention the product being advertised, but they discussed how happy they all were to be using it and to have the love and support of their charming older women friends, who used it too. This is one version of aging in our culture: cheerful, financially secure, medically safeguarded, and surrounded by supportive friends. In this version, the body cooperates, the future is manageable, and loneliness is nowhere in sight.

That’s one way older women—and men—are portrayed in our culture: happy as the proverbial clam and aging with painless bodies and lots of money to pay for the medical care they will likely never need. In literary fiction, however, aging men and women are often depicted in a very different setting: traveling alone through a grim country, with broken hearts and aching bodies until we leave them at the end of their stories hoping—though not entirely believing—that we will avoid such a fate ourselves.

So, what I decided to do in King the Wonder Dog and Other Stories was to explore what is perhaps a middle ground by writing about both women and men living alone who are growing older and are confounded by what is happening to them. They still feel like their younger selves but are aware that their bodies are changing, that the possibility of once again finding love in their lives is unlikely and that loneliness has begun to haunt them like an aging ghost.

Having had pets in my life for many years—and being aware that animals, too, can feel loneliness and fear—I paired each man and woman in my stories with a lonely dog or cat and tried to work out how that relationship would ease the sadness in both their lives. One memory I drew on was how, when I was young and living alone, I had a little cat that someone had found in the street and gave to me. I had never had a pet before (other than a parakeet, which didn’t give me much to go on) and this little cat was very shy, so I didn’t quite know how to relate to her. But somehow, bit by bit, she cozied up to me, and when I was writing, she was always with me, sitting on my lap or on my feet.

I have no idea how animals conceptualize themselves and their lives, but I do know they have feelings and I hope that for the eighteen years she and I lived together, my cat felt safe and cared for. And still, today, I sometimes think about the unlikely sequence of events that brought us together: how a random person found a tiny kitten, all alone, crouched behind a garbage can, and how that random person was sort of friends with a sort of friend of mine who happened to tell me about the kitten and asked if I knew anyone who would take her and I said yes: me. I don’t know why I said yes, but I’m glad I did. Her name, by the way, was simply Gray Cat, which probably shows how unsure I was about whether I would be able to care for her well enough to at least keep her alive.

After that, I was never without a cat or dog, and now I usually have both. The little dog I have now is a sweet, happy friend who seems not to have a care in the world, but I often see her sitting on the back of my couch, staring out the window at the ocean not far beyond, and I wonder what she thinks about what she sees. What is that vast, shifting landscape to her? And who am I? A friend who pets her and feeds her and gives her those wonderful treats she loves? Maybe she was frightened when she was separated from her mother but otherwise, I think she is having a happy life—at least I hope so. And sometimes when I walk her, I think about what will happen when she’s no longer with me and I’m even older than I am now. Could I get another dog? I have painful issues with my back that sometimes make it hard for me to walk and I certainly can’t walk any great distance—could I maybe get a dog that doesn’t need to walk too far or somehow shares my disability?

All these thoughts have gone into the stories in King the Wonder Dog, in which men and women are growing older, have illnesses, are frightened by how lonely they feel, and in one way or another—and often to their surprise—are able to bond with a dog or cat who is also in a tenuous situation. And through that bond, the people and the animals find at least a little bit of happiness in their lives, a little bit of the shared comfort that arises from one creature caring for another. I hope those who read the book will feel some of that comfort, too.


King the Wonder Dog and Other Stories: Amazon|Barnes & Noble|Books-A-Million|Bookshop

Author socials: Website|Facebook

vivdunstan: Photo of some of my books (books)
[personal profile] vivdunstan
My current main reading, on my Kindle, with utterly gargantuan font needed for disability reasons. A mix of fiction and non-fiction, history, SFX magazine columns, and oh so very much Venice. I’m starting a virtual tour of Venice for a few months (self directed), and reading is part of it.

Screenshot of a Kindle Paperwhite in portrait mode, with black and white / greyscale screen. Two rows of 3 book covers are visible. On the top row are "Echolands: A Journey in search of Boudica" by Duncan Mackay, "The Glassmaker" by Tracy Chevalier, and "The SEX Column and other misprints" by David Langford (the first collection of his SFX magazine columns). On the row below are "Venice Tales: Stories selected and edited by Katia Pizzi" (with a gondola on the cover), "Restoration London: Everyday life in London 1660-1670" by Liza Picard, and "A History of Venice" by John Julius Norwich (the cover didn't download properly).
[syndicated profile] jwz_blog_feed

Posted by jwz

So what is with all of this amateurish phishing spam that has been sliding right past SpamAssassin like shit through a goose for the past few months? Has someone recently discovered a new technique for finding open relays that will SPF-sign anything?

Received: from mail.vividdreamqb.name (milestone.clevervistakb.com [170.130.167.11])
From: Dicks Rewards Team <dicksp0@vividdreamqb.name>
Subject: Final notice: YETI Beach Lounge Wagon unlocked by your gear score
Message-ID: <RUxA2tTL-gy3-hDiWXM33jfhjh3zp@vividdreamqb.name>
X-Request-ID: d36d3311-8a4b-4ac9-91b0-4afeee106923
Feedback-ID: jhkmk:vividdreamqb.name:mail
X-Spam-Report:
    * 0.0 HTML_MESSAGE BODY: HTML included in message
    * 0.4 KHOP_HELO_FCRDNS Relay HELO differs from its IP's reverse DNS

I can't even tell who has been popped here. "clevervistakb.com" and "vividdreamqb.name" have the same IPs but different registrars (maybe that's a TLD thing?) It's also not clear to me which of those domains sender_access matches on.

I have, however, come to the conclusion that there are simply too many web sites.

Previously, previously, previously, previously, previously, previously, previously, previously, previously, previously.

[syndicated profile] lets_encrypt_feed

Have you ever needed to make sure your website has a broken certificate? While many tools exist to help run an HTTPS server with valid certificates, there aren’t tools to make sure your certificate is revoked or expired. This is not a problem most people have. Tools to help manage certificates are always focused on avoiding those problems, not creating them.

Let’s Encrypt is a Certificate Authority, and so we have unusual problems we need to solve.

One of the requirements for publicly trusted Certificate Authorities is to host websites with test certificates, some of which need to be revoked or expired. This gets messed up more often than you might expect, because it’s a bit tricky to get right. Test certificate sites exist to allow developers to test their clients, so it’s important that they’re done right.

We’d previously used certbot, nginx, and some shell scripts, but the shell scripts were getting a bit too complicated. So we wrote a Go program tailored to the specific needs of a CA’s test certs site.

The websites

We need to host three sites per root certificate:

  • A valid certificate, like any other website.
  • An expired certificate, past its expiry date.
  • A revoked certificate, but it can’t be expired.

Valid is easy enough; it’s the normal case of any other website. This is a solved problem.

Expired, too, is pretty easy. Issue one certificate, wait until it expires, and then you can use it forever. Not a normal feature, but so long as your webserver doesn’t get upset at it being expired, it’s easy to set up once and leave it.

Revoked, though, is where it’s easiest to slip up. You could fail to revoke a certificate and serve a perfectly valid one, or you could let your revoked certificate expire. Making sure your website is serving a non-expired but revoked certificate is not something any of the off-the-shelf tools support.

The ingredients to bake a cake

In order to implement our program, we need a few different ingredients to mix together.

First and foremost, we need to be able to get certificates. Because we’re writing this in Go, we’re using Lego as a library to request the certificates. Obtaining a certificate requires completing a domain validation challenge. We can hook Lego up to the Go webserver we’re using to complete TLS-ALPN-01 validation. We use that challenge type because it doesn’t require any more setup beyond exposing our webserver to the internet.

To get a revoked certificate, we request a certificate and then revoke it. That’s something we can do with Lego and ACME too: The account which issued a certificate can request it be revoked. We then need a way to check that the certificate is revoked. Certificates contain an HTTP URL pointing to the Certificate Revocation List (CRL) which we poll until our certificate’s serial number appears in it.
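The CRL-polling step can be sketched in a few lines. This is an illustrative Python version, not the actual Go code; `fetch_revoked_serials` is a hypothetical caller-supplied function that downloads and parses the CRL, which keeps the waiting logic testable without network access:

```python
import time

def wait_until_revoked(fetch_revoked_serials, serial,
                       poll_seconds=60, max_attempts=120):
    """Poll a CRL until `serial` shows up as revoked.

    fetch_revoked_serials: callable returning the set of serial
    numbers currently listed in the CRL.
    """
    for _ in range(max_attempts):
        if serial in fetch_revoked_serials():
            return True  # certificate is now listed as revoked
        time.sleep(poll_seconds)
    return False  # gave up; revocation never appeared
```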

Let’s Encrypt implements the ACME standard, which defines how clients can get certificates. In general, we think ACME clients integrated into webservers are often the best way to get certificates for websites. They can automatically handle challenges, manage and reload certificates, and overall minimize the amount of work and reduce the chance of problems.

We also need a way to wait until a certificate is in the right state. The valid certificate is ready to use right away, but that’s not true for the revoked and expired certificates. The revoked certificate needs to wait at least until it appears in a CRL, which can be up to an hour. Expired certificates need to wait even longer: Even if we request the shortest-lived certificates we offer, that’s still six days. To handle this, our program stores a “next” certificate instead of immediately overwriting the current one. We wait at least 24 hours for the revoked certificate to make sure any CRL caches or push-based CRL infrastructure have time to process the revocation. The expired certificate has to wait until it passes its expiration date. After the program decides a certificate is ready, it replaces the current certificate and passes it off to the webserver. Normal ACME tools don’t support this because they can usually start using a certificate as soon as it’s obtained.
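The promotion rule described above can be captured in a small state check. A hedged Python sketch: the purpose names, the 24-hour revocation wait, and the expiry condition come from the description above, while the function and parameter names are invented for illustration:

```python
from datetime import datetime, timedelta, timezone

def next_cert_ready(purpose, now, revoked_at=None, not_after=None):
    """Decide whether a 'next' certificate may replace the current one."""
    if purpose == "valid":
        return True  # usable immediately after issuance
    if purpose == "revoked":
        # Wait at least 24h after revocation so CRL caches and
        # push-based CRL infrastructure have time to pick it up.
        return revoked_at is not None and now - revoked_at >= timedelta(hours=24)
    if purpose == "expired":
        # Must actually be past its expiration date.
        return not_after is not None and now > not_after
    raise ValueError(f"unknown purpose: {purpose}")
```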

And finally, we need a webserver to host the certificates. We’re using Go, which has a great built-in TLS and HTTP serving stack we can use. The Go TLS server takes a GetCertificate callback function that decides what certificate to use for each new connection. We have all our certificates in-memory and select the right one to serve based on the request’s SNI. This function is also where we hook up Lego to serve the challenge certificates required for TLS-ALPN-01. Because we prioritize serving the correct certificate over uptime, we refuse to handle a connection if the corresponding certificate is expired (unless it should be expired!).
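The selection logic inside that callback can be illustrated with a small pure function. Python here for illustration only; the real server is Go, and the `(purpose, not_after)` record shape is an assumption made for this sketch:

```python
from datetime import datetime, timezone

def select_certificate(certs, sni, now):
    """Pick the certificate for an incoming connection, GetCertificate-style.

    certs: dict mapping SNI hostname -> (purpose, not_after).
    Returns the record to serve, or None to refuse the connection.
    """
    entry = certs.get(sni)
    if entry is None:
        return None  # unknown hostname
    purpose, not_after = entry
    expired = now > not_after
    # Correctness over uptime: refuse to serve an expired certificate
    # unless this site is supposed to serve one (and vice versa).
    if expired != (purpose == "expired"):
        return None
    return entry
```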

Visiting the sites

If you visit one of our revoked sites, you might not get an error message. Revocation checking in browsers varies pretty widely, and has historically not worked great. Today’s state-of-the-art is Firefox’s CRLite, which is efficient and reliable. Ubuntu is deploying upki, a Rustls project based on CRLite. We hope other browsers and operating systems follow suit. The upki project is a great example of a project making use of these revoked test certificates, too.

The actual content of the website isn’t terribly important: We just have a little HTML page explaining what the site is. But since this website is meant for testing clients, there’s more than just browsers connecting. In particular, it’s pretty routine to try connecting with curl or some other terminal HTTP client, and getting a bunch of HTML spewed to your terminal isn’t very nice.

As a small Easter egg, we added a plain text version of the website with an ASCII art version of our logo that we serve if your HTTP client doesn’t include text/html in its Accept HTTP header. You can pass a ?txt or ?html URL parameter to specifically request one or the other version of the content, if you just want to see the ASCII art.
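The negotiation described here is simple enough to sketch. A hedged Python version — the query-parameter and Accept-header handling are illustrative, not the actual Go implementation:

```python
from urllib.parse import urlparse, parse_qs

def choose_body_format(url, accept_header):
    """Return 'html' or 'text' per the ?txt/?html override and Accept header."""
    query = parse_qs(urlparse(url).query, keep_blank_values=True)
    if "txt" in query:
        return "text"   # explicit override: ASCII-art plain text
    if "html" in query:
        return "html"   # explicit override: HTML page
    # Default: plain text unless the client says it accepts HTML.
    return "html" if "text/html" in accept_header else "text"
```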

Let’s Encrypt has four root certificates right now. Each of them has test sites linked both here and from our documentation.

The code

As with a lot of Let’s Encrypt, the code for this project is open-source. You can find it at https://github.com/letsencrypt/test-certs-site/. Other Certificate Authorities who need to run similar test certificate sites are welcome to use it. If you need any features that would make using our test certs site easier for your TLS/HTTPS client testing, please feel free to create an issue on that repository.

[syndicated profile] smbc_comics_feed

Posted by Zach Weinersmith



Click here to go see the bonus panel!

Hovertext:
He got cursed with it after that time Jesus yelled at a fig tree.


Today's News:
[syndicated profile] nwhyte_atom_feed

Posted by fromtheheartofeurope

Second paragraph of third chapter:

Fairly high on the list, however, was: Why breakfast?

Third in Marske’s The Last Binding trilogy. I was not totally convinced by the first of this series, but liked the second much more. I’m afraid that the third lost me not quite half way through, with the protagonists of the previous two books bouncing off each other and around various Victorian (or was it Edwardian?) stately homes. I wasn’t interested enough in the characters or convinced enough by the details of the settings. So I put it down. You can get A Power Unbound here.

This was my top unread book acquired in 2024 (as part of the Hugo packet). Next on that pile is Red Rabbit, by Alex Grecian.

[syndicated profile] scalziwhatever_feed

Posted by John Scalzi

For a time there Smudge was our only boy cat and that meant that he wasn’t able to indulge in one of his favorite pastimes, which was tusslin’. He’d tussle with Zeus, our other male tuxedo (just as Zeus would tussle with Lopsided Cat, our previous male cat), but when Zeus passed on he no longer had a tusslin’ partner. Sugar and Spice were simply Not Having It, as far as tussles went. Smudge would tussle a bit with Charlie, but Charlie is a dog and roughly eight times the mass. It was an asymmetrical sort of tussle, and those are not as fun.

The good news for Smudge is now Saja is here, and Saja loves him a tussle or two. Or three! Or five! We will frequently find the two of them smacking each other about for fun and exercise. The two seem genuinely happy to wrestle on the carpet or otherwise pounce on the other for a couple of minutes. Sugar and Spice are still having none of it from either of them, so this is the best solution for both. And as an observer and appreciator of brief moments of domestic chaos, it’s nice to have the occasional tussle back in the house. Here’s hoping both of them have a long and happy time to tussle together.

— JS

[syndicated profile] cfallin_feed

Posted by

Today, I'll be writing about the aegraph, or acyclic egraph, the data structure at the heart of Cranelift's mid-end optimizer. I introduced this approach in 2022 and, after a somewhat circuitous path involving one full rewrite, a number of interesting realizations and "patches" to the initial idea, various discussions with the wider e-graph community (including a talk (slides) at the EGRAPHS workshop at PLDI 2023 and a recent talk and discussions at the e-graphs Dagstuhl seminar), and a whole bunch of contributed rewrite rules over the past three years, it is time that I describe the why (why an e-graph? what benefits does it bring?), the how (how did we escape the pitfalls of full equality saturation? how did we make this efficient enough to productionize in Cranelift?), and the how much (does it help? how can we evaluate it against alternatives?).

For those who are already familiar with Cranelift's mid-end and its aegraph, note that I'm taking a slightly different approach in this post. I've come to the viewpoint that the "sea-of-nodes" aspect of our aegraph, and the translation passes we've designed to translate into and out of it (with optimizations fused in along the way), are actually more fundamental than the "multi-representation" part of the aegraph, or in other words, the "equivalence class" part itself. I'm choosing to introduce the ideas sea-of-nodes-first in this post, so we will see a "trivial eclass of one enode" version of the aegraph first (no union nodes), then motivate unions later. In actuality, when I was experimenting with, then building, this functionality in Cranelift in 2022, the desire to integrate e-graphs came first, and aegraphs were created to make them practical; the pedagogy and design taxonomy have only become clear to me over time. With that, let's jump in!

Initial context: Fixpoint Loops and the Pass-Ordering Problem

Around May of 2022, I had introduced a simple alias analysis and related optimizations (removing redundant loads, and doing store-to-load forwarding). It worked fine on all of the expected test cases, and we saw real speedup on a few benchmarks (e.g. 5% on meshoptimizer here) but led to a new question as well: how should we integrate this pass with our other optimization passes, which at the time included GVN (global value numbering), LICM (loop-invariant code motion), constant propagation and some algebraic rewrites?

To see why this is an interesting question, consider how GVN, which canonicalizes values, and redundant load elimination interact, on the following IR snippet:

v2 = load.i64 v0+8
v3 = iadd v2, v1   ;; e.g., array indexing
v4 = load.i8 v3

;; ... (no stores or other side effects here) ...

v10 = load.i64 v0+8
v11 = iadd v10, v1
v12 = load.i8 v11

Redundant load elimination (RLE) will be able to see that the load defining v10 can be removed, and v10 can be made an alias of v2, in a single pass. In a perfect world, we should then be able to see that v11 becomes the same as v3 by means of GVN's canonicalization, and subsequently, v12 becomes an alias of v4. But those last two steps imply a tight cooperation between two different optimization passes: we need to run one full pass of RLE (result: v10 rewritten), then one full pass of GVN (result: v11 rewritten), then one additional full pass of RLE (result: v12 rewritten). One can see that an arbitrarily long chain of such reasoning steps, bouncing through different passes, might require an arbitrarily long sequence of pass invocations to fully simplify. Not good!

This is known as the pass-ordering problem in the study of compilers and is a classical heuristic question with no easy answers as long as the passes remain separate, coarse-grained algorithms (i.e., not interwoven). To permit some interesting cases to work in the initial Cranelift integration of alias analysis-based rewrites, I made a somewhat ad-hoc choice to invoke GVN once after the alias-analysis rewrite pass.

But this is clearly arbitrary, wastes compilation effort in the common case, and we should be able to do better. In general, the solution should reason about all passes' possible rewrites in a unified framework, and interleave them in a fine-grained way: so, for example, if we can apply RLE then GVN five times in a row just for one localized expression, we should be able to do that, without running each of these passes on the whole function body. In other words, we want a "single fixpoint loop" that iterates until optimization is done at a fine granularity.
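To make the idea concrete, here is a toy fixpoint loop in Python: rules that would traditionally live in separate passes (constant folding, algebraic simplification) are applied to a single expression until nothing changes, rather than as whole-function pass invocations. This is a sketch of the principle only, not Cranelift code; expressions are tuples like `('iadd', a, b)` or `('const', n)`:

```python
LEAVES = ("const", "var")

def simplify(expr):
    """One bottom-up rewriting pass over an expression tree."""
    op, *args = expr
    if op in LEAVES:
        return expr
    args = [simplify(a) for a in args]   # simplify operands first
    expr = (op, *args)
    if op == "iadd":
        a, b = args
        if a[0] == "const" and b[0] == "const":
            return ("const", a[1] + b[1])   # constant folding
        if b == ("const", 0):
            return a                        # algebraic: x + 0 => x
    return expr

def simplify_to_fixpoint(expr):
    """Iterate until no rule fires anywhere: the 'single fixpoint loop'."""
    while True:
        new = simplify(expr)
        if new == expr:
            return expr
        expr = new
```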

Three Building Blocks: Rewrites, Code Motion, and Canonicalization

Let's review the optimizations we had at this point:

  • GVN (global value numbering), which is a canonicalization operation: within a given scope where a value is defined (for SSA IRs, the subtree of the dominance tree below a given definition), any identical computations of that value should be canonicalized to the original one.

  • LICM (loop-invariant code motion), which is a code-motion operation: computations that occur within a loop, but whose value is guaranteed to be the same on each iteration, should be moved out. Loop invariance can be defined recursively: values already outside the loop, or pure operators inside the loop whose arguments are all loop-invariant. The transform doesn't change any operators, it only moves where they occur.

  • Constant propagation (cprop) and algebraic rewrites: these are transforms like rewriting 1 + 2 to 3 (cprop) or x + 0 to x (algebraic). They can all be expressed as substitutions for expressions that match a given pattern.

  • Redundant load elimination and store-to-load forwarding: these both replace load operators with the SSA value that operator is known to load.

  • And one that we wanted to implement: rematerialization, which reduces register pressure for values that are easier to recompute on demand (e.g., integer constants) by re-defining them with a new computation. This can be seen as a kind of code motion as well.

As a start to thinking about frameworks, we can categorize the above into code motion, canonicalization, and rewrites. Code motion is what it sounds like: it involves moving where a computation occurs, but not changing it otherwise. Canonicalization is the unifying of more than one instance of a computation into one ("canonical") instance. And rewrites are any optimization that replaces one expression with another that should compute the same value. Said more intuitively (and colloquially), these three categories attempt to cover the whole space of possibilities for "simple" optimizations: one can move code, merge identical code, or replace code with equivalent code. (The notable missing possibility here is the ability to change control flow and/or make use of control-flow-related reasoning; more on that in a later section.) Thus, if we can build a framework that handles these kinds of transforms, we should have a good infrastructure for the next steps in Cranelift's evolution.

IR Design, Sea-of-Nodes, and Intermediate Points

From first principles, one might ask: how should a unifying framework for these concerns look? Code motion and canonicalization together imply that perhaps computations (operator nodes) should not have a "location" in the program, whenever that can be avoided. In other words, perhaps we should find a way to represent add v1, v2 in our IR without putting it somewhere concrete in the control flow. Then all instances of that same computation would be merged (because duplicates would differ only by their location, which we removed), and code motion is... inapplicable, because code does not have a location?

Well, not quite: the idea is that one starts with a conventional IR (with control flow), and ends with it too, but in the middle one can eliminate locations where possible. So in the transition to this representation, we erase locations, and canonicalize; and in the transition from this representation, we re-assign locations, and code-motion can be a side-effect of how we do that.

What we just described above is called a sea-of-nodes IR. A sea-of-nodes IR is one that dispenses with a classical "sequential order" for all instructions or operators in the program, and instead builds a graph (the "sea") of operators (the "nodes") with edges to denote the actual dependencies, either for dataflow or control flow.

In the purest form of this design, one can represent every IR transform as a graph rewrite, because a graph is all there is. For example, LICM, a kind of code motion that hoists a computation out of a loop, is a purely algebraic rewrite on the subgraph representing the loop body. This is because the loop itself is a kind of node in the sea of nodes, with control-flow edges like any other edge; code motion is not a "special" action outside the scope of the expression language (nodes and their operands).

While that kind of flexibility is tempting, it comes with a significant complexity tax as well: it means that reasoning through and implementing classical compiler analyses and transforms is more difficult, at least for existing compiler engineers with their experience, because the IR is so different from the classical data structure (CFG of basic blocks). The V8 team wrote about this difficulty recently as support for their decision to migrate away from a pure Sea-of-Nodes representation.

However, we might achieve some progress toward our goal -- providing a general framework for rewrites, code motion and canonicalization -- if we take inspiration from sea-of-nodes' handling of pure (side-effect-free) operators, and the way that they can "float" in the sea, unmoored by any anchor other than actual inputs and outputs (dataflow edges). Stated succinctly: what if we kept the CFG for the side-effectful instructions (call it the "side-effect skeleton") and used a sea-of-nodes for the rest?

Figure: sea-of-nodes-with-CFG

This would allow us to unify code motion, canonicalization and rewrites, as described above: canonicalization works on pure operators, because we remove distinctions based on location; code motion can occur when we put pure operators back in the CFG; and rewrites can occur on pure operators. In fact, rewrites are now both (i) simpler to reason about, because we don't have to place expression nodes at locations in an IR, only create them "floating in the air", and (ii) more efficient, because they occur once on a canonicalized instance of an expression, rather than on all instances separately.

We'll call this representation a "sea-of-nodes with CFG".

Implementing Sea-of-Nodes-with-CFG

Now, to practical implementation: architecting the entire compiler around sea-of-nodes for pure operators might make sense from first principles, but as a modification of the existing Cranelift compiler pipeline, we would not want to (or be able to) make such a radical change in one step. Rather, I wanted to build this as a replacement for the mid-end, taking CLIF (our conventional CFG-based SSA IR) as input and producing CLIF as output. So we need a three-stage optimizer:

  1. Lift all pure operators out of the CFG, leaving behind the skeleton. Put these operators into the "sea" of pure computation nodes, deduplicating (hash-consing) as we go.

  2. Perform rewrites on these operators, replacing some values with others according to whatever rules we have that preserve value equivalence.

  3. Convert this sea-of-pure-nodes back to sequential IR by scheduling nodes into the CFG. We'll call this process "elaboration" of the computations.

This is in fact how the heart of Cranelift's mid-end now works; we'll go through each part above in turn.

Into Sea-of-Nodes-with-CFG: Canonicalization

Let's talk about how we get into the sea-of-nodes representation first. The most straightforward answer, of course, would be to simply "remove the nodes from the CFG" and let them free-float, referenced by their uses that remain in the skeleton -- and that's it. But that gives up on the obvious opportunity offered by the fact that these operators are pure (have no side-effects, or implicit dependencies on the rest of the world): an operator op v1, v2 always produces the same value given the same inputs, and two separate instances of this node have no distinguishing features or other properties that should lead to different results. Hence, we should canonicalize, or hash-cons, nodes.

Hash-consing is a standard technique in systems that have value- or operator-nodes: the idea is to keep a lookup table indexed by the contents of each value or operator, perform lookups in this table when creating a new node, and reuse existing nodes when a match occurs.

What is the equivalence class by which we deduplicate? (In other words, more concretely, how do we define Eq and Hash on sea-of-nodes values?) We adopt a very simple answer (and deal with subtleties later, as is often the case!): the (shallow) content of a given node is its identity. In other words, if we have iadd v1, v2, then that is "equal to" (deduplicates with) any other such operator.

Now, this shallow notion of equality may not seem like enough to canonicalize all instances of the same expression tree. Consider if we had

v0 = ...
v1 = ...
v2 = iadd v0, v1
v3 = iconst 42
v4 = imul v2, v3

v5 = iadd v0, v1
v6 = iconst 42
v7 = imul v5, v6

Clearly any reasonable canonicalization algorithm should consider v4 and v7 to be the same, and condense uses of them into uses of one canonical node. But the nodes are not shallowly equal. How do we get from here to there?

One possible answer is induction: we could canonicalize a node only after all of its operands have been canonicalized (and rewritten), so we know that if subtrees are identical, we will have identical value numbers. Thus, inductively, all values would be canonicalized deeply.
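Concretely, a toy hash-consing table (an illustrative sketch, not Cranelift's actual data structures) shows how interning operands before their users makes shallow equality compose into deep equality:

```python
class HashCons:
    """Intern nodes by shallow identity: (op, operand value numbers)."""
    def __init__(self):
        self.table = {}   # (op, args) -> value number
        self.nodes = []   # value number -> (op, args)

    def intern(self, op, *args):
        key = (op, args)
        if key in self.table:
            return self.table[key]   # reuse the existing node
        vn = len(self.nodes)         # otherwise allocate a fresh value number
        self.nodes.append(key)
        self.table[key] = vn
        return vn
```

Because each node's operands are already value numbers by the time the node itself is interned, the v4/v7 pair from the earlier example dedups at every level even though the nodes are only compared shallowly.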

This requires processing definitions of a node before its uses, however. Fortunately, the SSA CFG from which we are constructing the sea-of-nodes-with-CFG provides us this property already if we traverse it in a particular order: we need to visit blocks in the control-flow graph in some preorder of the dominance tree (domtree), which we usually have available already.

So we have an algorithm something like the following pseudo-code to canonicalize the SSA CFG into a sea-of-nodes-with-CFG:

def canonicalize(basic_block):
  for inst in basic_block:
    if is_pure(inst):                 # only dedup and move to sea-of-nodes for "pure" insts;
                                      # leave the "skeleton" in place
      basic_block.remove(inst)
      inst.rename_values(rename_map)  # rewrite uses according to a value->value map
      if inst in hashcons_map:        # equality defined by shallow content
        rename_map[inst.value] = hashcons_map[inst]
      else:
        nodes.push(inst)              # add to the sea-of-nodes
        hashcons_map[inst] = inst.value
    else:
      # we still need to rename the CFG skeleton's uses to refer to sea-of-nodes
      inst.rename_values(rename_map)

  # recursive domtree-preorder traversal.
  for child in domtree.children(basic_block):
    canonicalize(child)

This will handle not only the above example, where we have "deep equality" (because we will canonicalize and rename e.g. v5 into v2 before visiting v5's use), but also more complex examples with the redundancies spread across basic blocks.

Finally: how does the "-with-CFG" aspect of all of this work? So far, we have very much glossed over any values that are defined in the CFG skeleton, other than to imply above that they are never renamed (because we never take the is_pure branch). But is this OK?

Yes, in a sense, by construction: we have defined all impure values to have their own "identity", distinct from any other such value, even if shallowly equal at a syntactic level. This aligns with the notion that impure computations have implicit inputs: for example, load v0 appearing twice in the program may produce different values at those two different times, so we cannot deduplicate it. This can be relaxed if we have a dedicated analysis that can reason about such implicit dependencies, and in fact for loads we do have one (alias analysis, feeding into redundant-load elimination and store-to-load forwarding). But in general, we cannot do anything with these "roots". Rather, they stay in the skeleton, feed values into the sea of nodes, and consume values back out of that sea of nodes.

Out of Sea-of-Nodes-with-CFG: Scoped Elaboration

Given a sea-of-nodes + skeleton representation of a program, how do we go back to a conventional CFG, with fully linearized operators (i.e., each of which has a concrete program-point where it is computed), to feed to the compiler backend and lower to machine code?

The basic task is to decide a location at which to put each operator. Since nodes in the sea-of-nodes are "rooted" (referenced and ultimately computed/used) by side-effectful operators in the CFG skeleton, the first idea one might have is to copy pure nodes back into the CFG where they are referenced. One could do this recursively: if e.g. we have a side-effecting instruction store v1, v2, we can place the (pure operator) definitions of v1 and v2 just before this instruction; if those definitions require other values, likewise compute them first. We could call this "elaboration".

Let's consider the single-basic-block case first and then define something like the following pseudocode:

def demand_based_elaboration(bb):
  for inst in bb:
    elaborate_inst(inst, bb, before=inst)

def elaborate_inst(inst, bb, before):
  for value in inst.args:
    inst.rewrite_arg(value, elaborate_value(value, bb, before=before))  # anchor at the original insertion point
  if is_pure(inst):
    bb.insert_before(before, inst)
  return inst.def

def elaborate_value(value, bb, before):
  if defined_by_inst(value):   # some values are blockparam roots, not inst defs
    return elaborate_inst(value.inst, bb, before)
  else:
    return value

This would certainly work, but is far too simple: it duplicates computation every time a value is used, and no value (other than blockparam roots) is ever used more than once. This will almost certainly result in extreme blowup in program size!

So if we use a value multiple times, it seems that we should compute it once, some place in the program before any of the uses. For example, perhaps we could augment the above algorithm with a map that records the resulting value number the first time we elaborate a node, and reuses it (i.e., memoizes the elaboration):

# ...

def elaborate_value(value, bb, before):
  if value in elaborated:
    return elaborated[value]
  elif defined_by_inst(value):
    result = elaborate_inst(value.inst, bb, before)
    elaborated[value] = result
    return result
  else:
    return value

This modified algorithm will handle the case of a single block with reuse efficiently, computing a value the first time it is used ("on demand") as expected.

Now let's consider multiple basic blocks. One might be tempted to wrap the above with a traversal, as we did for the translation into sea-of-nodes:

def elaborate_domtree(bb):
  demand_based_elaboration(bb)
  for child in domtree.children(bb):
    elaborate_domtree(child)

def elaborate(func):
  elaborate_domtree(func.entry)

But this, too, has an issue. Consider a program that began as a CFG with many paths, two of which compute the same value:

Figure: CFG with some redundancy between code paths

If we define some traversal over all basic blocks to perform an elaboration as above, sharing a single elaborated map across the whole function, we will

  • Elaborate a computation of v2 in bb2 and use it there;
  • Use it in bb3 as well in place of v3, since it has already been computed and is thus memoized;
  • And thus generate invalid SSA, where a value is used on a path where it is never computed!

Perhaps we could hoist the computation to a "common ancestor" of all of its uses instead. Here that would be bb1. But that creates yet another problem: if control flows from bb1 to bb4, then we will have computed the value and never used it -- in supposedly optimized code! This is sometimes called a "partial redundancy": a computation that is sometimes unused, depending on control flow. We would like to avoid this if possible.

It turns out that this problem exactly corresponds to common subexpression elimination (CSE), which aims to find one place to compute a value possibly used multiple times. The usual approach in SSA code, global value numbering (GVN), solves the problem by reasoning about scopes, where a "scope" is the region in which a value has already been computed. The intuition is that each computation casts a "shadow" downward, and we can remove redundant uses, but only within that shadow. So in our example program, if bb1 computed v2 then we could reuse it in bb2 and bb3; but because it occurs independently in two subtrees with no common ancestor computing it, we do not unify the two; we duplicate it (re-elaborate it).

SSA "scopes" -- regions in which a value can be used -- are defined by the dominance relation, and so we can work with a domtree traversal to implement the needed behavior. Concretely, we can do a domtree preorder traversal; we can keep the elaborated map but separate it into scope "overlays", and push a new overlay for each subtree. This formalizes the "shadow" intuition above. We call this scoped elaboration. Pseudo-code follows:

def find_in_scope(value, scope):
  if value in scope.map:
    return scope.map[value]
  elif scope.parent:
    return find_in_scope(value, scope.parent)
  else:
    return None

def elaborate_value(value, bb, before, scope):
  if find_in_scope(value, scope):
    # ...
  # ...
  
def elaborate_domtree(bb, scope):
  demand_based_elaboration(bb, scope)
  for child in domtree.children(bb):
    subscope = { map = {}, parent = scope }
    elaborate_domtree(child, subscope)
    
def elaborate(func):
  root_scope = { map = {}, parent = None }
  elaborate_domtree(func.entry, root_scope)

The real implementation of our scoped hashmap takes advantage of the fact that keys will not overlap between overlay layers (because once defined, a value will not be re-defined in a lower layer), and this enables us to have true O(1) rather than O(depth) lookup using some tricks with a layer number and generation-per-layer (see implementation for details!). Nevertheless, the semantics are the same as above.
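That trick can be sketched as follows (an illustrative reconstruction, not the actual Cranelift ScopedHashMap): each entry records the depth at which it was inserted plus that depth's generation, and popping a scope bumps the generation, so entries from dead scopes are invalidated lazily and lookups stay O(1):

```python
class ScopedMap:
    """Flat hashmap with O(1) scoped lookup via (depth, generation) tags."""
    def __init__(self):
        self.map = {}             # key -> (depth, generation, value)
        self.depth = 0
        self.generations = [0]    # generation counter for each depth

    def push_scope(self):
        self.depth += 1
        if len(self.generations) <= self.depth:
            self.generations.append(0)

    def pop_scope(self):
        # Lazily invalidate every entry inserted at the current depth:
        # bumping the generation makes those entries fail the check in get().
        self.generations[self.depth] += 1
        self.depth -= 1

    def insert(self, key, value):
        self.map[key] = (self.depth, self.generations[self.depth], value)

    def get(self, key):
        entry = self.map.get(key)
        if entry is None:
            return None
        depth, gen, value = entry
        # Visible only if inserted in a still-live scope on the current path.
        if depth <= self.depth and gen == self.generations[depth]:
            return value
        return None
```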

As we foreshadowed above, just as the problem is closely related to CSE and GVN, scoped elaboration is as well. In fact, the approach of tracking a definition-within-scope for scopes that correspond to subtrees in the domtree, given a preorder traversal on the domtree, is exactly how Cranelift's old GVN implementation works as well. We even borrowed the scoped hashmap implementation!

A few more observations are in order. First, it's fairly interesting that we sometimes re-elaborate a node into multiple dom subtrees; why is this? Does this introduce inefficiency (e.g. in code size) or is it the best we can do?

The duplication is, in my opinion, best seen as a dual of the canonicalization. The original code may have multiple copies of a pure computation in multiple paths, with no common ancestor that computes that value. When translating to sea-of-nodes, we will canonicalize that computation, so we can optimize it once. But then when returning to the original linearized IR, we may need to restore the original duplication if there truly was no (non-redundancy-producing) optimization opportunity. Additionally, and very importantly: we should never elaborate a value in more than one place unless it also appeared more than once in the original program. So we should not grow the program size beyond the original.

Another interesting observation is that by driving elaboration by demand (from the roots in the side-effecting CFG skeleton), we do dead-code elimination (DCE) of the pure operations for free. Their existence in the sea of nodes may cost us some compile time if we spend effort to optimize them (only to throw them away later); but anything that becomes dead because of rewrites in sea-of-nodes will then naturally disappear from the final result.

A third observation is that elaboration gives us a central location to control when and where code is placed in the final program. In other words, there is room for us to add heuristics beyond the simplest version of the algorithm described above. For example: we stated that we did not want to introduce any partial redundancies. But for correctness, we don't need to adhere to this: our only real restriction is that a pure computation cannot happen before its arguments are computed (i.e., we have to obey dataflow dependencies). So, for example, if we have the loop nest (structure of loops in the program) available, if a pure computation within a loop does not use any values that are computed within that loop, we know it is loop-invariant and we may choose to elaborate it before the loop begins (into the "preheader"), in a transform known as loop-invariant code motion (LICM). This is redundant if the loop executes zero iterations, but most loops execute at least once; and performing a loop-invariant computation only once can be a huge efficiency improvement.
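The loop-invariance decision reduces to comparing loop depths. A tiny illustrative helper (names invented for this sketch; depth 0 means outside all loops):

```python
def licm_depth(arg_depths):
    """Shallowest legal placement for a pure op: the deepest loop any of
    its arguments is defined in (dataflow is the only hard constraint)."""
    return max(arg_depths, default=0)

def is_loop_invariant(arg_depths, use_depth):
    """Hoistable into a preheader if it can legally sit outside the use's loop."""
    return licm_depth(arg_depths) < use_depth
```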

In the other direction -- pushing computation downward rather than upward -- we could choose to implement rematerialization by strategically forgetting a value in the already-elaborated scope and recomputing it at a new use. Why would we do this? Perhaps it is cheaper to recompute than to thread the original value through the program. For example, constant values are very cheap to "compute" (typically 1 or 2 instructions) but burning a machine register to keep a constant across a long function can be expensive.

There is a lot of room for heuristic code scheduling within elaboration as well (LICM and rematerialization can be seen as scheduling too, but here I mean the order that operations are linearized within the block they are otherwise elaborated into). For a modern out-of-order CPU, this may not matter too much to the hardware -- but it may matter to the register allocator, because reordering instructions changes the "interference graph", or the way that different live register values compete for finite resources (hardware registers). E.g., pushing an instruction that uses many values for the last time "earlier" (to eliminate the need to store those values) is great; but this minimization is not always straightforward. In fact, ordering instructions that define and use values to minimize the coloring count for the resulting live-range interference graph is an NP-complete problem. So it goes, too often, in compiler engineering!

Despite the complexities that may arise in combining many heuristics, these three dimensions -- LICM, rematerialization, and code scheduling for register pressure -- form an interesting high-dimensional cost-optimization problem, and one that we still haven't fully solved (see e.g. #6159, #6260 and #8959).

Optimizing Pure Expression Nodes: Rewrite Framework

We've covered the transitions into and out of the sea-of-nodes-with-CFG program representation. We've seen how merely this translation gives us GVN (deduplication), DCE, LICM, and rematerialization "for free" (not really free, but falling out as a natural consequence of the algorithms). But we still haven't covered one of the most classical sets of optimizations: algebraic (and other) rewrites from one expression to another equivalent one (e.g., x+0 to x). How can we do this on the sea-of-nodes?

In principle, the answer is as "simple" as: build the logic that pattern-matches the "left-hand side" of a rewrite (the part that we have a "better" equivalent expression for), and then replaces it with the "right-hand side". That is, in x + 0 -> x, the left-hand side is x + 0 and the right-hand side is x. Such a framework is highly amenable to a domain-specific language to express these rewrites: ideally one doesn't want to write code that manually iterates through nodes to find these patterns. Fortunately for us, in the Cranelift project we have the ISLE (instruction-selection and lowering-expressions) DSL (RFC, language reference, blog post). I originally designed ISLE in the context of instruction lowering, as the name implies, but I was careful to keep a separation between the core language and its "prelude" binding it to a particular environment. Hence we could adapt it fairly easily to rewrite a graph of Cranelift IR operators as well. The idea is that, as in instruction lowering, for mid-end optimizations we invoke an ISLE constructor (entry point) on a particular node and the ruleset produces a possibly better node.

That gives us the logic for one expression, but there is still an open question of how to apply these rewrites: to which nodes, in what order, and how to manage or update any uses of a node when that node is rewritten.

The two general design axes one might consider are:

  • Eager or deferred: do we apply rewrites to a node as soon as it exists, or apply them later (perhaps as some sort of batch-rewrite)?

  • Single-rewrite or fixpoint loop: do we rewrite a node only once, or apply rewrite rules again to the result of a rewrite? Also, if the operand of a node is rewritten, do we (and how do we) rewrite users of that node as well, since more tree-matching patterns may now apply to the new subtree?

It is clear that different answers to these questions could lead to different efficiency-quality tradeoffs: most obviously, applying rewrites in a fixpoint should produce better code at the cost of longer compile time. But also, it seems possible that either eager or deferred rewrite processing could win, depending on the workload and particular rules: batching (hence, deferred until one bulk pass) often leads to efficiency advantages (see the egg paper and discussion below!), but also, deferral may require additional bookkeeping vs. eagerly rewriting before making use of the (soon to be stale) original value.

For the overall design that we have described so far, there turns out to be a fairly clear optimal answer, surprisingly: because we build an acyclic sea-of-nodes, as long as we keep it acyclic during rewrites, we should be able to do a single rewrite pass rather than a fixpoint. And, to make that single pass work, we rewrite eagerly, as soon as we create a node; then use the final rewritten version of that node for any uses of the original value. Because we visit defs before uses and do rewrites immediately at the def, we never need to update (and re-canonicalize!) nodes after creation.

An aside is in order: while it is fairly clear why the sea-of-nodes-with-CFG is initially acyclic -- because SSA permits dataflow cycles only through block-parameters / phi-nodes, and those remain in the CFG, which we don't "look through" when applying rewrites -- it is less clear why rewrites should maintain acyclicity, especially in the face of hashconsing, which may "tie the knot" of a cycle if we're not careful. The answer lies in the previous paragraph: once we create a node, we never update it. That's it! We've now maintained acyclicity, by construction.

Perhaps surprisingly as well, this rewrite process can be fused with the translation pass into the sea-of-nodes itself. So we can amend the above canonicalize to

def canonicalize(basic_block):
  for inst in basic_block:
    if is_pure(inst):
      # ...
      if inst in hashcons_map:
        # ...
      else:
        inst = rewrite(inst)          # NEW
        nodes.push(inst)              # add to the sea-of-nodes
        hashcons_map[inst] = inst.value
    else:
       # ...

i.e., simply add the rewrite rule application at the place we create nodes, and hashcons based on the final version of the instruction.

Now, note that this is not quite complete yet: inst = rewrite(inst) is doing some heavy lifting, and is actually a bit too simplistic, in the sense that it implies that a rewrite rule can only ever produce one instruction on the right-hand side. That isn't enough: for example, one may want a DeMorgan rewrite rule ~(x & y) -> ~x | ~y. The right-hand side includes three operator nodes (instructions): two bitwise-NOTs and the OR that uses them. What if x or y in this pattern binds to a subexpression that can itself be simplified by some logic rule?

There seem to be two general answers: create the original right-hand side nodes un-rewritten and later apply rewrites, or immediately and recursively rewrite. As we observed above, deferral requires additional bookkeeping and re-canonicalization as a node's inputs change, so we choose the recursive approach. So, concretely, given ~((a & b) & (c & d)) and the one rewrite rule above, we would:

  • Encounter the top-level ~, and try to match the rewrite rule's left-hand side. It would match with bindings x = (a & b) and y = (c & d).
  • Apply the right-hand side ~x | ~y bottom-up, building nodes and rewriting them as we go:
    • First, ~x. This creates ~(a & b), which recursively fires the rule, which results in (~a | ~b).
    • Then, ~y. This creates ~(c & d), again recursively firing the rule, which results in (~c | ~d).
    • We then create the top-level node on the right-hand side, resulting in (~a | ~b) | (~c | ~d).

One needs to limit the recursion if there is any concern that rule chain depths may not be statically bounded or easily analyzable, but otherwise this yields the correct answer in a single pass without the need to track users of a node to later rewrite and recanonicalize it.
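
To make the recursion concrete, here is a minimal sketch of eager, depth-bounded recursive rewriting with just the DeMorgan rule, reproducing the walkthrough above. The tuple representation is invented for the example; this is not ISLE or Cranelift code.

```python
# Made-up tuple encoding of expression nodes: ("not", x), ("and", x, y),
# ("or", x, y); plain strings are leaf values.
MAX_DEPTH = 16  # bound rule chaining so the recursion always terminates

def mk(op, *args, depth=0):
    """Create a node, eagerly applying rewrites before it is ever used."""
    if (depth < MAX_DEPTH and op == "not"
            and isinstance(args[0], tuple) and args[0][0] == "and"):
        # DeMorgan: ~(x & y) -> ~x | ~y. The right-hand side is built
        # bottom-up through mk(), so each newly created node is itself
        # rewritten recursively before anything refers to it.
        _, x, y = args[0]
        return mk("or",
                  mk("not", x, depth=depth + 1),
                  mk("not", y, depth=depth + 1),
                  depth=depth + 1)
    return (op,) + args
```

Running this on ~((a & b) & (c & d)) yields (~a | ~b) | (~c | ~d) in a single pass, exactly as in the walkthrough.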

And that's the whole pipeline: we now have a way to optimize code by translating to sea-of-nodes-with-CFG, applying rewrites as we go, then translating back to classical SSA CFG. In the process we've achieved all the goals we set out with: GVN, LICM, DCE, rematerialization, and algebraic rewrites.

E-graphs: Representing Many Possible Rewrites

So far, we've described a system that has zero or one deterministic rewrite for any given node; this is analogous to a classical compiler pipeline that destructively updates instructions/operators. This is great for rewrite rules like x+0 -> x: the right-hand side is unambiguously better if it is "smaller" (rewrites a whole expression into only one of its parts). This is also fine when instructions have clear and very distinct costs, such as integer divide (typically tens of cycles or more even on modern CPUs) by a constant converted into magic wrapping multiplies.

But what about cases where the benefit of a rewrite is less clear, or depends on context, or depends on how it may or may not be able to compose with or enable other rewrites in a given program?

For example, consider the classical example from the 2021 paper on egg, an e-graph framework: if we have the expression (x * 2) / 2 in our program, we would expect that to simplify to x. To implement this simplification, we might have a general rewrite rule (x * k) / k -> x. But we might also, separately, have a rewrite rule that (x * 2^k) -> (x << k), i.e., convert a multiplication into a left-shift operation. If we performed this latter rewrite eagerly, the former rewrite rule might never match.

(Now, you might complain that we could also convert the divide into a right-shift, then we have another rewrite rule that simplifies (x << k) >> k -> x. In this particular example, that might be reasonable. But (i) that required careful thinking about canonical forms, where multiplies/divides by powers-of-2 are always canonicalized down to shifts, and (ii) this same fortunate behavior might not exist for all rulesets.)

In general, we also have a question at the rule-application level: if multiple rules apply, which do we take? In the above example, we would have had to have some prioritization scheme to (say) apply strength-reduction rules to convert to shifts before we examine divide-of-multiply. That's an extra layer of heuristic engineering that must be considered when designing the optimizer.

Onto the scene, then, comes a new data structure: the e-graph, or equivalence graph, which is a kind of sea-of-nodes program/expression representation that can represent many different equivalent forms of a program at once. The key idea is that, rather than have a single expression node as a referent for any value, we have an e-class (equivalence class) that contains many e-nodes, and we can pick any of these e-nodes to compute the value.

The idea is a sort of principled approach to the optimization problem: let's model the state space explicitly, and then pick the best result objectively. Typically one uses the result of an e-graph by "extracting" one possible representation of the program according to a cost metric. (More on this below, but a simple cost metric could be a static number per operator kind, plus cost of inputs.)

The magic of e-graphs is how they can compress a very large combinatorial space of equivalent programs into a small data structure. A detailed exploration of how this works is beyond the scope of this blog post (please read the egg paper: it's very good!) but a very short intuitive summary might be something like:

  • Ensuring that all value uses point to an e-class rather than a particular node will propagate knowledge of equivalences to maximally many places. That is, if we know that op1 v1, v2 is equivalent to op2 v3, v4, all users of the op1 v1, v2 expression should automatically get the knowledge propagated that they can use any form. This knowledge propagation is the essence of "equality saturation" that e-graphs enable.

  • A strong regime of canonicalization and "re-interning" (re-hashconsing), which the egg paper calls "rebuilding", ensures that such information is maximally propagated. Basically, when we discover that the op1 and op2 expressions above are equivalent, we re-process all users of both op1 and op2, looking for more follow-on consequences. Merging those two might in turn cause other expressions to be equivalent or other rewrite rules to fire.
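
To make the rebuilding idea concrete, here is a toy sketch in the spirit of the egg paper (a deliberately naive fixpoint formulation; real implementations use worklists and path compression, and none of this is Cranelift code). Merging the classes of a and b makes f(a) and f(b) congruent, and rebuilding discovers and propagates that fact.

```python
class EGraph:
    """A toy e-graph: union-find over class ids plus a hashcons table."""
    def __init__(self):
        self.parent = {}    # union-find over e-class ids
        self.hashcons = {}  # canonical e-node -> e-class id
        self.nodes = []     # every (op, *arg_class_ids) e-node added

    def find(self, c):
        while self.parent[c] != c:
            c = self.parent[c]
        return c

    def add(self, op, *args):
        key = (op,) + tuple(self.find(a) for a in args)
        if key not in self.hashcons:
            cid = len(self.parent)
            self.parent[cid] = cid
            self.hashcons[key] = cid
            self.nodes.append(key)
        return self.hashcons[key]

    def merge(self, a, b):
        self.parent[self.find(a)] = self.find(b)

    def rebuild(self):
        # Re-canonicalize every e-node; if two now collide, their
        # classes are congruent and must be merged too. Repeat until no
        # new equivalences appear (a crude fixpoint version of egg's
        # batched rebuilding).
        changed = True
        while changed:
            changed = False
            table = {}
            for key in self.nodes:
                op, *args = key
                canon = (op,) + tuple(self.find(a) for a in args)
                cid = self.find(self.hashcons[key])
                if canon in table and table[canon] != cid:
                    self.merge(cid, table[canon])
                    changed = True
                else:
                    table[canon] = cid
```

The classic demonstration: after merge(a, b), rebuild() unifies the classes of f(a) and f(b) without any rewrite rule firing.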

Practical Efficiency of Classical E-graphs

The two problems that arise with a "classical e-graph" (by which I include the 2021 egg paper's batched-rebuilding formulation) are blowup -- that is, too many rewrite rules apply and the e-graph becomes too large -- and data-structure inefficiency.

The blowup problem is easier to understand: if we allow for representing many different forms of the program, maybe we will represent too many, and run out of memory and processing time. It is often hard to control how rules will compose and lead to blowup, as well: each rewrite rule may seem reasonable in isolation, but the transitive closure of all possible programs under a well-developed set of equivalences can be massive. So practical applications of e-graphs usually need some kind of meta/strategy driver layer that uses "fuel" to bound effort, and/or selectively applies rewrites where they are likely to lead to better outcomes. Even then, this operating regime often has compile-times measured in seconds or worse. This may be appropriate for certain kinds of optimization problems where compilation happens once or rarely and the quality of the outcome is extremely important (e.g., hardware design), but not for a fast compiler like Cranelift.

We can protect against such outcomes with careful heuristics, though, and the possibility of allowing for objective choice of the best possible expression is still very tempting. So in my initial experiments, I applied the egg crate to the problem and eventually, with custom tweaks, managed to get e-graph roundtripping to 23% overhead -- with no rewrites applied. That's not bad at first glance but it proposes to replace an optimization pipeline that itself takes only 10% of compile-time, and we haven't yet added the rewrites to the 23%. (And the 23% came after a good amount of data-structure engineering to reduce storage; the initial overhead was over 2x.)

In profiling the optimizer's execution, the overheads were occurring more or less in building the e-graph itself (that is, cache misses throughout the code transcribing IR to the e-graph). And what does the e-graph contain? Per e-class, it contains a "parent pointer" list: we need to track users of every e-class so that we can re-canonicalize them during the "rebuild" step when e-classes are merged (a new equivalence is discovered). And, even more fundamentally, it stores e-nodes separately from e-classes, which is an essential element of the idea but means that we have (at least) two different entities for each value, even when most e-classes have only one e-node.

Is there any way to simplify the data structures so that we don't have to store so many different bits for one value?

Insight #1: Implicit E-graph in the SSA IR

The first major insight that enabled efficient implementation of an e-graph in Cranelift was that we could redefine the existing IR into an implicit e-graph, without copying over the whole function body into an e-graph and back, thus avoiding the compile-time penalty of this data movement. (Data movement can be very expensive when the main loops of a program are otherwise fairly optimized! It is best to keep and operate on data in-place whenever possible.)

We start with a sea-of-nodes-with-CFG, where we have an IR with SSA values not placed in basic blocks. We can already build this "in-place" in Cranelift's IR, CLIF, by removing existing SSA definitions from the CFG but keeping their data in the data-flow graph (DFG) data structures.

Then, to allow for multi-representation in an e-graph, the idea is to discard the separation between e-classes and e-nodes, and instead define a new kind of IR node that is a union node. Rather than two index spaces, for e-nodes and e-classes, we have only one index space, the SSA value space. An SSA value is either an ordinary operator result or a block parameter (as before), or a union of two other SSA values. Any arbitrary e-class can then be represented via a binary tree of union nodes. We don't need to change anything about operator arguments to make use of this representation: operators already refer to value numbers, and an e-class of multiple e-nodes (defined by the "top" union node in its union tree) already has a value number.
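
As an illustration (using a made-up tuple encoding, not Cranelift's actual CLIF data structures), a single value table can hold operator nodes and union nodes side by side, and enumerating the members of an e-class is just a walk of the union tree:

```python
def enodes_of(values, v):
    """Yield the operator e-nodes in the e-class rooted at value v."""
    kind = values[v][0]
    if kind == "union":
        _, a, b = values[v]
        yield from enodes_of(values, a)
        yield from enodes_of(values, b)
    else:
        yield v  # an ordinary operator node (or block parameter)

# Example: one value-number space, where an e-class with three members
# is a binary tree of two union nodes over three operator nodes.
values = {
    0: ("imul", "x", 2),   # v0 = x * 2
    1: ("ishl", "x", 1),   # v1 = x << 1
    2: ("union", 0, 1),    # v2 = union(v0, v1)
    3: ("iadd", "x", "x"), # v3 = x + x
    4: ("union", 2, 3),    # v4 = union(v2, v3): the "top" of the e-class
}
```

An operator that wants to use this e-class simply refers to value 4; it needs no knowledge of whether that value is a single node or a union tree.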

The coolest thing about this representation is: once we have a sea-of-nodes, it is already implicitly an e-graph, with "trivial" (one-member) e-classes for each e-node. Thus, the lift from sea-of-nodes to e-graph is a no-op -- the best (and cheapest) kind of compile-time pass. We only pay for multi-representation when we use that capability, creating union nodes as needed.

Insight #2: Acyclicity with Eager Rewrites

The other aspect of the classical e-graph data structure's cost has to do with its need to rebuild, and in order to do so, to track all uses of an e-class (its "parents" in egg's terminology). Cranelift does not keep bidirectional use-def links, and the binary tree of union nodes would make such tracking even more complex.

In trying to address this cost, I started with a somewhat radical question: what would happen if we never rebuilt (to propagate equalities)? How much "optimization goodness" would that give up?

If one (i) builds an e-graph then (ii) applies rewrite rules to find better versions of nodes, adding to e-classes, then the answer is that this would hardly work at all: this would mean that all users of a value would see only its initial form and never its rewrites. The rewritten forms would float in the sea-of-nodes, and union-nodes joining them to the original forms would exist, but no users would actually refer to those union nodes.

Instead, what is needed is to apply rewrites eagerly. When we create a new node in the sea-of-nodes, we apply all rewrites immediately, then join those rewrites with the original form with union nodes. The "top" of that union tree is then the value number used as the "optimized form" of that original value, referenced by all subsequent uses.

The union-node representation plays a key part of this story: it acts as an immutable data structure in a sense, where we always append new knowledge and union it with existing values, and refer to that "newer version" of an e-class; but we never go back and update existing references.

This has a very nice implication for the graph structure of the sea of nodes as well: it preserves acyclicity! Classical e-graphs, in their rebuild step, can create cycles even when the input is acyclic because they can condense nodes arbitrarily. But when we eagerly rewrite, then freeze, we can never "tie the knot" and create a cycle.

This acyclicity is important because it permits a single pass for the rewrites. In fact, taking our sea-of-nodes build algorithm above as a baseline, we can add eager rewriting as a very small change: when we apply rewrites, we build a "union-node spine" to join all rewritten forms, rather than destructively take only the rewritten form.

def canonicalize_and_rewrite(basic_block):
  for inst in basic_block:
    if is_pure(inst):
      # ...
      if inst in hashcons_map:
        # ...
      else:
        optimized = rewrites(inst)                     # NEW
        union = join_with_union_nodes(inst, optimized) # NEW
        optimized_form[inst.def] = union
    else:
       # ...

All of these aspects work together and cannot really be separated:

  • Union nodes allow for a cheap, pay-as-you-go representation for e-classes, without a two-level data structure (nodes and classes) and without parent pointers.
  • Eager rewriting, applied as we build the e-graph (sea of nodes), allows for a single-pass algorithm and ensures all members of the e-class are present before it is "sealed" by union nodes and referenced by uses.
  • Acyclicity, present in the input (because of SSA), is preserved by the append-only, immutable nature of union nodes, and permits eager rewriting to work in a single pass.

Note that here we are glossing over recursive rewrites. Due to space constraints I will only outline the problem and solution briefly: the right-hand side of a rewrite rule application (rewrites above) will produce nodes that themselves may be able to trigger further rewrites. Rather than leave this to another iteration of a rewrite loop, as a classical e-graph driver might do, we want to eagerly rewrite this right-hand side as well before establishing any uses of it. So we recursively re-invoke rewrites; and this occurs within the right-hand side of rules as pieces of the final expression are created, as well. This recursion is tightly bounded (in levels and in total rewrite invocations per top-level loop iteration) to prevent blowup.

Finally, we are also glossing over details of how we apply our pattern-matching/rewrite DSL, ISLE, to the rewrite problem when multiple rewrites are now permitted. In brief, we extended the language to permit "multi-extractors" and "multi-constructors": rather than matching only one rule, and disambiguating by priority, we take all applicable rules. The RFC has more details.

The Extraction Problem

So we now have a way to represent multiple expressions as alternatives to compute the same value. How do we compile this program? It surely wouldn't make sense to compile all of these expressions: they produce the same bits, so we only need one. Which one do we pick?

This is the extraction problem, and it is both easy to state and deceptively hard (in fact, NP-hard): choose the easiest (cheapest) expression to compute any given value.

Why is this hard? First, let's construct the case where it's easy. Let's say that we have one root expression (say, returned from a function) with all pure operators. This forms a tree of choices: each eclass lets us choose one enode to compute it, and that enode has arguments that themselves refer to eclasses with choices.

Given this tree of choices, with every choice independent, we can pick the best choice for each subtree, and compute the cost of any given expression node as best-cost-of-args plus that own node's cost to compute. In more formal algorithmic terms, that is optimal substructure.

Unfortunately, as soon as we permit references to shared nodes (a DAG rather than a tree), this nice structure evaporates. To see why, consider: we could have two eclasses we wish to compute

v0 = union v10, v11
v1 = union v10, v12

with computations (not shown) v10 that costs 10 units to compute, and v11 and v12 that each cost 7 units to compute. The optimal choice at each subproblem is to choose the cheaper computation (v11 or v12), but the program would actually be more globally optimal if we computed only v10 (cost of 10 total). A solver that tries to recognize this would either process each root (v0 and v1) one at a time and "backtrack" at some point once it sees the additional use, or somehow build a shared representation of the problem, which is no longer deconstructed in a way that permits sub-problem solutions to compose.
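
In numbers, with the hypothetical costs above:

```python
# Hypothetical per-node computation costs from the example above.
cost = {"v10": 10, "v11": 7, "v12": 7}

# Greedy, per-eclass choice: pick the cheaper member of each class
# independently -- v11 for v0 and v12 for v1.
greedy_total = min(cost["v10"], cost["v11"]) + min(cost["v10"], cost["v12"])

# Globally optimal choice: compute v10 once and share it for both roots.
shared_total = cost["v10"]
```

greedy_total comes to 14 while shared_total is 10: once sharing enters the picture, the per-subproblem optimum is no longer the global optimum.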

In fact, the extraction problem is NP-hard. To see why, I will show a simple linear-time reduction (mapping) from a known NP-hard problem, weighted set-cover, to eclass extraction.

Take each weighted set S_j with weight w_j and elements S_j = { x_1, x_2, ... }. Add an enode (operator with args) N_j, with self-cost (not including args) w_j, and no arguments; it sits in its own eclass. Then for each element x_i in the universe (the union of all sets' elements), define an eclass C_i. Then for each set-element edge (for each i, j such that x_i ∈ S_j), add an enode to C_i with an opaque zero-cost operator SetElt_ij(y), where y is the eclass containing N_j.

Performing an optimal (lowest-cost) extraction, with all eclasses C_i taken as roots, will compute the lowest-weight set cover: the choice of enode in each eclass C_i encodes which set we choose to cover element x_i. Thus, because egraph extraction with shared structure can compute the solution to an NP-hard problem (weighted set cover), egraph extraction with shared structure is NP-hard.

OK, but we want a fast compiler. What do we do?

The classical compiler-literature answer to this problem -- seen over and over in a 50-year history -- is "solve a simpler approximation problem". Register allocation, for example, is filled with simplified problem models (linear scan, no live-range splitting, ...) that reduce the decision space and allow for a simpler algorithm.

In our case, we solve the extraction problem with a simplifying choice: we will not try to account for shared substructure and the way that it complicates accounting of cost. In other words, we'll ignore shared substructure, pretending that each use of a subtree counts that subtree's cost anew. For each enode, having computed the cost of each of its arguments, we can compute its own cost easily as the sum of its arguments plus its own computation cost; and for each eclass, we can pick the minimum-cost enode. That's it!

We implement this with a dynamic programming algorithm: we do a toposort of the aegraph (which can always be done, because it's acyclic), then process nodes from leaves upward, accumulating cost and picking minima at each subproblem. This is a single pass and is a relatively fast and straightforward algorithm.
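
A sketch of this sharing-blind dynamic program (with invented operator costs; this is not Cranelift's actual cost model) might look like:

```python
# Made-up static costs per operator kind; union nodes cost nothing.
OP_COST = {"iconst": 0, "iadd": 1, "imul": 2, "idiv": 20}

def extract(values):
    """values[i] is ("union", a, b) or (op, *arg_value_indices). Defs
    precede uses in the append-only value space, so index order is
    already a topological order."""
    cost, choice = {}, {}
    for v, node in enumerate(values):
        if node[0] == "union":
            _, a, b = node
            # An e-class takes the cheaper of its two sides.
            cost[v], choice[v] = min((cost[a], a), (cost[b], b))
        else:
            op, *args = node
            # Sharing-blind: each use re-counts its argument's full cost.
            cost[v] = OP_COST[op] + sum(cost[a] for a in args)
            choice[v] = v
    return cost, choice

# (v0 * v0) / v0 unioned with plain v0: extraction picks the cheap form.
values = [("iconst",),     # v0: a leaf value
          ("imul", 0, 0),  # v1 = v0 * v0
          ("idiv", 1, 0),  # v2 = v1 / v0
          ("union", 2, 0)] # v3: e-class {v2, v0}
```

Extraction assigns v3 a cost of 0 and chooses v0, discarding the expensive divide form entirely.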

After the Dagstuhl seminar in January, I had an ongoing discussion with collaborators Alexa VanHattum and Nick Fitzgerald about whether we could do better here. Alexa and Nick both prototyped a bunch of interesting alternatives: dynamically updating (shortcutting to zero) costs when subtrees become used ("sunk-cost" accounting), computing costs by doing full top-down traversals rather than bottom-up dynamic programming (and then mixing in memoization somehow), trying to account for sharing by doing DP but tracking the full set of covered leaves, and some other things. This was an interesting exploration but in the end we didn't find anything that looked better in the compile-time / execution-time tradeoff space. We have an issue tracking this and more ideas are always welcome, of course.

Other Aspects

There are two other aspects of our aegraph implementation that I don't have space to go into in this post:

  • There is an interesting problem that arises with respect to the domtree and SSA invariants when different values are merged together with a union node and some of them have wider "scope" than others. For example, via store-to-load forwarding we may know that a load instruction produces a constant 0; so we might have a union node with iconst 0. The load can only happen at its current location, but iconst 0 can be computed anywhere. A user of this eclass should be able to pick either value (said another way: extraction should not be load-bearing for correctness). If the user is within the dominance subtree under the load, then all is fine, but if not, e.g. if some other user of iconst 0 elsewhere in the function errantly happened upon the eclass-neighbor load instruction, we might get an invalid program.

    There are many ways one might be tempted to solve this, but in the end we landed on an "available block" analysis that runs as we build nodes. For every node, we record the "highest" block in the domtree at which it can be computed: function entry for pure zero-argument nodes, the current block for any impure nodes, and otherwise the lowest block in the domtree among the available blocks of all arguments. (Claim: the available blocks of all args of a node will form an ancestor path in the domtree; one will always exist that is dominated by all others. This follows from the properties of SSA.) Then when we insert into the hashcons map, we insert at the level at which the final union is available.

  • We also have an important optimization that we call subsume. This is an identity operator that wraps a value returned by a rewrite rule. It is not required for correctness, but its semantics are: if any value in an eclass is marked "subsume", the subsuming values replace the existing members of the eclass. Usually, only one subsuming rule will match (but this, also, is not necessary for correctness).

    The usual use-case is for rules that have clear "directionality": it is always better to say 2 than (iadd 1 1), so let's go ahead and shrink the eclass so that all further matching, and eventual extraction, is more efficient.
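
The "available block" computation from the first bullet above can be sketched roughly as follows, assuming a precomputed domtree depth map (names and encoding are illustrative, not Cranelift's):

```python
def available_block(pure, current_block, arg_avail, depth, entry):
    """Return the highest domtree block at which a node may be computed.
    arg_avail holds the available blocks of the node's arguments;
    depth maps each block to its depth in the domtree."""
    if not pure:
        return current_block   # impure: pinned where it appears
    if not arg_avail:
        return entry           # pure, zero args: function entry
    # The args' available blocks form an ancestor path in the domtree
    # (an SSA property), so the deepest one is dominated by all others
    # and is the highest point where every argument is available.
    return max(arg_avail, key=lambda b: depth[b])
```

For example, with a straight-line domtree entry -> b1 -> b2, a pure node in b2 whose args are available at entry and b1 may itself float up to b1.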

Evaluation

So how does all of this actually work? Do aegraphs benefit Cranelift's strengths as a compiler -- its ability to optimize code, its speed in doing so, or both?

This is the part where I offer a somewhat surprising conclusion: the tl;dr of this post is that I believe the sea-of-nodes-with-CFG aspect of this mid-end works great, but the aegraph itself -- the ability to represent multiple options for one value -- may not (yet?) be pulling its weight. It doesn't really hurt much either, so maybe it's a reasonable capability to keep around. But in any case, it's an interesting conclusion and we'll dig more into it below.

The main interesting evaluation is a two-dimensional comparison of compile time -- that is, how long Cranelift takes to compile code -- on the X-axis, versus execution time -- that is, how long the resulting code takes to execute -- on the Y-axis. This forms a tradeoff space: it may be good to spend a little more time to compile if the resulting code runs faster (or vice-versa), for example. Of course, reducing both is best. One point may be "strictly better" than another if it reduces both -- then there is no tradeoff, because one would always choose the configuration that both compiles faster and produces better code. (One can then find the Pareto frontier of points that form a set in which none is strictly better than another -- these are all "valid configuration points" that one may rationally choose depending on one's goals.)
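
For concreteness, a minimal (illustrative, quadratic-time) filter for such a frontier over (compile_time, execution_time) points:

```python
def pareto_frontier(points):
    """Keep the points not dominated by any other point on both axes."""
    return sorted(p for p in points
                  if not any(q[0] <= p[0] and q[1] <= p[1] and q != p
                             for q in points))
```

For points [(1, 10), (2, 5), (3, 6), (4, 4)], the point (3, 6) is dominated by (2, 5) and drops out; the remaining three form the frontier.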

Below we have a compile-time vs. execution-time plot for a number of configurations of Cranelift, compiling and running the Sightglass benchmark suite:

  • No optimizations enabled;
  • The (on by default) aegraph-based optimization pipeline, as described in this post, with several variants (below);
  • A "classical optimization pipeline" that does not form a sea-of-nodes-with-CFG at all; instead, it applies exactly the same rewrite rules, but in-place, and interleaves with classical GVN and LICM passes;
  • Variants of the aegraphs pipeline and classical pipeline with the whole mid-end repeated 2 or 3 times (to test whether code continues to get better).

Here's the main result:

Figure: compile time vs. execution time

A few conclusions are in order. First, the aegraph pipeline does generate better code than the classical pipeline. This objective result is "mission accomplished" with respect to the aegraph effort's original motivation: we wanted to allow optimization passes to interact more finely and optimize more completely. Note in particular that repeating the classical pipeline multiple times does not get the same result; we could not have obtained the ~2% speedup without building a new optimization framework.

Second, though, there is clearly a Pareto frontier that includes "no optimizations" and "classical pipeline" as well as the aegraph variants: each takes more compilation time than the previous. In other words, moving from a classical compiler pipeline to the design described here, we spend about 7-8% more compile time. Notably, this is not the result that we had when we first built the aegraphs implementation in 2023 and switched over -- at that time, we were more or less at parity. This is likely a result of the growth of the body of rewrite rules over the intervening three years.

To get a better picture of how aegraph's various design choices matter, let's zoom into the area in the red ellipse above, which contains multiple variants of the aegraphs pipeline:

  • "aegraph": Exactly as described in this post, and default Cranelift configuration;
  • "no multivalue (eager pick)": sea-of-nodes-with-CFG, without union nodes; i.e., not actually representing more than one equivalent value in an eclass. Instead, after evaluating rewrite rules, we pick the best option and use that one option (destructively replacing the original);
  • "no rematerialization": testing the effect of this aspect of the elaboration algorithm;
  • "no subsume": testing this efficiency tweak of the rewrite-rule application.

Here's the plot:

Figure: aegraphs variants

One can see that there are some definite tradeoffs, but looking closely at the axis scales, these effects are very very small. In particular, moving from sea-of-nodes-with-CFG to true aegraph (taking all rewritten values, and picking the best in a principled way with cost-based extraction) nets us ~0.1% execution-time improvement, at ~0.005% compile-time cost. That's more-or-less in the noise.

Supporting that conclusion is the statistic that the average eclass size after rewriting is 1.13 enodes: in other words, very few cases with our ruleset and benchmark corpus actually result in more than one option.
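To make the cost-based extraction concrete, here is a minimal Rust sketch -- illustrative only, not Cranelift's actual data structures. Each value is either an operator node with child value IDs, or a union of two equivalent alternatives introduced by a rewrite; extraction recursively picks the cheaper side of every union, memoizing per value:

```rust
// A toy aegraph value: an operator node, or a union of two equivalents.
enum Node {
    Op { cost: u32, children: Vec<usize> },
    Union(usize, usize),
}

/// Cost of the cheapest representative of `id`, memoized per value.
fn best_cost(nodes: &[Node], memo: &mut Vec<Option<u32>>, id: usize) -> u32 {
    if let Some(c) = memo[id] {
        return c;
    }
    let c = match &nodes[id] {
        Node::Op { cost, children } => {
            // An operator costs itself plus its (cheapest) operands.
            cost + children.iter().map(|&ch| best_cost(nodes, memo, ch)).sum::<u32>()
        }
        // Extraction picks the cheaper alternative of the eclass.
        Node::Union(a, b) => best_cost(nodes, memo, *a).min(best_cost(nodes, memo, *b)),
    };
    memo[id] = Some(c);
    c
}

fn main() {
    // x << 1 (cost 1) unioned with x * 2 (cost 3): extraction picks the shift.
    let nodes = vec![
        Node::Op { cost: 0, children: vec![] },  // 0: x
        Node::Op { cost: 1, children: vec![0] }, // 1: x << 1
        Node::Op { cost: 3, children: vec![0] }, // 2: x * 2
        Node::Union(1, 2),                       // 3: eclass {1, 2}
    ];
    let mut memo = vec![None; nodes.len()];
    assert_eq!(best_cost(&nodes, &mut memo, 3), 1);
}
```

With an average eclass size of 1.13, most values here would simply have no `Union` node at all, which is why eagerly picking one winner loses so little.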

Finally, the most interesting question in my view: does the eager aspect of aegraphs -- applying rewrite rules right away, and never going back to "fill in" other equivalences -- matter? In other words, does skipping equality saturation take the egraph goodness out of an egraph(-alike)?

We can measure this, too: I instrumented our implementation to track when a subtree of an eclass is not chosen by extraction, and then any node in that subtree is later actually elaborated (in other words, when we use a suboptimal choice because we could not see an equality in the "wrong" direction). This should only happen if, in theory, our rules rewrite f to g where cost(g) > cost(f), and we don't have a rewrite g to f: then a user of g might never directly get a rewrite of f eagerly, but a later coincidentally-occurring f might rewrite onto g (but we'll never propagate that equality into the original users of g).

It turns out that, in all of our benchmarks, with ~4 million value nodes created overall, this happens two (2) times. Both instances occur in spidermonkey.wasm (a large benchmark that consists of the SpiderMonkey JS engine, compiled to WebAssembly, then run through Wasmtime+Cranelift), and occur due to an ireduce-of-iadd rewrite rule that violates this move-toward-lower-cost principle (explicitly, in the name of simplicity). Overall, we conclude that the eager rewrites are effective as long as the ruleset is designed with optimization (rather than mere exploration of all equivalent expressions) in mind.

Discussion

The most surprising conclusion in all of the data was, for me, that aegraphs (per se) -- multi-value representations -- don't seem to matter. What?! That was the entire point of the project, and (proper) e-graphs have seen great promise in other application areas.

I think the main reason for this is that our workload is somewhat "small" in a combinatorial possibility-space sense: we are (i) compiling workloads that are often optimized already (as Wasm modules) before hitting the Cranelift compilation pipeline, and (ii) applying a set of rewrite rules that, while large and growing (hundreds of rules), explicitly do not include identities like associativity and commutativity, or arbitrary algebraic identities, that do not "simplify" somehow. In other words, if we're generally applying rewrites that look more like simple, obvious "cleanups", we would expect that we don't hold a "superposition" of multiple good expression options very often.

Given that it doesn't cost us that much compile time to keep aegraphs around, though, maybe this is... fine? Having the capability to do principled cost-based extraction is great, versus having to think about whether a rewrite rule should exist. We still do try to be careful not to introduce rules that are never productive, of course.

And, further into the future, one could imagine that workloads with more optimization opportunity could cause more interesting situations to occur within the aegraph, leading to more emergent composition in the rewrites.

Future Directions

There are a bunch of directions we could (and should) take this in the future. In terms of evaluation: finding the "corner of the use-case domain" where aegraphs truly shine is still an open question. More concretely: if we evaluate Cranelift with new and different workloads, and/or pile on more rewrite rules, do we get to a point where the classical benefit of "multi-representation with cost-based extraction" pays off in a conventional compiler? I don't know!

There is also still a lot of room to improve the core algorithms:

  • Better extraction, as mentioned above: something that accounts for shared substructure would be great, as long as we don't have to pay the NP-hard cost for it. Maybe there's a nice approximation algorithm that's better than our current dynamic-programming approach.

  • We'd like to be able to handle more rewrites that alter the CFG skeleton as well. Right now, we have a separate ISLE entry-point that allows for destructive rewriting of skeleton instructions (thanks to my colleague Nick Fitzgerald for building this!). However, maybe we could remove redundant block parameters (phi nodes), for example; and/or maybe we could fold branches; and/or maybe we could apply path-sensitive knowledge to values when used in certain control-flow contexts (x=1 in the dominance subtree under the "true" branch for if x==1, goto ...). My former colleague Jamey Sharp wrote up a few excellent, in-depth issues on these topics in our tracker (#5623, #6109, #6129) and I think there is a lot of potential here.

    (The full version of this is, again, something like RVSDG in the node language seems like the most principled option to express all useful forms of control-flow rewrites; Jamey also has a prototype called optir for this.)

  • It would be interesting to experiment with incorporating our lowering backend rules into the aegraph somehow: they are a rich, fruitful target-specific database of natural "costs" for various operations. For example, on AArch64 we can fold shifts and extends into (some) arithmetic operations for "free"; maybe this alters the extraction choices we make. Or likewise for the various odd corners of addressing modes on each architecture.

    The simple version of this idea is to incorporate lowering rules as rewrites, and make the egraph's node language a union of CLIF and the machine's instruction set. But maybe there's something better we could do instead, allowing multi-extractors to see the aegraph eclasses directly and keeping various VCode sequences. I need to write up more of my ideas on this topic someday. Jamey also has more thoughts on this in #8529.
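The shared-substructure problem with tree-based dynamic-programming extraction mentioned above is easy to demonstrate with a toy Rust sketch (illustrative, not Cranelift code): the DP pays for a shared child once per use, while the generated code computes it only once.

```rust
// A toy DAG node: a cost plus child value IDs (children may be shared).
struct Node {
    cost: u32,
    children: Vec<usize>,
}

/// Tree cost, as a naive DP computes it: every use of a child pays
/// the child's full cost again.
fn tree_cost(nodes: &[Node], id: usize) -> u32 {
    nodes[id].cost
        + nodes[id].children.iter().map(|&c| tree_cost(nodes, c)).sum::<u32>()
}

/// DAG cost: each reachable node is paid for exactly once.
fn dag_cost(nodes: &[Node], id: usize) -> u32 {
    let mut seen = vec![false; nodes.len()];
    let mut stack = vec![id];
    let mut total = 0;
    while let Some(n) = stack.pop() {
        if !seen[n] {
            seen[n] = true;
            total += nodes[n].cost;
            stack.extend(&nodes[n].children);
        }
    }
    total
}

fn main() {
    let nodes = vec![
        Node { cost: 4, children: vec![] },     // 0: expensive shared value h
        Node { cost: 1, children: vec![0, 0] }, // 1: h + h, uses h twice
    ];
    assert_eq!(tree_cost(&nodes, 1), 9); // DP sees 1 + 4 + 4
    assert_eq!(dag_cost(&nodes, 1), 5);  // real cost: h computed once
}
```

Because the DP over-counts shared children, it can mis-rank an alternative that reuses an already-computed value against one that doesn't; exact DAG-aware extraction fixes that but is NP-hard in general, hence the interest in good approximations.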

I'm sure there are other things that could be done here too!

Further Reading

  • I gave a talk about aegraphs at EGRAPHS 2023: slides, re-recorded video (the original was not recorded).

  • I gave a talk about aegraphs at the January 2026 Dagstuhl e-graphs seminar; the slides are a heavily updated and amended version of the 2023 talk, with the experiments/data I presented here.

  • There is a Cranelift RFC on aegraphs here, and one on ISLE (the rewrite DSL that we use to drive rewrites in the aegraph) here.

  • The main PR that implemented the current form of aegraphs is here, co-authored by my former colleague Jamey Sharp (this production implementation was a fantastically fun and productive pair-programming project!).

Acknowledgments

Thanks to many folks for discussion of the ideas around aegraphs through the years: Nick Fitzgerald, Jamey Sharp, Trevor Elliott, Max Willsey, Alexa VanHattum, Max Bernstein, and many others at the Dagstuhl e-graphs seminar. None of them reviewed this post (it had been languishing for too long already and I wanted to get it out) so all fault for any errors herein is solely my own!


1

Possibly with masking of the top bit if our IR semantics have defined wrapping/truncation behavior: x & 0x7fff..ffff.

liam_on_linux: (Default)
[personal profile] liam_on_linux
From Reddit I learn that a new generation of LLM bots is getting really really good at finding exploitable vulnerabilities in large C codebases, and making exploits for them.

Good.

Maybe it will result in the destruction of the entire C-based software industry before the LLM industry self-immolates. Slight snag: it may take human civilisation with it.

I am vaguely working towards some kind of overall Liam's Theory of Software thing in some of my recent Reg articles, like the "Starting Over" series about an Optane-based pure-object-storage-no-files OS, based on FOSDEM talks.

https://archive.fosdem.org/2018/schedule/event/alternative_histories/

https://archive.fosdem.org/2021/schedule/event/new_type_of_computer/

https://archive.fosdem.org/2024/schedule/event/fosdem-2024-3095-one-way-forward-finding-a-path-to-what-comes-after-unix/

... but it's not easy. Obviously the problem space is vast. That's one problem. Secondly, it'd help if I could find a way to do it iteratively. It's too big and nebulous for me to grapple with while being a nearly-60-year-old dad in an isolated country with nobody to bounce ideas off in person.

The other recent one that's relevant is this:

https://www.theregister.com/2026/02/08/waves_of_tech_bs/

Sketchy ideas as relevant here:

  • OSes are hard. (We all know this, right?)
  • Therefore you need to keep it simple. I mean you need to be insanely radically obsessive about extreme simplicity.
  • Ken Thompson realised this very early on. He merits more respect.
  • Dennis Ritchie saw the signs and jumped on board very early. He, I fear, gets more than he deserved.
  • They worked out...
  1. We need a tiny focused core OS. That's Unix v0 to v3.
  2. We need something tiny and simple to make it portable. That's C.

That got us to Unix v4, just rediscovered.

Unix was a good idea, but just one good idea in a space of good ideas. Key point:

  • It is not the alpha and omega. There are others. This is vital to remember. Much of the world knows nothing but Unix and thinks it's (insert christian metaphor about one truth here).

Unix grew to v9 or v10 -- can't remember, not well, don't want to go do a ton of research -- and several times industry took a snapshot and ran with it, not realising it was unfinished.

Somewhere around 5-6-7 -- it gets confusing -- we get everything that grew into the BSDs, System III, 4, V, all commercial Unixes, and then, a copy of a copy, Linux.

They are copies of an obsolete design, built for an obsolete type of computer nobody has any more. Copies of copies of copies of something obsolete.

  • Den & Ken went on to realise: "Hey, we don't have minicomputers, we have networked workstations."

The result was Plan 9.

Unix, but grown up. Much simpler, much cleaner, conceptually much harder on the fakers pretending to Know Computers. (I am one too.)


Parenthetical excerpt:

Forget terminals. Forget the terminal existed. Terminals are bad. Stop being obsessed with terminals. It is not about terminals.

Network at the core. Containers at the core. Everything is a container all the time. All namespaces (files, processes, PIDs, network addresses): they are all virtual. They must be. If your design does not allow that, throw it out.

Things that don't fit well, like legacy 20th century stuff like Linux, you stick in a VM and you don't emulate any hardware. Virtual drives for VMs? Stupid. Throw them out. Filesystem in a file on a filesystem? What are you, retarded? No!

Cut down Linux so the only hardware it can talk to are virtual network sockets, with the filesystem over 9p, display over X11, and run microVMs on demand for every big fat old Linux app you need.

I don't run Plan 9, because sadly, I need Firefox and Thunderbird and Ferdium and a bunch of bloated stuff like that, and I use them to avoid SaaS and stuff.

By 2000 the entire FOSS Unix world had Linux and Plan 9 and VMs and Jails and it should have realised, hey, crap, the baseline has moved, we should move.

By 2006 or so, the baseline moved more: hardware virtualisation, lots of cores, 64-bit so lots of RAM.

Within 15 years or so we should have had a modern 9front with integrated microVMs for those bloated GUI apps we all need.

Linux folks get Linux microVMs. xBSD folks get xBSD VMs for their native apps.


  • But Den & Ken didn't stop there.

Plan 9 was Unix done right, but in C. They tried Alef but couldn't make it fly.

Snag: you compile to native binary code, then your process can't migrate around the cluster.

You know how all Arm boxes have bigLITTLE cores? x86 is getting on board? Well do it right and your little efficiency cores are Arms and the big fat performance cores are x86 and your binaries can't see the difference.

Next they did Inferno. Plan 9 with a better UI and CPU independence. Embed a very fast VM in the kernels, target that for everything not performance critical.

Great idea, but premature obsession with phones didn't help -- commercial, gotta find a market! -- and Java killed it.

Half-assed Linux misunderstandings: eBPF, WASM. They grope in the direction but are in the dark and don't know there is a road.

The real lesson: the people who invented C realised it was a profoundly flawed plan and gave it up.

What to learn: well, Rust is finally learning it but if you include the toolchain it's 1000x bigger and even the fans say it's complicated. The way to sanity is to make it smaller and simpler. They did the reverse.

Oberon is smaller and simpler than C and it's much more capable.

  • Inferno flopped. The team dispersed. The people that wrote Unix could not find a place in the Unix industry. This tells you how totally fscked the Unix industry is.

Some of the Inferno folks landed at Google. There they did Go.

I don't know much detail about this stuff but I suspect from the history that much of what Go does, it does right, and Rust probably does wrong.

But the latest facet of the Unix congenital insanity is "Go bad Rust good".

Oberon is just an example. No it's not a mistake that the OS and the language have the same name. That's like saying the problem with wheels is that they're round. The machine is flawed -- it keeps rolling away!

The core FOSS OS should be something that a smart kid can understand, top to bottom, read and follow every line. But it should also be so easy and colourful and pretty and fun that they'd want to.

Let's make a better modern 64-bit Oberon with elements of Go. Let's build a modern Inferno in it. Let's equip it with microVMs so all the legacy apps we all love, the broken bad ideas we all need, like the WWW and so on, can run on it. But if it's built in C or Rust, it's dangerous toxic waste and should be kept in an airtight box until it suffocates. We can make it work in the meantime though.

Take all the existing billion-line OSes and burn them to the ground. If aside from human language translation the only thing of lasting value to come from LLMs is destroying the C-based software industry, I'll buy that.

[syndicated profile] talks_cl_cam_feed

Fast and distributed FTQC with neutral atoms: transversal surface-code game and entanglement boosting protocol

Neutral atom technologies have opened the door to novel theoretical advances in surface-code protocols for fault-tolerant quantum computation (FTQC), offering a compelling alternative to lattice surgery by leveraging transversal gates. However, a crucial gap remains between the theory of FTQC and its practical realization on neutral atom systems; most critically, a key theoretical requirement—that syndrome extraction must be performed frequently enough to keep error accumulation below a threshold constant—is difficult to satisfy in a scalable manner in the conventional zoned approach. In this work, we develop a comprehensive theoretical framework that closes this gap, bridging theoretical advances in surface-code fault-tolerant protocols with the capabilities of neutral atoms. Building on the “game of surface code” framework originally developed for superconducting qubits, we introduce an alternative game-based paradigm for transversal-gate FTQC that harnesses the unique strengths of neutral atom arrays. The game rules are designed to enable syndrome extraction at any intermediate step during logical gate implementation, ensuring compatibility with the threshold theorem. We further present an efficient method for designing resource state factories tailored to transversal-gate FTQC. As an application, our framework offers a systematic methodology and high-level abstraction for resource estimation and optimization, demonstrating that space-time performance competitive with a baseline lattice-surgery-based approach on superconducting qubits is possible, even when physical operations on neutral atoms are orders of magnitude slower. These results establish a solid foundation that bridges the theory and experiment of FTQC powered by neutral atoms, charting a well-founded pathway toward scalable, fault-tolerant quantum computers and setting practical directions for technological development.


[syndicated profile] domainincite_feed

Posted by Kevin Murphy

Amazon appears to have offloaded three of its dormant gTLDs to Identity Digital, judging by ICANN records. While no formal notices of registry contract reassignment have yet been posted, elsewhere ICANN shows the official registry for .circle, .got, and .jot is now Jolly Host LLC. Jolly Host is a new Identity Digital affiliate that appeared […]

The post Amazon sells three gTLDs to Identity Digital first appeared on Domain Incite.

[syndicated profile] firedrake_feed

2015 police procedural mystery, tartan noir, ninth in the Logan McRae series. As the rescue of a serial killer's victim involves enough procedural irregularity to lead to the killer's lawyer inducing reasonable doubt in the jury, Logan gets a "development opportunity" back in uniform, in rural Aberdeenshire.

[syndicated profile] charlesarthur_feed

Posted by charlesarthur


The agricultural machinery company John Deere is paying $99m to settle a right-to-repair class action in the US. CC-licensed photo by Lutz Blohm on Flickr.

You can sign up to receive each day’s Start Up post by email. You’ll need to click a confirmation link, so no spam.


A selection of 9 links for you. Unearthed. I’m @charlesarthur on Twitter. On Threads: charles_arthur. On Mastodon: https://newsie.social/@charlesarthur. On Bluesky: @charlesarthur.bsky.social. Observations and links welcome.


OpenAI shelves Stargate UK in blow to Britain’s AI ambitions • The Guardian

Aisha Down and Alexandra Topping:

»

OpenAI has put on hold plans for a landmark UK investment citing high energy costs and regulation, in a blow to the government which has put AI at the centre of its growth strategy.

Stargate UK was a part of the UK-US AI deal announced last September, in which US companies appeared to commit £31bn to the UK’s tech sector, part of a larger series of investments intended to “mainline AI” into the British economy.

It came as the Labour government seeks to make AI and datacentres the engine of its growth plans, alongside closer ties with Europe and regional growth.

“This is a wake-up call for the government to manage energy costs in the UK and foundation infrastructure,” said Victoria Collins MP, the Liberal Democrat spokesperson for science, innovation and technology. “We cannot be dependent on US tech companies to build our own sovereign capabilities – whether that’s energy cost, supply or even data and phone signal.”

The Labour MP Clive Lewis said: “When a government has no economic strategy worthy of the name and no real industrial vision, it becomes vulnerable. The Silicon Valley companies that flew into London knew exactly what they were dealing with: a prime minister and a technology secretary desperate to project momentum, willing to dress up press releases as policy.”

A Guardian investigation last month revealed many of the deals to “mainline AI into the veins” of the British economy were “phantom investments”, and a supercomputer scheduled to go live in 2026 was last month still a scaffolding yard in Essex. That supercomputer was to be built by Nscale, a UK firm that had never built a datacentre before but said it was aiming to deliver the project in 2027. Nscale was also to build key datacentres for Stargate UK.

The Stargate project was to support Britain in building out “sovereign compute” – infrastructure that would allow the government and other UK institutions to run AI models on datacentres in the country. That is, in theory, crucial to the security of British data.

Now, OpenAI has apparently put it on pause, saying it would wait for “the right conditions” to enable “long-term infrastructure investment”.

«

Not a real surprise; despite getting billions of dollars of investment, OpenAI’s profitability is about as accessible as the moon. So it’s going to cut back where it’s easiest to cut. Sora was just the first one; this is the next; there will be others over the next few months. The splurge on the podcast company was a few million, and that’s just everyday.
unique link to this extract


Oil industry pleads its Hormuz case with White House • POLITICO

Ben Lefebvre and Phelim Kine:

»

Oil company executives are reaching out to the White House, Secretary of State Marco Rubio and Vice President JD Vance to protest allowing Iran to charge tolls through the strategic Strait of Hormuz as a condition of peace talks, said one industry consultant granted anonymity to discuss relations with the administration.

“Hell yes,” this person said when asked if executives were contacting the White House to protest a toll on Hormuz. ”We didn’t have to do that before — and I thought we won the war. Any place you have access to the administration, you ask, what are you guys thinking?”

The response administrative officials were giving industry representatives “is not a cold shoulder,” this person added. “It’s more like, ‘Yeah, OK, we’ll take note.’”

Oil industry representatives met with senior administration staff in the State Department on Wednesday morning to raise concerns, said one person who said they attended the meeting.

Among their points: conceding to Iran’s request would add $2.5m to each shipment in tolls and higher insurance rates, a cost that would be passed on to consumers. Giving Iran control of Hormuz could set precedent for countries like Singapore and Turkey to charge tolls on important trade routes on the straits of Malacca and Bosporus. And paying the toll could put companies in legal jeopardy for violating sanctions on Iranian officials.

Companies were also expressing their concerns directly with Trump, but more gently, added this person, who was granted anonymity because they were not authorized to talk to the media.

“The president is extremely sensitive to the legacy and judgment on the success of this war so pushing the president right now is seen as a risky proposition,” this person said. “But the White House is hearing from the industry despite the gingerness of the conversations.”

«

The tolls/tariffs equate to a carbon tax of about $2.50 per ton – not very much in the grand scheme of things ($40 to $50 would be more useful). But it’s a start. The concern about other countries starting to charge similar tolls is very real, though. A few countries near narrow navigational spaces might find a sudden interest in exacting high charges for pilots of ships.
unique link to this extract


Banksy, Satoshi and the unmasking impulse • On my Om

Om Malik:

»

I am a big believer in accountability journalism. It unmasks wrongdoing. It exposes the powerful who hide behind institutions to avoid consequences. That’s a clear and defensible public interest. This is not accountability journalism, by any stretch of the imagination.

Banksy and Satoshi weren’t hiding wrongdoing. They were hiding themselves. In Banksy’s case, the anonymity IS the art. The whole point is that the work speaks without the person. The art appears without permission, without attribution, without a market position or a gallery or a brand to protect. That’s not incidental to its power. It is its power. The work on the wall speaks precisely because there is no face behind it available for interview.

With Satoshi, the anonymity IS the architecture. Bitcoin was designed to be leaderless. An identifiable founder is a vulnerability. Someone governments can pressure, someone courts can compel, someone bad actors can target. The anonymity wasn’t ego protection. It was architecture.

Unmasking either one isn’t just invasive. It is destructive to what they built.

«

And speaking of that “unmasking” of Satoshi…
unique link to this extract


Our quest not to solve bitcoin’s great mystery • FT Alphaville

Bryce Elder:

»

One morning in the spring of 2026, FT Alphaville was sitting in traffic on the A40 eastbound when, tired of starting posts in the normal way, we switched to a drop intro.

The post we needed to start was about the alleged unmasking of bitcoin’s pseudonymous inventor, Satoshi Nakamoto. Alphaville has long considered the question of Satoshi’s true identity to be one of the least important enigmas of our age, having poked at it before with some success.

Hearing once again that a media organisation was claiming to have doxxed the person who spawned a multi-hundred-dollar speculative reporting industry had aroused in us a mixture of weariness and weariness. Which fiftysomething male fringe academic would it be this time?

The Japanese one? The other Japanese one? The drug dealer? The dead one or the other dead one? The one with a beard, or the one without a beard, or the other one with a beard, or the other one without a beard? The liar? The other liar? Or maybe it was a hive mind of these and other fiftysomething male fringe academics, such as this one, or this one?

It was the one with the beard.

«

Stellar piece of fun by the Alphaville team, whose work is always free to read. It’s a very comprehensive debunking of the idea that Adam Back is Satoshi.
unique link to this extract


Scoop: Meta removes ads for social media addiction litigation • Axios

Dan Primack:

»

Meta on Thursday began removing advertisements from attorneys who were seeking clients that claim to have been harmed by social media while under the age of 18.

This comes just two weeks after Meta and YouTube were found negligent in a landmark California case about social media addiction. Lawyers across the country now are seeking new plaintiffs, in the hopes of bringing a class action lawsuit that could result in lucrative verdicts.

It’s unclear if any of them are being backed by private equity, as the California lawsuit appears to have been.

Axios has identified more than a dozen such ads that were deactivated today, some of which came from large national firms like Morgan & Morgan and Sokolove Law. Almost all of them ran on both Facebook and Instagram. Some also appeared on Threads and Messenger, plus Meta’s Audience Network — which distributes ads to thousands of third-party sites.

One such ad read: “Anxiety. Depression. Withdrawal. Self-harm. These aren’t just teenage phases — they’re symptoms linked to social media addiction in children. Platforms knew this and kept targeting kids anyway.” A few of the ads still remain active, including some that were posted earlier today.

Meta appears to be relying on part of its terms of service that say:

»

“We also can remove or restrict access to content, features, services, or information if we determine that doing so is reasonably necessary to avoid or mitigate misuse of our services or adverse legal or regulatory impacts to Meta.”

«

«

That’s something of an admission after the lawsuits, though unfortunately – as the article points out – entirely within its ToS.
unique link to this extract


Do links hurt news publishers on Twitter? Our analysis suggests yes • Nieman Journalism Lab

Laura Hazard Owen:

»

Elon Musk has said as much: Links in tweets are bad for engagement. Over the last few days, sparked by a post from Nate Silver, people have started arguing again about the relationships between links and engagement. But our new analysis of thousands of tweets from 18 publishers makes it pretty clear: Links do seem to hurt news publishers on X/Twitter.

Back in 2016, the analytics company Parse.ly published a report: “Does Twitter matter for news sites?“

The report found that Twitter drove little traffic to most news sites, generating only around 1.5% of most publishers’ traffic. But, the authors wrote, “Twitter excels at both conversational and breaking news…Though Twitter may not be a huge overall source of traffic to news websites relative to Facebook and Google, it serves a unique place in the link economy. News really does ‘start’ on Twitter.”

…I used Claude to help me scrape the 200 most recent tweets from 18 large publishers’ X accounts and track the engagement (likes + comments + retweets) on each. Six of those publishers have paywalls: Bloomberg, CNN, Forbes, The New York Times, The Wall Street Journal, and The Washington Post. Nine don’t: Al Jazeera English, AP, BBC1, Breitbart News, CBS News, Daily Wire, Fox News, NBC News, and Reuters. The last three accounts I looked at — Leading Report, unusual_whales, and Globe Eye News — are not news publishers, but aggregate breaking news in tweets without links. (Here is an example of a Leading Report tweet: “BREAKING: Iran has halted direct talks with the US, per WSJ.”) They’re sometimes referred to as engagement-maxing accounts.

These charts make it pretty clear that links in tweets hurt engagement. The connection was so apparent in my analysis that a graph including all 18 publishers is almost unreadable: The traditional, link-loving publishers are clustered in the bottom left corner (lots of links, little engagement) in a nearly indistinguishable mass of bubbles, no matter how large their followings are.

«

The ones who succeed in getting “engagement” – likes, reposts, comments and replies – are the ones which distil those news orgs’ content and put a slant on them, or “vaguepost” about them. That gets people worked up. The problem is that the algorithm thinks clicking on links isn’t engagement, and reduces the visibility of those accounts, even when they have millions of followers. The problem, therefore, is in how the algorithm measures “engagement”.
unique link to this extract


Movements need the critical thinking that AI destroys • Jacobin

Florian Maiwald:

»

The negative side effects accompanying the use of large language models (LLMs) are vividly illustrated by the phenomenon of “cognitive debt.” From an economic perspective, the short-term productivity gains achieved through the use of AI systems are difficult to dispute. By delegating numerous tasks previously performed by humans to AI, significant efficiency gains can be observed: workflows are accelerated, processes are rationalized, and organizational routines are overall made more efficient.

Yet the resilience and efficiency generated through delegation to AI systems could threaten a gradual loss of the cognitive capacities that are being outsourced to them. A recent MIT study that found significantly reduced brain activity among regular users of chatbots, for instance, provides some initial support for this worry.

While debates about the threat modern AI corporations pose to democracy tend to focus on the fact that data (and thus control over algorithms) are increasingly concentrated in the hands of major tech companies that largely avoid public oversight, another important question is surprisingly often pushed into the background. It is a question about the preconditions for people to be able to take part in democratic processes and emancipatory political projects.

The outsourcing of thinking is, of course, not a new phenomenon. It is the main theme, in fact, of Immanuel Kant’s classic 1784 essay, “What Is Enlightenment?” For Kant, the process of emancipation consists in freeing oneself from the “self-incurred immaturity” of letting others think for you and instead making use of one’s own powers of reasoning. He writes:

»

It is so convenient to be immature. If I have a book that has understanding for me, a pastor who has a conscience for me, a physician who judges my diet for me, and so forth, then I need not trouble myself at all. I have no need to think if only I can pay; others will readily undertake the disagreeable business for me.

«

«

This makes me think that this complaint/debate has been going on for a long time. The move from oral longform poetry such as The Iliad and Beowulf to writing it down, then printing it, then putting it on websites, then letting search engines find it for you, and now letting LLMs do some of the work of analysing it – all of these steps seem to have been viewed as letting our brains slide back into the primordial ooze. If a problem is eternal, is it really because of the tools, or the toolmakers?
unique link to this extract


John Deere to pay $99m in monumental right-to-repair settlement • The Drive

Caleb Jacobs:

»

Farmers have been fighting John Deere for years over the right to repair their equipment, and this week, they finally reached a landmark settlement.

While the agricultural manufacturing giant pointed out in a statement that this is no admission of wrongdoing, it agreed to pay $99m into a fund for farms and individuals who participated in a class action lawsuit. Specifically, that money is available to those involved who paid John Deere’s authorized dealers for large equipment repairs from January 2018. This means that plaintiffs will recover somewhere between 26% and 53% of overcharge damages, according to one of the court documents—far beyond the typical amount, which lands between 5% and 15%.

The settlement also includes an agreement by Deere to provide “the digital tools ​required for the maintenance, diagnosis, and repair” of tractors, combines, and other machinery for 10 years. That part is crucial, as farmers previously resorted to hacking their own equipment’s software just to get it up and running again. John Deere signed a memorandum of understanding in 2023 that partially addressed those concerns, providing third parties with the technology to diagnose and repair, as long as its intellectual property was safeguarded. Monday’s settlement seems to represent a much stronger (and legally binding) step forward.

Ripple effects of this battle have been felt far beyond the sales floors at John Deere dealers, as the price of used equipment skyrocketed in response to the infamous service difficulties. Even when the cost of older tractors doubled, farmers reasoned that they were still worth it because repairs were simpler and downtime was minimized: $60,000 for a 40-year-old machine became the norm.

«

This is epochal: John Deere was notorious for years for locking down machines to prevent user repair, and farmers detest not being able to do things for themselves. The only surprise is it took this long.
unique link to this extract


The CIA “Ghost Murmur” story is probably bullshit • The After-Action Report

Seth Hettena:

»

I’m no expert in this field, but Quantum Insider, which tracks these developments, pointed to several studies that show the limits of this technology.

One study published this year on diamond quantum magnetometry, the same technology Ghost Murmur supposedly uses, required sensors placed 1 centimeter from the chest inside a magnetically shielded room and an average of up to 12,000 heartbeats to detect a signal.

“Averaging was necessary since magnetic field recordings did not reveal the MCG signal in the NV trace in real-time,” the study reported.

In plain English: The quantum sensor could not detect a heartbeat in real time in a shielded room at one centimetre.

A 2024 study detected the heartbeat of an anesthetized rat, a weaker signal than a human heart, using a sensor placed 5 millimeters from the animal’s chest, inside a magnetic shielding cylinder, after an hour of continuous data accumulation.

Ghost Murmur supposedly detected a single beating heart, in real time, from 40 miles [65km] away, over open desert, from a moving aircraft, in an environment saturated with competing signals from the Earth’s magnetic field, electronic devices, and other living creatures. Not likely.

Even the military’s own research agency says the technology isn’t ready. In August 2025, DARPA launched Robust Quantum Sensors to address the fact that quantum sensors remain “notoriously fragile in real-world environments” where “even minor vibrations or electromagnetic interference can degrade performance.” The program’s Phase 1 goal is modest: just keep a quantum sensor functioning during a helicopter flight. “That’s it. That’s it,” the program manager told contractors at a briefing. Ghost Murmur supposedly cleared that bar and detected a heartbeat from 40 miles away, eight months later.

Interesting Engineering pointed out that similar magnetic-sensing techniques have been used for submarine detection. But that isn’t the same challenge as detecting a heartbeat. A submarine is a massive steel object, and magnetic submarine detection works by sensing how thousands of tons of steel distort the Earth’s existing magnetic field. That’s a completely different problem from trying to detect a 25 picotesla heartbeat across miles of open desert.

The problem is the laws of physics.

«

Oh, those damn laws. Don’t worry, Trump ignores them. I did think it sounded far-fetched but this is more solid.
unique link to this extract


• Why do social networks drive us a little mad?
• Why does angry content seem to dominate what we see?
• How much of a role do algorithms play in affecting what we see and do online?
• What can we do about it?
• Did Facebook have any inkling of what was coming in Myanmar in 2016?

Read Social Warming, my latest book, and find answers – and more.


Errata, corrigenda and ai no corrida: none notified

[syndicated profile] cks_techblog_feed

Posted by cks

Nftables is the current generation Linux firewall rule system, supplanting iptables (which supplanted ipchains). As covered in the nft manual page, nftables has the concept of 'symbolic variables'. Since I'm used to BSD PF, I will crudely describe these as a combination of some parts of pf tables and PF macros. I personally feel that the nft manual page doesn't do a good job of documenting what's possible in these, so here are some notes.

The simple case is simple values:

define tundev = "tun0";
define outdev = "eno1";
define natip = 128.100.x.y
define tunnet = 172.29.0.0/16

(Judging from actually reading the "Lexical Conventions" section, it turns out that the ';' here is decorative; I put it in out of superstition.)

I'm not sure of the rules of when you have to quote things and when you don't. As covered in the manual page, you use these symbolic values in the relevant nftables bits, for example a SNAT rule:

ip saddr $tunnet oifname $outdev counter snat to $natip;

Nftables also has the concept of 'anonymous sets', which are written in the obvious PF-like syntax of '{ ..., ..., ... }'. You can use symbolic variables to define anonymous sets, and if you do they can span multiple lines and have embedded comments, and of course you can have multiple elements on one line (not shown in this example):

define allowed_udp_ports = {
        # DNS
        53,
        # NTP
        123,
        # for HTTP/3 aka QUIC
        443
}

(I suspect that symbolic values written directly in nftables rules can also span multiple lines and have embedded comments, but I haven't checked.)

A comma after the last entry is optional. Unlike in BSD PF, elements must be separated by commas.
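Once defined, such a set variable is used in a rule just like the scalar variables above; a minimal sketch (chain context omitted):

```nftables
udp dport $allowed_udp_ports counter accept;
```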

You can use this to define port numbers, IP address ranges, and no doubt other things. However, I don't know how efficient it is if you're defining large numbers of things, and of course you can't update your defined things without reloading your entire ruleset. If you need either of these features, you're going to have to figure out named nftables sets or maps.

There's no direct equivalent of the BSD PF syntax for defining a table from a file with eg 'table <SSH_IN> persist file "/etc/pf/SSH-ALLOWED"'. The closest you can come is to define an anonymous set in a file you 'include' in your nftables rules.
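A sketch of that workaround (file path, set name and addresses are all hypothetical; 192.0.2.0/24 is the documentation range):

```nftables
# ssh-allowed.nft (hypothetical path): nothing but the definition
define ssh_allowed = {
        192.0.2.10,
        192.0.2.32/28
}

# and then in the main ruleset:
include "ssh-allowed.nft"
# ...later, inside the relevant chain:
# tcp dport 22 ip saddr $ssh_allowed counter accept;
```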

(I believe this is also the best you can do for loading named sets and maps from files.)

PS: Apparently there are also anonymous maps, to go with named ones.
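For completeness, an anonymous map written inline in a rule looks something like this (the addresses are hypothetical and I haven't tested this exact rule); it would go in a nat prerouting chain:

```nftables
dnat to tcp dport map { 80 : 192.0.2.10, 443 : 192.0.2.20 }
```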

Sidebar: Named sets in nftables

Since I just worked this out, well, found an example, here is how you write a set in your nftables.conf:

table inet filter {
    set allowed_tcp_ports {
       typeof tcp dport
       elements = { 22, 25, 80, 443 }
    }

    chain input {
       [...]
       meta iifname $outdev tcp dport @allowed_tcp_ports counter accept;
[...]

Now that I understand the use of 'typeof', I'll probably use it for all sets and maps rather than trying to look up the specific type involved (although nft can help with that with 'nft describe').
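(For comparison, the explicit-type version that 'typeof' lets you avoid would be, assuming I have the type name right:

```nftables
set allowed_tcp_ports {
    type inet_service
    elements = { 22, 25, 80, 443 }
}
```

'inet_service' being what 'nft describe tcp dport' reports as the type of a port.)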

Friday 10 April, 2026

2026-04-09 23:56
[syndicated profile] john_naughton_feed

Posted by jjn1

Stairway to…

… who knows…


Quote of the Day

“No solution to the problem of poverty is so effective as providing income to the poor. Whether in the form of food, housing, health services, education or money, income is an excellent antidote for deprivation. No truth has spawned so much ingenious evasion.”

  • John Kenneth Galbraith

Musical alternative to the morning’s radio news

Albinoni | Adagio in G Minor for solo cello and cello quartet

Link

See below for why I chose this for today.


Long Read of the Day

Use it or lose it

Margaret Heffernan takes on the challenge of AI.

I used to carry about 100 phone numbers in my head: family, friends, my office. Now I carry just one: my husband’s. I can do this because I have effectively outsourced my memory to my phone. I’m comfortable doing so because I backup my phone and if, in a real emergency, I don’t have it, I can call my husband.

Other kinds of outsourcing I am not happy with. No matter how well AI agents might get to ‘know’ me, I wouldn’t want it to email my friends or fix a time I can have lunch with my daughter; I’d regard it as crass and impersonal—because it would be crass and impersonal. In the same way, podcasters who invite me onto their show, and then invite me to schedule myself don’t get a reply: I have zero appetite for so transactional an exchange.

The smarmy automated invitations to save me time by letting AI summarize long documents (wow at least 10 pages!) just make me mad at the addiction-peddling of Microsoft, a company that, if it truly respected my time, would devote real resources to improving their messy software…

I’m with her on Microsoft software.


The new Ofcom chair’s first task is to tame Elon Musk

My OpEd in last Sunday’s Observer

The former television mogul Michael Grade comes to the end of his four-year term as chairman of Ofcom, the UK’s media regulator, later this month. In classic British fashion, the government compiled a shortlist of possible successors drawn straight from central casting: a doughty baroness and a brace of knights. The baroness, Margaret Hodge, was thought by some to be too ancient (though she is of the same vintage as Lord Grade). One of the knights, Jeremy Wright, a Tory MP with a real track record of caring deeply about online harms, was thought to be deemed too dangerous for a timid government. So the position appears to have gone to a City grandee, Ian Cheshire, who has spent his life running big retailers and sitting on the boards of Barclays, Debenhams and BT.

As far as I can see, his only obvious qualification for running a media regulator was that he had once been chairman of Channel 4. Presumably, he at least possesses those accessories so prized by the British establishment: a “safe pair of hands”. Which is good, because he’ll need them.

He inherits a powerful agency that has, since its foundation in 2003, been attracting commitments the way a trawler attracts barnacles…

Do read on


Books, etc.

Screenshot

Just downloaded this on the recommendation of a colleague because it touches on things I’m trying to think about at the moment.


My commonplace booklet

RIP a fine journalist

Yesterday, I went to the funeral of a fine journalist, John Burns, in our local church. Burns won two Pulitzer Prizes and a host of other awards during his 40-year career at the New York Times. One of the Pulitzers was for his reporting of the destruction of Sarajevo and the barbaric killings in the war in Bosnia-Herzegovina in 1992.

The music at the service was comforting in the Anglican tradition, with Abide with Me, Guide me, O thou great redeemer and Jerusalem (an echo of the last night of the Proms), but the really heart-stopping moment was when a young cellist, Santi Lowe, played Albinoni’s Adagio in G minor, as a tribute to Vedran Smailovic, a musician whom John had immortalised in his reportage. Here’s what he wrote:

A Cellist Honors Sarajevo’s Dead

As the 155-millimeter howitzer shells whistled down on this crumbling city today, exploding thunderously into buildings all around, a disheveled, stubble-bearded man in formal evening attire unfolded a plastic chair in the middle of Vase Miskina Street. He lifted his cello from its case and began playing Albinoni’s Adagio.

There were only two people to hear him, and both fled, dodging from doorway to doorway, before the performance ended.

Each day at 4 p.m., the cellist, Vedran Smailovic, walks to the same spot on the pedestrian mall for a concert in honor of Sarajevo’s dead.

The spot he has chosen is outside the bakery where several high-explosive rounds struck a bread line 12 days ago, killing 22 people and wounding more than 100. If he holds to his plan, there will be 22 performances before his gesture has run its course.

I really love the cello. And the resonances it triggers when played in a small country church are breathtaking. Reminds one of those recordings of Pablo Casals playing the Bach Cello suites in the crypt of Sant Miquel de Cuixà, France, in 1950.

The New York Times obit of John Burns is here.


  This Blog is also available as an email three days a week. If you think that might suit you better, why not subscribe? One email on Mondays, Wednesdays and Fridays delivered to your inbox at 5am UK time. It’s free, and you can always unsubscribe if you conclude your inbox is full enough already!

