18 Comments
rxc (Oct 17, edited):

I don't understand why AI developers seem to think that they should be held harmless if their products make recommendations that result in harm. Right now, engineering software companies can be held liable if they produce calculations that result in a bridge falling down or an airplane falling out of the sky, so why not AI? Of course, there should be an engineer held responsible for these events, too, whenever a calculation requires a "stamped" document from an engineer, but if the engineer cannot understand why the AI provided the information that he/she relied on to stamp the plans, then why should they use AI in the first place?

I see this rush to AI resulting in corruption of the database of information that undergirds our societies, as it creates documents that are the products of probabilities of strings of words rather than actual calculations that can be traced and understood. Engineers deal with uncertainties all the time, by adding margin to designs. How much margin are AIs going to add to the engineering calculations that they provide to customers?

Who, exactly, should be held accountable when a bridge falls down because of an AI recommendation that did not have sufficient margin? The AI company that trained it? The AI itself? The company that bought the AI results and used them? The customer who built the bridge? No one seems to be thinking about real-world problems, but there is a lot of wallowing in vague terms like "alignment" or "mundane utility" and "hallucination". These are the language of lawyers and poets and social scientists.

Gerald Monroe:

Because

(1) AI is not currently remotely close to that good and won't be for years.

(2) Why should society make this tool illegal during the phase when it's mediocre? It just needs to be warned about, and almost every AI tool has bold text like "AI is experimental and can make mistakes" prominently in the GUI.

(3) Do you apply that standard to other tools? A blowtorch has an open flame! A chainsaw, an open blade. None of these tools are remotely safe, and they can't be made safe without decades of further improvement. (Basically, you would need to invent robots to carry such tools to make them safe.)

(4) (The actual reason) Rival nations will not be restricting their AI chatbots or allowing people to sue, at all. So your proposal gives them all of the benefits.

rxc:

I am not talking about hand tools. I am talking about tools that are used in safety-critical applications. There are a LOT of these tools out there, and they have to comply with federal and state regulations regarding the quality of the products for which they are used. I am not talking about word processors, but about tools that have been used to prepare legal cases that are submitted to courts, and technical papers submitted for publication, which will eventually be cited in work that creates new chemicals. AIs are being used in these specific applications, and they are producing documents that contain not just inaccuracies but real fabrications.

I have 40 years of experience in the nuclear business, and I have seen advertisements from the Argonne National Laboratory touting its expertise using AI to design, manufacture, build, and operate nuclear power plants. This scares me to no end.

I was one of the main reviewers at the NRC of several advanced reactors, and the leader of the technical review of computer programs used to approve several of them. Right now, if I were still working, I would tell the reviewers that if there was ANYTHING in any application that even hinted that an AI was used to prepare the application, EVERY SENTENCE in the application would need to be independently verified, back to the cited references, and an individual written approval would be required for EVERY sentence, number, graph, drawing, and calculation before anything could be approved.

I think that the airplane people have similar standards, especially after the hard times Boeing has been going through regarding quality assurance.

Your comment that AI is not yet ready for real-world application is correct. My problem is that there are too many people getting ahead of the tech, and they are polluting the documentation of our civilization by creating documents that contain lies, da**ed lies, and fabricated statistics, and that will be used by people who do not know where they came from.

vectro:

> engineering software companies can be held liable if they produce calculations that [result] in a bridge falling down or an airplane falling out of the sky

Is this true? Could you provide a citation or example?

The closest thing I could find is AA 965, but the software involved there was not engineering software but rather navigational software.

rxc:

You should start with 10 CFR 50 Appendix B, which is the QA standard for nuclear power plants. It covers software and hardware. It is very similar to ISO 9000, but with subtle differences that cause religious-style disagreements between the two communities of QA people. And then you should go look for the "Code Scaling, Applicability, and Uncertainty (CSAU) methodology", which was originally developed to evaluate the uncertainty of thermal-hydraulic analytical calculations.

Traditionally, engineers have not explicitly evaluated analytical uncertainty, but instead used consensus standards (e.g., building codes) to tell them how to build stuff. If the codes do not cover the application, you have to perform more detailed calculations and then deal with the uncertainties, somehow. Most of the time, you just add margin (aka safety factors): thicker metal, lower pressures, additional space, more limited operational parameters, all to account for the uncertainties. But since the financial people don't like to pay for margin, they want everything to be optimized and margins cut to save money. Airplanes probably have the lowest margins of any engineered structure, because they would not be able to get off the ground if they were built the way bridges and reactors are built.
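To make "margin" concrete, here is a toy sketch of a deterministic design check with an explicit safety factor. This is purely illustrative: the formula is the standard thin-wall hoop-stress sizing, and every number in it is invented for the example.

```python
# Toy illustration of design margin: size a wall thickness with an
# explicit safety factor. All numbers are invented for the example.

def required_thickness(pressure_pa: float, radius_m: float,
                       allowable_stress_pa: float,
                       safety_factor: float) -> float:
    """Thin-walled cylinder hoop-stress sizing: t = SF * p * r / sigma."""
    return safety_factor * pressure_pa * radius_m / allowable_stress_pa

# The safety factor IS the margin: it covers model error, material
# variability, construction defects, and loads nobody anticipated.
t = required_thickness(pressure_pa=7e6,            # 7 MPa internal pressure
                       radius_m=2.0,               # 2 m vessel radius
                       allowable_stress_pa=138e6,  # allowable stress
                       safety_factor=3.0)          # the margin
print(f"required wall thickness: {t * 1000:.0f} mm")
```

Cut the safety factor from 3.0 toward 1.0 to "save money" and the same calculation blesses a much thinner wall, with nothing left over to absorb whatever the model got wrong.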

So, now we have probabilistic risk assessments, backed up by "best estimate" calculations. All of which still have a bit of black art in them, because it is very difficult to model systems with coupled thermal-hydraulic phenomena represented by non-linear partial differential equations that cannot be solved explicitly.
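For a flavor of what "best estimate plus uncertainty" means, here is a deliberately toy sketch (my own illustration, not the actual CSAU procedure; the model and all numbers are invented): propagate the input and model uncertainties through the calculation by sampling, then report an upper bound instead of a single number.

```python
# Toy "best estimate plus uncertainty" sketch (not the real CSAU method):
# sample uncertain inputs, push them through the model, report a bound.
import random

def peak_temp(power: float, flow: float, model_bias: float) -> float:
    # Stand-in for a real thermal-hydraulic code: a nonlinear response
    # with a multiplicative model-form uncertainty.
    return 600.0 + 120.0 * (power / flow) ** 1.5 * model_bias

best_estimate = peak_temp(1.0, 1.0, 1.0)  # nominal inputs, unbiased model

random.seed(0)
samples = sorted(
    peak_temp(random.gauss(1.00, 0.02),   # power measurement uncertainty
              random.gauss(1.00, 0.05),   # flow measurement uncertainty
              random.uniform(0.9, 1.2))   # model-form uncertainty
    for _ in range(10_000)
)
upper_95 = samples[int(0.95 * len(samples))]  # one-sided 95th percentile

print(f"best estimate: {best_estimate:.1f}")
print(f"95% upper bound: {upper_95:.1f}")
print(f"margin carried for uncertainty: {upper_95 - best_estimate:.1f}")
```

The point is that the answer comes with a quantified margin attached. A generator of probable word strings gives you neither the best estimate nor the bound.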

As an example of engineering being held to these standards, I would point to the demise of the Yankee Atomic plant and the Maine Yankee plant, which were supported by calculations done by the Yankee Atomic organization using a computer model named RELAP5-YA. This was a code that was developed by the government, for its own use, but adopted by Yankee. They were told that it was not wise to try this, but they insisted, and they got political support for an approval. Which turned out to be less than wise, because it turned out that they were being "creative" in its application, and the result was that the two plants that used the model could not support continued operation, so they died. There are now grassy fields where they used to be (plus spent fuel storage facilities).

I could also point to a bridge down here in Florida that was designed by a group of students who did not know what they were doing, and which fell down while it was under construction. It landed on a busy road and killed 7 people, I think. The NTSB report for this accident is extraordinary for what it did NOT say about the students and their "analyses" of the structure. The university that sponsored it paid off all of the victims, and no one talks about it.

Or you could ask Boeing about the software they installed on the 737MAX, but did not tell the airlines or their pilots about, which caused 2 airplanes to fall out of the sky.

Are you an engineer? Have you ever had to design or engineer or review something that was "safety critical"? Have you ever had to have insurance to cover the consequences of your software?

avalancheGenesis:

Just replying to express appreciation for sharing domain knowledge, with something more substantial than a "like". I too worry about the ongoing conflation of The Thing and The Symbolic Representation Of The Thing.

rxc:

I can provide you with two other examples, which did not cause any damage - just a lot of money that had to be spent to fix "creative" calculations. One involves a power uprate at a plant in the Midwest which started well, but during the first refueling after the uprate they opened up the vessel and found that the steam dryer structure had failed. Very large, heavy, dumb piece of stainless steel. But important if you don't want to have water droplets impinge on your expensive steam turbine. Not really a safety matter, but still something that had to be considered. The dryer had been in the plant for a very long time, and no one thought it would have any problems with the increased flow rate, because other plants had done similar uprates and had no problems. It turned out that the increased steam flow up through the dryer, and then out the pipes, had caused resonant vibrations that led to cracks and failure.

The manufacturer of the equipment (and the company responsible for selling the power uprate as a package) decided to use a wonderful new computer model that it said could predict the vibrations. They advised the owner to pay them to weld a few new plates over the area, and all would be well. I was not responsible for the safety of this component (bureaucracies can be very bureaucratic), but was invited to participate in the discussions. My recommendation was that they should just remove the old steam dryer and replace it with something new that was specifically designed and built for the new conditions.

No, too expensive. The bureaucrats who were responsible for approving this activity thought that the new computer model sounded spiffy. I did not, because of experience evaluating similar computer models for other applications. So, they installed the patches, installed sensors on the vessel and piping, started it up, and listened for "interesting sounds". Which appeared pretty soon. Reactor shutdown, head removed, and lo and behold, the patches had failed.

New analyses, performed by lots of highly paid engineers with advanced degrees and Professional Engineering credentials. "We now have lots of good data to understand what is going on, so we are SURE that we just need to add a few stiffeners in a few places, and all will be well." Responsible bureaucrats who have no experience with these computer models nod their heads and say "go ahead". The owner nods his head and gets out his checkbook. I disagree, saying that there is no way that they understand what is going on and the models are crap. More welding ensues, the plant restarts, and listening to the vessel restarts.

Shortly, more "interesting sounds" come back. Shut down the plant (very costly to do this), pull the head, and now the stiffeners are cracked, too.

I laugh, and walk out of the room, saying nothing. The owner decides to order new steam dryers, designed from scratch to handle the steam flow; no more problems. The manufacturer probably lost all of their profit on the original uprate paying for analyses, failed mods, and downtime from failures, especially several unnecessary shutdowns.

More in next post.

rxc:

Same manufacturer. Developed a new analytical methodology (a suite of computer models and rules for using them) and a new design of reactor fuel that allowed about 35 power plants to raise their licensed power levels by 20%. This is VERY non-trivial, because there are a LOT of things that have to be changed to turn 20% more steam into electricity, but the manufacturer did the work, and we reviewed it, over a period of about 2 years, and determined that it was acceptable. We issued a Topical Report that endorsed the methodology, so that it could be used by all 35 of these plants to upgrade their output. It essentially created about 7 "new" power plants without building anything new. Not cheap, but certainly worth the effort. And really a good idea.

About 4 plants buy the methodology and perform the work needed to do the uprate, and all is well. Then one of the major owners of three plants of this type decides to be really ambitious and combine the power uprates for two of the plants with license renewals for all three (20 more years), AND a restart of the third one, which had been sitting in mothballs after an unfortunate fire many years before.

Ambitious, but not undoable. Each of the parts is well understood by the bureaucrats and there are precedents for them. They start the work while we prepare the final safety evaluation, and there is a grand meeting to celebrate all the work that has been done. In a big conference room with a big table and lots of people sitting around the periphery. The final report to the board of directors of the company which will be used to justify spending a LOT of money on this project is sitting there on the table, and everyone is talking about how it will work.

When everyone else stops talking, I ask one final question. One that is absolutely innocuous and just requires a simple numerical answer. What I hear is someone sitting behind me say, out loud in this big conference room where everyone can hear, "I guess we have to tell them". The hair on the back of my neck stood up when I heard this. WTF???

The explanation was that I had asked about the types of fuel that would be used in the plants after all this happened, and it turned out that they were ALSO going to do all of this work TOGETHER WITH A FUEL SUPPLIER TRANSFER. Meaning that part of the fuel that would be in the vessel would be supplied by a company different from the one that developed the methodology, and which was used as the basis for our safety evaluation.

"Hmmm???" I was responsible for approving this part of the project. Fuel supplier transfers are usually pretty routine, but there was no mention of this possibility in the 20% power uprate methodology. Usually there are a number of issues that have to be explicitly sorted thru as a plant shifts from one fuel supplier to another, and NOTHING HAD BEEN SAID ABOUT THIS to the regulator.

The meeting broke up at this point, and we had a significant in-house meeting with the technical people who said this was not what they had considered in their approval. So, I, personally, told my management that the Topical Report approving the methodology had to be withdrawn until we figured out how to deal with this. No one could remember the last time we had withdrawn approval for a major topical report like this, especially one that involved about 35 plants and a LOT of money.

I spent the next three months arguing with the manufacturer about the issue, but we were adamant that they could NOT use the methodology for this application. The manufacturer and the other fuel vendor had to spend a considerable amount of money to provide detailed analyses of how a power uprate could be done at the same time as a fuel transfer. It cost everyone (except us) a lot of money.

One of the other plants that had been planning to do this exact same trick was really unhappy and they complained (in an email) to the manufacturer representative, who told them (in an email) that they were going to talk to my management and make them make me change my mind. Unfortunately, my management was involved in a very delicate political dance as a result of an unfortunate discovery of a significant hole in the top of a reactor vessel head at a plant in Ohio, and they were in no position to muscle anyone to change a safety decision. The Inspector General got involved in this, eventually, when the emails became public. It was not pleasant, but I had nothing to worry about, because I had done the right thing.

So, in the end, it all got done, and the three reactors are running at 120% of their original licensed power level.

I have two more of these stories to tell you about computer models, if you are interested. One about another power uprate, and one about using one computer model to "generate data" to validate a different computer model.

Joanny Raby:

Could you double-check your table of contents links before posting? They rarely work.

jmtpr:

Thank you very, very much for writing about your political strategy re:the general populace. I have a few comments:

1. I think you're overly fixating on points where the populace is wrong, e.g. water use, at the expense of points where the populace is mostly right. The gap between "I'm worried about AI taking my job" and "I'm worried about AI taking every job" is not large. These points of shared correctness are the real political opportunities.

2. I suspect the median voter is in many ways more reasonable than David Sacks, if only because the median voter is not an actively malicious liar.

3. You manage to find a lot of time to engage with David Sacks. I understand why, and I agree you should -- he's an important decision maker. But it's obvious that the mass popular opposition to AI is also going to play an important role in decision making. So you should be more engaged with popular opinion, and try to take it seriously, for the same reason you engage with David Sacks.

4. I think you overestimate how much influence your "faction" has had. This is not snark, I'm asking you to realistically assess whether your influencing strategy is a good one. It seems to me that you've been very successful at talking to powerful people, and not very successful at occupying positions of power. The actual decisions are being made by CEOs and politicians, and you are just showing up in their Twitter feeds and at their parties. Is that influence? Don't you worry you are just their entertainment? Maybe you should try to be a politician instead.

I think the universe has gifted you the rare opportunity of a near-complete bipartisan political alignment that your opposition is ill-positioned to take advantage of. David Sacks is certainly not going to step into that role -- as you noted, what he's saying completely contradicts what every voter actually believes. So I just don't understand why you are sitting around worrying what will happen when the Democrats or whomever eventually harness this popular energy. Take some initiative, you can do a lot more than just being an advisor to the powerful.

Jeffrey Soreff:

Jack Clark:

"Generally, frontier AI development would benefit from more transparency and this is best handled federally. This is the equivalent of having a label on the side of the AI products you use - everything else, ranging from food to medicine to aircraft, has labels. Why not AI?

Getting this right lets us help the industry succeed and reduces the likelihood of a reactive, restrictive regulatory approach as unfortunately happened with the nuclear industry."

Agreed. Personally, I too do NOT want the equivalent of what the NRC + activists + the ALARA criterion did to the nuclear industry.

Zvi, you may be interested in support for your "security is capability" from an unlikely quarter (more precisely, for the alignment to users is capability part of your claim). In David Shapiro's (who I think of as primarily an AI cheerleader) most recent YouTube post, https://www.youtube.com/watch?v=NakCdHb4tYs, "OpenAI has lost the Mandate", saying "Anthropic is eating their [OpenAI's] lunch in enterprise. ... [7:02] Enterprises, government and military, they want your AI to be extremely reliable. And what does that mean? Safe. ... which is exactly what Anthropic is researching."

EDIT: Obvious disclosure: The part of "security is capability" that Shapiro agrees with, and that he views as dovetailing well with enterprise needs, is also the part that I, personally, agree with.

Anthony Bailey:

> On the contrary, as I remember it, early AI safety advocates were highly willing to discuss extreme interventions and scenarios, to take ideas like this seriously, and to consider that they might be necessary.

That's really interesting. Any key posts to cite?

To a relative newcomer (I read Bostrom when published, returned in 2022 to check progress and was horrified that it was zero and that mesa-optimizers were a thing, and didn't realize the guy who wrote the Sequences was motivated by AIXR until 2023... sigh), this is how LessWrong seemed: it was almost all about solving technical alignment. The only extreme interventions considered were pivotal acts, compute caps were a new side idea, and the "maybe we just shouldn't build this" posts or calls for public information campaigns were vanishingly small. The latter seemed under-leveraged, so I joined Pause AI.

I still don't see them as sensibly leveraged, btw. Will someone please fund tens to hundreds of millions of dollars of TV and YouTube public information campaigns reaching actual eyeballs?

Jamie Fisher:

I think it's ethically justified to suggest:

Why don't AI Safety Researchers use the Tyler Cowen "Poisoned Strawberry" strategy to wage war on frontier LLMs and slow down the pace of AI research?

(and if companies eventually discover countermeasures, that's a win-win, right??)

Mike:

I'm a little bit baffled by Jack Clark's speech. It feels as if a whole section is missing. If he's afraid, then why keep building it...?

I can imagine many answers to that, but they are not in the speech, which suggests either that it didn't occur to him to justify their work, or he cannot say it out loud, or he cannot justify it.

And yet, it got a positive reaction from the audience, while his extremely mild policy proposals such as "be transparent and talk to people" triggered a meltdown from the esteemed AI Czar.

It sure feels like we're already locked into the bad ending.

gregvp:

I, too, like Janus's Freudian typo.

Take care of yourself and your family, Zvi.

Sergio:

[On discussion of tech right] They will never offer the worried an olive branch. They only serve capital's interest. They will pull out all the stops to advocate against doing the reasonable thing.

This is how it feels opposing capital on any issue. Welcome to the 99%.

Jens B Fiederer:

It looks like you credited Tyler Cowen for Alex Tabarrok's article?