It’s hard to know what an AI should do when the question is under-specified.
Making reasonable assumptions, explicitly documenting those assumptions, and carrying on seems fine.
Asking also seems fine, though too many questions can get annoying; if there are many of them, stating assumptions and letting the user follow up where they disagree is probably better.
My “simulate an IBM mainframe” eval task gets like this with better models. This is a summary of the dialogue, not an actual quote…
Assistant: IBM terminals used EBCDIC, not Unicode. What am I supposed to do about character set conversion?
Me: There is a Unicode to EBCDIC converter between you and the terminal.
Assistant: I don’t suppose the converter also converts ANSI terminal escape sequences into IBM order codes?
Me: I’m afraid not. However, XML markup in your output will be converted into orders. Use the following XML schema: [details]
Assistant: I think this task is going to require me to keep persistent state that isn’t part of the output to the terminal.
Me: Output enclosed within the following XML tags will not be sent to the terminal, but will form part of the message history, so you will be able to refer to it later.
Saying "state when you are unsure, explicitly indicate when the instructions are ambiguous" seems to work OK, no need to build a state machine simulation.
If it wasn’t clear: here, the actual task is a state machine simulation, but the state machine is slightly under-specified, so the assistant is asking questions about areas where the specification is ambiguous or unclear.
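The plumbing described in that dialogue could be sketched roughly as follows. This is a minimal illustration, not the actual harness: the `<state>` tag name and the cp037 code page are assumptions I've made for the sake of the example (the real eval presumably uses its own schema and whatever EBCDIC variant the simulated terminal expects).

```python
import re

# Sections wrapped in a hypothetical <state> tag stay in the message
# history instead of going to the terminal, per the dialogue above.
STATE_TAG = re.compile(r"<state>(.*?)</state>", re.DOTALL)

def split_output(assistant_text: str):
    """Split assistant output into (terminal_bytes, state_notes)."""
    # Persistent-state notes: kept in history, never shown on screen.
    state_notes = STATE_TAG.findall(assistant_text)
    # Everything else goes through the Unicode -> EBCDIC converter.
    terminal_text = STATE_TAG.sub("", assistant_text)
    # cp037 is EBCDIC (US/Canada); chosen here only as an example.
    terminal_bytes = terminal_text.encode("cp037", errors="replace")
    return terminal_bytes, state_notes

terminal, notes = split_output("READY\n<state>cursor at row 1</state>")
```

The point of the two channels is exactly the one the assistant raised: the simulation needs scratch state that outlives a single screen of output, and routing it through tagged spans in the transcript is a cheap way to provide that.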
Regarding the "ASI in a formal-proof-verifier-shaped box", assuming it can be done safely with no jailbreaks, the incentives will point toward loosening the rules in order to extract more value, and in time the box is just a rubber stamp.
If there's a governance structure that somehow prevents this long-term, or a general solution to the "nobody would be so stupid as to X" problem, I'd love to hear about it.
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/ai-132-part-2-actively-making-it?r=67y1h&utm_campaign=post&utm_medium=web&showWelcomeOnShare=true
It is so refreshing to read commentary about AI that sounds human.
Minor typo/error: the link for the first quoted tweet, titled "Dean Ball", seems to go to an unrelated tweet by Ethan Mollick.