Zvi, how can we reach you to connect you with lawmakers? I could not find your email. Please email me at sean@pauseai.info.
I love your summaries. I'm in healthcare, not tech, so a lot is over my head. I wonder if you would ever consider writing a piece on how the average citizen can use AI today. Like, if you want to make a cool image use..., or if you're interested in health or medicine check out..., or even a list of things to use AI for currently. Thank you for all your work.
I don't personally think this would be an extremely valuable use of Zvi's time (unless he himself wants to do it) – mainly because of the opportunity cost from everything else he's doing.
I don't have a great answer either, nor do I know anyone who does off the top of my head, but I do know that the 'top chatbots' are all pretty good – good enough to make cool images and provide interesting answers to questions about health or medicine.
Besides Zvi, I follow quite a few other people interested in or working on AI, but I can't think of any other AI apps or products good for specific uses that are also available to the public – with the exception of some of the image generators. (These surely exist tho.)
But if you haven't, definitely play around with ChatGPT (OpenAI) or Claude (Anthropic) at least.
"Use Claude 3.5 Sonnet for most chatbot things, o1 or ChatGPT websearch or Perplexity sometimes, and Cursor for coding, and you can get basic images from DALL-E just fine" covers the basics of what to use for most purposes that are SFW, but yeah, I doubt I'll be the one writing another basics guide on how to use them.
Follow Ethan Mollick's Substack; he gives the most tactical, "in the trenches" advice.
He does seem like he would be a good choice for this. This seems to be his most recent semiannual guide: https://www.oneusefulthing.org/p/doing-stuff-with-ai-opinionated-midyear
In case you don't see my nested reply, this guide seems pretty good: https://www.oneusefulthing.org/p/doing-stuff-with-ai-opinionated-midyear
Podcast episode for this post:
https://open.substack.com/pub/dwatvpodcast/p/ai-89-trump-card
your discussion of the 99%-bar in academia seems to lend an awful lot of credence to this stranger's off-the-cuff declaration of 99% certainty. i suspect they would not take betting odds at 99% and are significantly less confident than that in reality.
Ok, we need to hear this Magic/JD Vance story explained
Seconded.
If only there were a machine that could answer questions like this. oh wait …
“Zvi Mowshowitz, a renowned Magic: The Gathering (MTG) player, is celebrated for creating the “Zvi Bargain” deck, which effectively utilized the card Yawgmoth’s Bargain. This card allows players to pay 1 life to draw a card, enabling rapid deck cycling and powerful combinations. Mowshowitz’s innovative use of this card significantly influenced MTG strategies. 
J.D. Vance, the Vice President-elect of the United States, is known for his memoir “Hillbilly Elegy” and his political career. In a 2024 interview, Vance mentioned that Yawgmoth’s Bargain was his favorite MTG card, highlighting his personal connection to the game. 
The connection between Mowshowitz and Vance lies in their shared appreciation for Yawgmoth’s Bargain. Mowshowitz’s strategic mastery of the card in professional play and Vance’s personal fondness for it illustrate the card’s impact on both competitive and casual MTG players.”
I definitely did not have that question in my mental model of something AI would answer, but it makes total sense that AI would be able to answer it in hindsight. Thanks!
To be fair, until quite recently this wouldn’t have worked. This was 4o with search; Perplexity also answers correctly, but Claude and o1-preview both fail because the Vance interview seems to have been after their knowledge cutoff.
Obvious statement is obvious: it's now unambiguously to policy advocates' benefit to sign up for Twitter Premium. Is Premium+ worth the extra $7 a month over just Premium?
Either you're actively using Twitter on the posting side or you're not. If you are, even a modest boost is worth $7. If you're not, don't bother.
thanks!
At least one person I respect has had a positive view of Microsoft's policy advocacy, but I've always been dubious; see https://ailabwatch.org/resources/company-advocacy/#microsoft
If Vance actually gets in touch with you, this is evidence that the Trump presidency is positive, because the other candidate would not have done the same, although I’m also nervous that situational awareness is a terrible idea.
Re your question about why the US Gov would be slumming with Meta and Palantir vs OpenAI/Anthropic: as should be no surprise, it's regulations/compliance. There are a lot of compliance hoops to jump through to even be able to use a SaaS app for unclass but sensitive data (think PII or HIPAA). The bar for national security use is notably higher, and it's not just documentation requirements but limitations on personnel and more. Palantir has already done this, so it's just an incremental add, whereas it'd really bog down the other frontier models.
The frontier models may get forced there for national security reasons per Sensenbrenner, but even a high profile crash effort is going to take 9-12 months to implement the mundane controls and validate. (I have a cloud SaaS partner who will remain unnamed who has been promising and trying this for >3 years now -- the whole mindset just goes against Silicon Valley ways of thinking and building).
> Finally, note to self, probably still don’t use SQLite if you have a good alternative? Twice is suspicious, although they did fix the bug same day and it wasn’t ever released.
Evidence that SQLite is among the highest-quality software artifacts we have: https://www.sqlite.org/testing.html
"100% MCDC test coverage. That’s modified condition decision coverage of the code. Your tests have to cause each branch operation in the resulting binary code to be taken and to fall through at least once.
... I had this idea, I’m going to write tests to bring SQLite up to the quality of 100% MCDC, and that took a year of 60 hour weeks. That was hard, hard work. I was putting in 12 hour days every single day. I was just getting so tired of this because with this sort of thing, it’s the old joke of, you get 95% of the functionality with the first 95% of your budget, and the last 5% on the second 95% of your budget. It’s kind of the same thing. It’s pretty easy to get up to 90 or 95% test coverage. Getting that last 5% is really, really hard and it took about a year for me to get there, but ... We just didn’t really have any bugs for the next eight or nine years."
https://corecursive.com/066-sqlite-with-richard-hipp/
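For anyone wondering what MC/DC asks for beyond ordinary branch coverage, here is a minimal Python sketch of the textbook "unique-cause" formulation: every condition in a decision must be shown to flip the outcome on its own. The `decision` function and the test vectors are hypothetical illustrations, not anything from SQLite's actual test harness, which works at the level of the compiled binary as the quote describes.

```python
from itertools import combinations


def decision(a: bool, b: bool, c: bool) -> bool:
    """Hypothetical decision under test: a and (b or c)."""
    return a and (b or c)


def conditions_shown_independent(tests, fn):
    """Return the indices of conditions that some pair of tests shows to
    independently affect the outcome: the two tests differ in exactly that
    one condition and the decision evaluates differently."""
    shown = set()
    for t1, t2 in combinations(tests, 2):
        differing = [i for i, (x, y) in enumerate(zip(t1, t2)) if x != y]
        if len(differing) == 1 and fn(*t1) != fn(*t2):
            shown.add(differing[0])
    return shown


def satisfies_mcdc(tests, fn, n_conditions):
    """True if every condition has been shown to matter on its own."""
    return conditions_shown_independent(tests, fn) == set(range(n_conditions))


# Four tests are enough for three conditions (n + 1 in general).
mcdc_tests = [
    (True, True, False),   # outcome True
    (False, True, False),  # outcome False; differs from the first only in a
    (True, False, False),  # outcome False; differs from the first only in b
    (True, False, True),   # outcome True; differs from the third only in c
]

# These two tests hit both outcomes of the decision, but no pair isolates
# any single condition, so they fall short of MC/DC.
branch_only_tests = [(True, True, True), (False, False, False)]

print(satisfies_mcdc(mcdc_tests, decision, 3))
print(satisfies_mcdc(branch_only_tests, decision, 3))
```

The gap between the two test sets is roughly the "last 5%" in the quote: taking each branch both ways is cheap, while proving that every individual condition matters takes many more carefully chosen cases.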
Seconded. "Twice is suspicious" as in "this software has ever had two bugs total" is not a good reason to not use a piece of software. Everyone is looking at sqlite because it's incredibly popular and hardened already and so finding bugs in it is a good demonstration of your tool. Other tools are under less scrutiny, but that does not mean they have fewer bugs; in fact it probably means the opposite.
> Finally, note to self, probably still don’t use SQLite if you have a good alternative? Twice is suspicious, although they did fix the bug same day and it wasn’t ever released.
I don't have data handy, but I strongly endorse the viewpoint that applications that have had security vulnerabilities in the past are more likely to have more vulnerabilities in the future. Partly this is because once a major vulnerability is announced, others come looking, but mostly I think it's just hard to add security as an afterthought.
You might be tempted to think that having a bunch of bad vulnerabilities could make developers "get serious" about security, improving it to a level above other software that hasn't had major breaches. I guess this can happen sometimes — I think Microsoft basically got serious about security after a long string of incredibly embarrassing vulnerabilities in the late 90s and early 00s — but I feel like more often the story is something like Exim (or arguably Android or Chrome), where many layers of security protection are added and yet additional vulnerabilities just keep cropping up anyway.
Also, if you are looking for a SQLite alternative, try duckdb?
I am very skeptical that DuckDB is more secure than SQLite, but also skeptical that security is particularly relevant anyways. I can't think of any good reasons for exposing basically any DB to raw untrusted user input nor any reason why it would ever be likely that a DB would be the _weakest_ link in the security of an application or service.
The only 'real life' DB security vulnerabilities that come to mind are people accidentally exposing them to the Internet.
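On not handing raw untrusted input to a DB: the usual way to do that in application code is parameter binding, so user input is only ever treated as a value and never parsed as SQL. A minimal sketch using Python's built-in `sqlite3` module follows; the `users` table and the `find_user` helper are hypothetical names made up for illustration.

```python
import sqlite3

# In-memory database with a hypothetical one-column table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))


def find_user(conn, name):
    """Look up a user by name. The ? placeholder binds `name` as a value,
    so it is compared against the column and never parsed as SQL."""
    cursor = conn.execute("SELECT name FROM users WHERE name = ?", (name,))
    return cursor.fetchall()


# A classic injection attempt is treated as an odd-looking name and matches nothing.
hostile = "alice' OR '1'='1"

print(find_user(conn, "alice"))
print(find_user(conn, hostile))
```

With binding in place, an attacker never gets to choose the SQL text that runs, which is the precondition for most engine-level bugs of the kind discussed here.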
After scanning the DuckDB GitHub open issues just now, there's a bunch of reported (and currently unfixed) errors, but I'm not sure anyone is seriously looking for exploits anyways (or bothering to report them).
There are certainly reported vulnerabilities for DuckDB elsewhere tho:
- https://nvd.nist.gov/vuln/detail/CVE-2024-41672
- https://security.snyk.io/vuln/SNYK-UNMANAGED-DUCKDBDUCKDB-6209702
I was wrong – there are some security vulnerabilities reported under their GitHub project:
- https://github.com/duckdb/duckdb/security/advisories/GHSA-w2gf-jxc9-pf2q
I think it's probably a fine DB, but SQLite still feels to me like the far more 'boring' choice to make, even if security were my primary criterion.
To be clear, I am definitely not making the claim that DuckDB is more secure than SQLite, and I agree that embedded databases are rarely the cause of security issues.
Ahhh – the adjacency was confusing to me.
You should definitely consider using SQLite. Linus's Law isn't literally true, but it's most true for projects like SQLite.
The latest bug is very underwhelming: https://project-zero.issues.chromium.org/issues/372435124
(It's very cool that it was discovered, it's just a nothing-burger of a bug.)
I suspect no other DB is as secure.
(It's not the best choice for some cases for lots of other reasonable reasons tho, but security is NOT one of them AFAICT/AFAIK.)
> To acknowledge this fact, we’re adding a cost of living adjustment based on the purchasing power parity of each country, capped at a ⅓ discount to our NYC rate. We’re also capping remote positions at 25 hours a week, to be clear that they’re not close to full-time employment. We still pay well–you’re being comped to the most expensive city in the world, after all–but the dream of the future of work being fully remote is over. But that’s okay–it was fun while it lasted!
Counter-point: one huge advantage startups could offer to a talented person over Big Tech is remote work combined with the ability to hire worldwide with no concern for timezones/borders. If you remove that advantage, you lose a huge chunk of the most talented people from even bothering to apply. AI is not yet at a point where it can substitute hiring a super talented person from Minnesota over a mediocre person from NYC who only came to work for you after getting rejected by the 500+ companies who pay more than you do.
Overall I suspect the tweet was written for maximum engagement farming, not as a true reflection of that person's true opinions or operations at their company.
Unrelated to any specific content, but I wanted to chime in that I'm appreciative of these posts becoming more and more readable over time. It's a step change to open an AI post and be able to finish it in one go without any ugh field, versus in the past where it'd take me a few sittings or I gave up and made myself switch to skimming to superficially "finish". Coverage discernment and breaking out more topics into their own posts/into other roundups are paying dividends.
I will [equivalent colloquial metaphor for eating one's hat, as I do not own one] if the major upside of Trump 2.0 is getting the Good End on AI. It'll be an interesting four years for long tails of all sorts. I wonder if anyone's coined "The X-Risk Games" yet.
My favorite part of the Terminal of Truths saga is when the goatse token crashed after TT made a spelling mistake and someone accused it of being a human pretending to be an AI. I never imagined a reverse Turing Test occurring!
Scott Alexander, of course, did imagine it: https://www.astralcodexten.com/p/turing-test?utm_campaign=post&utm_medium=web
> if you’re paying at all - and you should be!
Is that true? I tend to think that paying for these services speeds up AI timelines.
Surely not very much, given that the big labs are fuelled by venture capital over sales income, to some very high proportion…?
As in, it's a huge boost in your productivity. There's no way this trade-off isn't worth it.