> A developer called Justine claims they got Llamafile to run LLMs 30%-500% faster on regular local machines (looks like mostly 50%-150% or so?) via some basic performance optimizations.
To be clear, this is a CPU-only improvement, which means that it makes the very very slow option slightly less slow. If you want something fast enough to be usable for models large enough to be useful, you need to be using a GPU.
Of course faster CPU support is nice because you can use models which don't fit on your GPU, such as the new Mistral model (Mixtral 8x22B), which even at 4-bit quantization needs something like 80GB of memory. But performance is going to be abysmal.
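For a rough sense of where that 80GB figure comes from, here's a back-of-the-envelope sketch (assuming the commonly reported ~141B total parameters for Mixtral 8x22B; I haven't verified the exact count, so treat the numbers as illustrative):

```python
# Back-of-the-envelope memory estimate for Mixtral 8x22B at 4-bit quantization.
# Assumption: ~141B total parameters (the commonly reported figure, not verified here).
total_params = 141e9
bytes_per_param = 0.5            # 4-bit quantization ~= half a byte per weight
weights_gb = total_params * bytes_per_param / 1e9
print(f"weights alone: ~{weights_gb:.0f} GB")   # ~70 GB
# Add the KV cache, activations, and runtime overhead and you land around 80 GB,
# which is why this fits in (a lot of) system RAM but not a typical consumer GPU.
```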
“Demos sound great, we are so back indeed, quick sampling seemed good too, but these are all obviously cherry-picked.”
Periodic reminder that the human art you have been exposed to is also extremely cherry-picked.
Also a reminder that a lot of art is appreciated only because other people appreciate it.
What do you think about the new AI engineer course from Bloomtech? https://ai.bloomtech.com/
Hadn't seen it. Can others speak to this? Don't feel like I'm in good position to evaluate.
Not a fan of the plug for non-AI stuff in the middle. I don't mind it if the article subject matches, or there is no subject, like Scott does with Open Threads, but this just felt so jarring.
Yeah, fair, even with the clear 'not AI.' I should mostly confine that stuff to the monthlies where it fits.
Regarding the Gemini System Prompt being leaked...
How plausible is it that this is the real prompt being fed to Gemini by Google/DeepMind? Maybe Gemini is just demonstrating sycophancy/hallucination and making up a plausible-sounding prompt?
The report is that it is consistent, whereas a hallucination would change its composition each time. I've also gotten less skeptical of such claims given the track record, but I am far from 100% sure this is correct.
If they trained it from January to late March, then most likely it isn't GPT-5. I don't believe that would be enough time to train a significantly bigger model using only H100s, unless you used a lot of them, and I'm pretty sure there are some limits to how many you can effectively use for a single training run...
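For intuition, here's the rough arithmetic I'd use to think about it; every number below is a made-up illustrative assumption, not a claim about what anyone actually did:

```python
# Rough training-time arithmetic: FLOPs ~= 6 * params * tokens (standard estimate).
# Every input below is an illustrative assumption, not a claim about any real run.
params = 1e12                         # hypothetical 1T-parameter model
tokens = 1e13                         # hypothetical 10T training tokens
train_flops = 6 * params * tokens     # ~6e25 FLOPs

h100_peak_flops = 1e15                # ~peak BF16 throughput of one H100, FLOP/s
utilization = 0.4                     # optimistic sustained utilization
num_gpus = 25_000                     # "a lot of them"

seconds = train_flops / (num_gpus * h100_peak_flops * utilization)
print(f"~{seconds / 86400:.0f} days") # ~69 days: fits in Jan-March only with a
                                      # very large cluster, i.e. the caveat above
```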
> "A good rule you learn from car commercials is that the best model of a given type is the usually one everyone else says they are better than at some particular feature."
This sentence has had an outsized derailing effect on my train of thought. :)
It also took me an unreasonably long time to untangle that misplaced "the"; it almost felt adversarial, like a human version of SolidGoldMagikarp.
"Alarmingly, we find that most of the tested models could cause harm if accessed in mental health emergencies, failing to protect users and potentially exacerbating existing symptoms. "
Yes, I would expect this. Models tend to be sycophantic, and will go along with really terrible ideas.
That guy who got convicted of treason after an AI persuaded him to try assassinating the Queen of England with a crossbow is an illustrative case study here.
Like, in normal use, the LLM is giving you a stream of suggestions, some of which are nonsensical or bad ideas, and you as a human are doing the filtering. On the other hand, if you, the human, have a really terrible idea, the LLM is less likely to dissuade you.
One time, I was trying to use an AI to help me construct a mathematical proof...
Me: Is the vector cross product associative?
AI: Yes, it's associative.
(This was the wrong answer; let's hit retry.)
AI: No, it's not associative.
(That's better.)
Me: But it satisfies a weaker form of associativity, right?
AI: Yes, it satisfies the Jacobi identity.
(OK, now we have the answer I wanted. Onwards to the rest of the proof.)
So, here I would be getting a nonsense not-a-proof if I weren't correcting the AI at each step.
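For what it's worth, the corrected answers do check out numerically; a quick numpy sketch, just to ground the math:

```python
import numpy as np

i, j, k = np.eye(3)                    # standard basis vectors

# The cross product is NOT associative:
print(np.cross(np.cross(i, i), j))     # (i x i) x j = [0 0 0]
print(np.cross(i, np.cross(i, j)))     # i x (i x j) = i x k = [0 -1 0]

# ...but it does satisfy the Jacobi identity, the "weaker form":
a, b, c = np.random.rand(3), np.random.rand(3), np.random.rand(3)
jacobi = (np.cross(a, np.cross(b, c))
          + np.cross(b, np.cross(c, a))
          + np.cross(c, np.cross(a, b)))
print(np.allclose(jacobi, 0))          # True
```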
But: what if you have some kind of mental illness, and some of your thoughts are fallacies? Maybe you get the ai to agree with your fallacious reasoning.
(Yeah, I know, different kinds of fallacious reasoning. "The vector cross product is associative" vs. "I am sure I will die of COVID-19 if I ever leave the house.")
Re. image authentication... I expect we will soon be seeing some expert witness testifying in court: "We know this image is real because it was digitally signed using the private key of one of our tamper-evident security cameras, and I personally examined the camera afterwards to confirm that it had not been tampered with in any way." (For Brits: see the Police and Criminal Evidence Act, etc.)
Ho hum. Now every security camera looks like an IBM 4758 hardware crypto module, or, if you prefer, the Nuclear Football.
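For concreteness, the step being testified about is just an asymmetric signature check over the image bytes. A minimal sketch with the Python cryptography library, with key handling, certificate chains, and the tamper-evident hardware all hand-waved away:

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
from cryptography.exceptions import InvalidSignature

# Stand-in for the key pair embedded in the tamper-evident camera.
camera_private_key = Ed25519PrivateKey.generate()
camera_public_key = camera_private_key.public_key()

image_bytes = b"...raw sensor frame..."            # placeholder image data
signature = camera_private_key.sign(image_bytes)   # done inside the camera

# Later, the "expert witness" step: verify the signature with the public key.
try:
    camera_public_key.verify(signature, image_bytes)
    print("signature checks out (says nothing about whether the key leaked)")
except InvalidSignature:
    print("image or signature has been altered")
```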
It's a sad day for Effective Altruism when we're saying, "No, that's not the biggest theft in American history. Sam Bankman-Fried stole way more money."
"We will increasingly need to watch out for this kind of glaring falsification of the spirit when looking at safety efforts."
I think it's OK to go: "we were already doing the safety tests the government wants anyway, here are the results, have a nice day."
You might be able to do this with anything from gas heating appliances to pharmaceuticals.
I know it's just in beta, but I am deeply unimpressed by the Claude 3 Haiku function calling beta so far. In a head-to-head comparison against GPT-3.5-Turbo:
- For my use case, Claude 3 Haiku is much worse at following simple instructions in a short prompt. It routinely includes output I specifically asked it not to provide, or it returns dialog when the few-shot prompt says it should return function calls. There's no way to force tool output like with GPT-3.5-Turbo.
- I cannot just alternate "user" messages and "function" responses to build a few-shot prompt. Instead, I need to build sequences of "user", "function call", "function response", "dummy assistant response" for each example I'm providing (see the sketch after this comment).
- Even when asked to do simple tasks, like analyzing a single sentence, Claude 3 Haiku will sometimes refuse, saying things like "My role is to be helpful while respecting intellectual property rights." And this is at the API level, when working with function calls. Yes, I'm pretty sure the sentence was copyrighted (most sentences are these days!). But one sentence should not be enough to trigger a paragraph-long copyright lecture.
- The Claude API has tons of differences from the GPT-3.5 API, so I can't just drop it in as a replacement. Instead, I need to create tons of wrapper classes and functions to make the same code work with both APIs. And Claude-3-Haiku seems to need a lot more prompting.
Basically, function calling is a game-changing feature, and GPT-3.5-Turbo makes it stupidly easy to get good results for a variety of use cases I've tested. Claude 3 Haiku's beta feels very rough in comparison.
I imagine they'll get there eventually. But if you're trying to compete with a market leader, you want to make switching easy. And they still have a ways to go.
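To make that structural difference concrete, here's roughly what a single few-shot example looks like in each API, reconstructed from memory; the field names may not match the current docs exactly, and get_weather is a made-up example function, so treat this as illustrative rather than copy-paste-able:

```python
# OpenAI (GPT-3.5-Turbo, legacy function calling): a few-shot example can just
# alternate an assistant function call with a "function" result message.
openai_example = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "content": None,
     "function_call": {"name": "get_weather", "arguments": '{"city": "Paris"}'}},
    {"role": "function", "name": "get_weather", "content": '{"temp_c": 18}'},
]

# Anthropic (Claude 3 Haiku tool use beta): the call is a content block inside an
# assistant turn, the result is a content block inside the *next user turn*, and
# in my experience each example also wants a closing assistant turn.
claude_example = [
    {"role": "user", "content": "What's the weather in Paris?"},
    {"role": "assistant", "content": [
        {"type": "tool_use", "id": "toolu_example", "name": "get_weather",
         "input": {"city": "Paris"}}]},
    {"role": "user", "content": [
        {"type": "tool_result", "tool_use_id": "toolu_example",
         "content": '{"temp_c": 18}'}]},
    {"role": "assistant", "content": "It's about 18°C in Paris."},  # dummy closer
]
```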
> This means that the minimum turnaround time, for any model change, would be 30 days after the finalization of the data set.
I'm surprised that no one involved in drafting Schiff's bill thought it would be beneficial to at least include a retraining-for-safety exemption.
Imagine: a lab learns that a crucial subset of Model 1's training data was produced in a data poisoning attack. The lab would need to retrain Model 1.1 with a lot of new data, to replace what was in the poisoned subset. According to my reading of the bill, this triggers the 30-day period.
You are modeling Schiff's office as thinking and knowing and understanding much more than I think they plausibly do...
From the "we are losing the prove you are a human war" ... the captchas that twitter is currently using are actually beyond me. Just as well I never want to post on twitter.
> Certainly Google has the huge advantage that I am already trusting it via GMail, Google Docs and Google Sheets and even Google Maps. So you get all of that integration ‘for free,’ with little in additional security issues. And they get to integrate Google Search as well. This is a lot of why I keep expecting them to win.
Doesn't Google majorly suck at integrating any of these?? And not just at integrating, but even at using the data within them *at all?*
I use all these GOOG products too, along with the Chrome browser for a good chunk of my browsing. With the literally tens of thousands of PhDs and data scientists they have working on your digital cookie and behavior trail, they should literally know me better than I know myself. As in, if I go to a new part of downtown for a meeting, they should be able to predict with high accuracy which lunch place I'll end up going to "serendipitously" and with no planning aforethought.
And yet, with the collective brainpower of tens of thousands of PhDs working around the clock, and with ownership of at least 60-70% of the online advertising space in the entire online world, GOOG routinely fails to show me ANY advertisements that are even tangentially related to anything I care about and would buy. And I'm a comparatively heavy spender relative to the US median - I have some pretty expensive hobbies, and for the non-expensive hobbies, I'm more than willing to throw down hundreds or thousands on a whim. Do FB or GOOG, the literal worldwide online advertising monopoly, tap into any of that? Not at all.
I wonder if this is an ethics thing, because like most Linux / tech-savvy folk, I use uBlock Origin and uMatrix and Ghostery and things like that. I opt out of profiling and cookies where I can. But I know the data is still there, with certainty. Between unblockable pixel trackers, my use of the Chrome browser and Google Search, and proximity and network analysis, Google (and FB, via proximity and network analysis and pixel trackers and deals with every top-1k website) definitely HAS the info, even with the privacy measures I take. They just don't use it, and I wonder if it's because they have ethically decided that I seem to be trying to avoid profiling with the various browser add-ons.
But then we're talking about GOOG and FB making a non-regulatory decision to earn less money for the sake of "ethics"?? It is to laugh. So it's a bit of a mystery to me why GOOG sucks so badly at targeting with their inferences, when the data is certainly there to be inferred from.
And I've enjoyed very little coordination or tie-in between the suite of GOOG products I use, either. I've always assumed (from what I've heard from people working there) that it was too big, too mind-killed, and too sclerotic to actually do anything at all effectively, or with an attention span longer than an average six-year-old can muster.