24 Comments

> A developer called Justine claims they got Llamafile to run LLMs 30%-500% faster on regular local machines (looks like mostly 50%-150% or so?) via some basic performance optimizations.

To be clear, this is a CPU-only improvement, which means that it makes the very very slow option slightly less slow. If you want something fast enough to be usable for models large enough to be useful, you need to be using a GPU.

Of course faster CPU support is nice because you can use models which don't fit on your GPU, such as the new Mistral model (Mixtral 8x22B), which even at 4-bit quantization needs something like 80GB of memory. But performance is going to be abysmal.
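
As a rough check on that 80GB figure, here is a back-of-the-envelope sketch. It assumes roughly 141 billion total parameters for Mixtral 8x22B (a commonly cited figure, not stated above) and a hypothetical 10% allowance for cache and runtime buffers. Both numbers are assumptions, and real quantization formats often spend slightly more than their nominal bits per weight.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float, overhead: float = 0.10) -> float:
    """Approximate memory (in GB, 10^9 bytes) for quantized weights plus a flat overhead."""
    weight_bytes = n_params * bits_per_weight / 8  # bits -> bytes
    return weight_bytes * (1 + overhead) / 1e9


# Assumption: ~141B total parameters for Mixtral 8x22B.
MIXTRAL_8X22B_PARAMS = 141e9

if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"{bits}-bit: ~{weight_memory_gb(MIXTRAL_8X22B_PARAMS, bits):.0f} GB")
```

Under those assumptions the 4-bit case comes out a little under 80 GB, which is in line with the estimate above and well beyond a single consumer GPU.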

“Demos sound great, we are so back indeed, quick sampling seemed good too, but these are all obviously cherry-picked.”

Periodic reminder that the human art you have been exposed to is also extremely cherry-picked.

Also a reminder that a lot of art is appreciated only because other people appreciate it.

What do you think about the new AI engineer course from Bloomtech? https://ai.bloomtech.com/

author

Hadn't seen it. Can others speak to this? Don't feel like I'm in a good position to evaluate.

Not a fan of the plug for non-AI stuff in the middle. I don't mind it if the article subject matches, or there is no subject, like Scott does with Open Threads, but this just felt so jarring.

author

Yeah, fair, even with the clear 'not AI.' I should mostly confine that stuff to the monthlies where it fits.

Regarding the Gemini system prompt being leaked...

How plausible is it that this is the real prompt being fed to Gemini by Google/DeepMind? Maybe Gemini is just demonstrating sycophancy/hallucination and making up a plausible-sounding prompt?

author

The report is that it is consistent, whereas a hallucination would change its composition from one attempt to the next. I've also gotten less skeptical of such claims given the track record, but I am far from 100% sure this is correct.

Apr 11·edited Apr 11

If they trained it from January to late March then most likely it isn't GPT-5. That wouldn't be enough time to train a significantly bigger model using only H100s I believe. Unless you used a lot of them, but I'm pretty sure there are some limits to how many you can effectively use for a single training run...

> "A good rule you learn from car commercials is that the best model of a given type is the usually one everyone else says they are better than at some particular feature."

This sentence has had an outsized derailing effect on my train of thought. :)

It also took me an unreasonably long time to untangle that misplaced "the"; it almost felt adversarial. Like a human version of SolidGoldMagikarp.

"Alarmingly, we find that most of the tested models could cause harm if accessed in mental health emergencies, failing to protect users and potentially exacerbating existing symptoms."

Yes, I would expect this. Models tend to be sycophantic, and will go along with really terrible ideas.

That guy who got convicted of treason after an AI persuaded him to try assassinating the Queen of England with a crossbow is an illustrative case study here.

Like, in normal use, the LLM is giving you a stream of suggestions, some of which are nonsensical or bad ideas, and you as a human are doing the filtering. On the other hand, if you the human have a really terrible idea, the LLM is less likely to dissuade you.

One time, I was trying to use an AI to help me construct a mathematical proof.

Me: is vector cross product associative?

AI: yes, it's associative

(This was the wrong answer; let's hit retry.)

AI: no, it's not associative

(That's better.)

Me: but it satisfies a weaker form of associativity, right?

AI: yes, it satisfies the Jacobi identity

(OK, now we have the answer I wanted. Onwards to the rest of the proof.)
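
For anyone who wants to check both answers rather than trust either retry, here is a minimal sketch in plain Python with integer vectors. The helper names (`cross`, `jacobi_sum`) are made up for illustration. It shows a counterexample to associativity and evaluates the Jacobi identity, a x (b x c) + b x (c x a) + c x (a x b) = 0, on sample inputs.

```python
Vec = tuple[int, int, int]


def cross(a: Vec, b: Vec) -> Vec:
    """Standard 3D cross product a x b."""
    ax, ay, az = a
    bx, by, bz = b
    return (ay * bz - az * by, az * bx - ax * bz, ax * by - ay * bx)


def add(*vectors: Vec) -> Vec:
    """Component-wise sum of any number of vectors."""
    return tuple(sum(components) for components in zip(*vectors))


def jacobi_sum(a: Vec, b: Vec, c: Vec) -> Vec:
    """a x (b x c) + b x (c x a) + c x (a x b); the Jacobi identity says this is zero."""
    return add(cross(a, cross(b, c)), cross(b, cross(c, a)), cross(c, cross(a, b)))


i, j = (1, 0, 0), (0, 1, 0)

if __name__ == "__main__":
    # (i x i) x j is the zero vector, but i x (i x j) = i x k = -j.
    print("(i x i) x j =", cross(cross(i, i), j))
    print("i x (i x j) =", cross(i, cross(i, j)))
    print("Jacobi sum  =", jacobi_sum((1, 2, 3), (4, 5, 6), (7, 8, 10)))
```

A single counterexample settles the associativity question. The Jacobi check on a few vectors is only a sanity test, not a proof.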

So, here I would be getting a nonsense not-a-proof if I weren't correcting the AI at each step.

But: what if you have some kind of mental illness, and some of your thoughts are fallacies? Maybe you get the AI to agree with your fallacious reasoning.

(Yeah, I know, different kinds of fallacious reasoning. "Vector cross product is associative" vs "I am sure I will die of COVID-19 if I ever leave the house.")

Re. image authentication... I expect we will soon be seeing some expert witness testifying in court, "We know this image is real because it was digitally signed using the private key of one of our tamper-evident security cameras, and I personally examined the camera afterwards to confirm that it had not been tampered with in any way." (For Brits, see the Police and Criminal Evidence Act, etc.)

Ho hum. Now every security camera looks like an IBM 4758 hardware crypto module, or, if you prefer, the Nuclear Football.
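
To make the "digitally signed using the private key" part concrete, here is a minimal sketch using only the Python standard library. It swaps in a Lamport one-time signature over SHA-256, chosen because it needs nothing beyond `hashlib`. This is an assumption for illustration, not what any camera vendor ships: a real device would presumably use a standard scheme in dedicated hardware, and a Lamport key can safely sign only one message. All function names here are hypothetical.

```python
import hashlib
import secrets


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def generate_keypair():
    """Private key: 256 pairs of random 32-byte secrets. Public key: their hashes."""
    private = [(secrets.token_bytes(32), secrets.token_bytes(32)) for _ in range(256)]
    public = [(_h(zero), _h(one)) for zero, one in private]
    return private, public


def _digest_bits(message: bytes):
    """The 256 bits of SHA-256(message), most significant bit first."""
    return [(byte >> (7 - k)) & 1 for byte in _h(message) for k in range(8)]


def sign(private, message: bytes):
    """Reveal one secret per digest bit. Never reuse the key for a second message."""
    return [private[n][bit] for n, bit in enumerate(_digest_bits(message))]


def verify(public, message: bytes, signature) -> bool:
    """Check each revealed secret hashes to the matching half of the public key."""
    bits = _digest_bits(message)
    if len(signature) != len(bits):
        return False
    return all(_h(part) == public[n][bit] for n, (part, bit) in enumerate(zip(signature, bits)))


if __name__ == "__main__":
    camera_private, camera_public = generate_keypair()
    frame = b"raw image bytes from the sensor"  # placeholder data
    signature = sign(camera_private, frame)
    print(verify(camera_public, frame, signature))            # original frame
    print(verify(camera_public, frame + b"!", signature))     # edited frame
```

The signature only shows that the bytes match what the key holder signed. Whether the key stayed inside an untampered camera is the part the expert witness would still have to vouch for.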

It's a sad day for Effective Altruism when we're saying, "no, that's not the biggest theft in American history. Sam Bankman-Fried stole way more money."

"We will increasingly need to watch out for this kind of glaring falsification of the spirit when looking at safety efforts."

I think it's OK to say, "We were already doing the safety tests the government wants anyway; here are the results, have a nice day."

You might be able to do this with anything from gas heating appliances to pharmaceuticals.

Apr 12·edited Apr 12

I know it's just in beta, but I am deeply unimpressed by the Claude 3 Haiku function calling beta so far. In a head-to-head comparison against GPT-3.5-Turbo:

- For my use case, Claude 3 Haiku is much worse at following simple instructions in a short prompt. It routinely includes output I specifically asked it not to provide, or it returns dialog when the few-shot prompt says it should return function calls. There's no way to force tool output like with GPT-3.5-Turbo.

- I cannot just alternate "user" messages and "function" responses to build a few-shot prompt. Instead, I need to build sequences of "user", "function call", "function response", "dummy assistant response" for each example I'm providing.

- Even when asked to do simple tasks, like analyzing a single sentence, Claude 3 Haiku will sometimes refuse, saying things like "My role is to be helpful while respecting intellectual property rights." And this is at the API level, when working with function calls. Yes, I'm pretty sure the sentence was copyrighted (most sentences are these days!). But one sentence should not be enough to trigger a paragraph-long copyright lecture.

- The Claude API has tons of differences from the GPT-3.5 API, so I can't just drop it in as a replacement. Instead, I need to create tons of wrapper classes and functions to make the same code work with both APIs. And Claude 3 Haiku seems to need a lot more prompting.

Basically, function calling is a game-changing feature, and GPT-3.5-Turbo makes it stupidly easy to get good results for a variety of use cases I've tested. Claude 3 Haiku's beta feels very rough in comparison.

I imagine they'll get there eventually. But if you're trying to compete with a market leader, you want to make switching easy. And they still have a ways to go.
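
To illustrate the few-shot layout difference described in the second bullet, here is a provider-neutral sketch. The `kind` labels, field names, and the `tag_sentence` function are hypothetical and do not reproduce either vendor's actual message schema. It also reads the two-step layout as "user message, then function call", which is an assumption about what the bullet means.

```python
import json
from dataclasses import dataclass


@dataclass
class FewShotExample:
    user_text: str
    function_name: str
    arguments: dict
    function_result: str = "ok"   # placeholder result fed back to the model
    dummy_reply: str = "Done."    # filler assistant turn that closes the example


def _call(ex: FewShotExample) -> dict:
    return {"kind": "function_call", "name": ex.function_name,
            "arguments": json.dumps(ex.arguments)}


def two_step_layout(examples: list[FewShotExample]) -> list[dict]:
    """Alternate user turns with the desired function call."""
    messages = []
    for ex in examples:
        messages.append({"kind": "user", "text": ex.user_text})
        messages.append(_call(ex))
    return messages


def four_step_layout(examples: list[FewShotExample]) -> list[dict]:
    """User, function call, function response, dummy assistant reply, per example."""
    messages = []
    for ex in examples:
        messages.append({"kind": "user", "text": ex.user_text})
        messages.append(_call(ex))
        messages.append({"kind": "function_response", "name": ex.function_name,
                         "text": ex.function_result})
        messages.append({"kind": "assistant", "text": ex.dummy_reply})
    return messages


if __name__ == "__main__":
    shots = [FewShotExample("The cat sat.", "tag_sentence", {"subject": "cat"})]
    print(len(two_step_layout(shots)), len(four_step_layout(shots)))
```

Keeping examples in one neutral structure and rendering them per provider is one way to contain the wrapper code the last bullet complains about; each layout function would then be followed by a provider-specific translation step.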

> This means that the minimum turnaround time, for any model change, would be 30 days after the finalization of the data set.

I'm surprised that no one involved in drafting Schiff's bill thought it would be beneficial to at least include a retraining-for-safety exemption.

Imagine: a lab learns that a crucial subset of Model 1's training data was produced in a data poisoning attack. The lab would need to retrain Model 1.1 with a lot of new data, to replace what was in the poisoned subset. According to my reading of the bill, this triggers the 30-day period.

author

You are modeling Schiff's office as thinking and knowing and understanding much more than I think they plausibly do...

From the "we are losing the prove you are a human war" ... the captchas that Twitter is currently using are actually beyond me. Just as well I never want to post on Twitter.

> Certainly Google has the huge advantage that I am already trusting it via GMail, Google Docs and Google Sheets and even Google Maps. So you get all of that integration ‘for free,’ with little in additional security issues. And they get to integrate Google Search as well. This is a lot of why I keep expecting them to win.

Doesn't Google majorly suck at integrating any of these?? And not just at integrating, but even using the data within them *at all?*

I use all these GOOG products too, along with Chrome browser for a good chunk of my browsing - with the literally tens of thousands of PhDs and data scientists they have working on your digital cookie and behavior trail, they should literally know me better than I know myself. As in, if I go to a new part of downtown for a meeting, they should be able to predict with high accuracy which lunch place I'll end up going to "serendipitously" and with no planning aforethought.

And yet, with the collective brainpower of tens of thousands of PhDs working around the clock, and with owning at least 60-70% of the online advertising space in the entire online world, GOOG routinely fails to show me ANY advertisements that are even tangentially related to anything I care about and would buy. And I'm a comparatively heavy spender in relation to the USA median - I have some pretty expensive hobbies, and for the non-expensive hobbies, I'm more than willing to throw down hundreds or thousands on a whim. Do FB or GOOG, the literal worldwide online advertisement monopoly, tap into any of that? Not at all.

I wonder if this is an ethics thing, because like most Linux / tech-savvy folk, I use uBlock Origin and uMatrix and Ghostery and things like that. I opt out of profiling and cookies where I can. But I know the data is still there, with certainty. Unblockable pixel trackers, using Chrome browser, using Google search, proximity and network analysis: Google (and FB via proximity and network analysis and pixel trackers and deals with every top 1k website) definitely HAS the info, even with the privacy measures I take. They just don't use it, and I wonder if it's because they have ethically decided that I seem to be trying to avoid profiling with the various browser add-ons.

But then we're talking about GOOG and FB - making a non-regulatory decision to make less money for the sake of "ethics"?? It is to laugh. So, a bit of a mystery to me, why GOOG sucks so badly at targeting with their inferences, when the data is certainly there to be inferred.

And I've enjoyed very little coordination or tie-in between the suite of GOOG products I use, either. I've always assumed (from what I've heard from people working there) that it was too big, too mind-killed, and too sclerotic to actually do anything at all effectively, or with an attention span requiring more than an average six-year-old can muster.
