Peter Wang on How to Democratise AI

See also Ursula Martin's talk in Emily Riehl - Formalizing invisible mathematics: case studies from higher category theory.

I'm too old for this shit! At 7:12 there's a sign of what's really going on in LLMs. See Florentin Guth and Brice Ménard's paper On the universality of neural encodings in CNNs, and Rishi Jha, Collin Zhang, Vitaly Shmatikov and John X. Morris's Harnessing the Universal Geometry of Embeddings:

We introduce the first method for translating text embeddings from one vector space to another without any paired data, encoders, or predefined sets of matches. Our unsupervised approach translates any embedding to and from a universal latent representation (i.e., a universal semantic structure conjectured by the Platonic Representation Hypothesis). Our translations achieve high cosine similarity across model pairs with different architectures, parameter counts, and training datasets.

The ability to translate unknown embeddings into a different space while preserving their geometry has serious implications for the security of vector databases. An adversary with access only to embedding vectors can extract sensitive information about the underlying documents, sufficient for classification and attribute inference.
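A quick way to see why "universal geometry" is even plausible is to compare the pairwise similarity structure two different models assign to the same texts. The sketch below is not the paper's method; it uses made-up synthetic embeddings standing in for two hypothetical models that share underlying structure, and simply checks that their cosine-similarity matrices agree even though the vectors live in different spaces.

```python
# Toy illustration (not the paper's method): if two models encode the same
# texts with similar geometry, their pairwise cosine-similarity matrices
# should agree even though the raw vectors live in different spaces.
# The embeddings here are synthetic stand-ins, not real model outputs.
import numpy as np

rng = np.random.default_rng(0)

n_texts = 200
shared = rng.normal(size=(n_texts, 16))  # pretend "universal" structure

# Two hypothetical models: different dimensions, different random "readouts"
# of the same underlying structure, plus model-specific noise.
emb_a = shared @ rng.normal(size=(16, 384)) + 0.1 * rng.normal(size=(n_texts, 384))
emb_b = shared @ rng.normal(size=(16, 768)) + 0.1 * rng.normal(size=(n_texts, 768))

def cosine_gram(x):
    """Pairwise cosine-similarity matrix of the rows of x."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    return x @ x.T

g_a, g_b = cosine_gram(emb_a), cosine_gram(emb_b)

# The off-diagonal entries of the two Gram matrices are strongly correlated,
# i.e. the two spaces share geometry despite having no dimensions in common.
mask = ~np.eye(n_texts, dtype=bool)
r = np.corrcoef(g_a[mask], g_b[mask])[0, 1]
print(f"correlation of pairwise similarities across models: {r:.3f}")
```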

It's producing an optimum encoding, the one Claude Shannon described in 1948 in sections I.2 and I.3 of A Mathematical Theory of Communication, or it would if there were some proper control of the input. If you read E. T. Jaynes' book Probability Theory you see that this is a very interesting question from the point of view of Bayesian probability. This quote on thinking machines from Jaynes' book sums it up:

Models have practical uses of a quite different type. Many people are fond of saying, “They will never make a machine to replace the human mind—it does many things which no machine could ever do.” A beautiful answer to this was given by J. von Neumann in a talk on computers given in Princeton in 1948, which the writer was privileged to attend. In reply to the canonical question from the audience [“But of course, a mere machine can’t really think, can it?”], he said: “You insist that there is something a machine cannot do. If you will tell me precisely what it is that a machine cannot do, then I can always make a machine which will do just that!”
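To make the optimum-encoding point above concrete, here's a minimal sketch (the symbol probabilities are invented for illustration). Shannon's source-coding result bounds the best achievable average code length between H and H + 1 bits per symbol, and a Huffman code built for the assumed distribution lands in that range:

```python
# Minimal sketch: Shannon entropy vs. the average length of a Huffman code.
# The symbol distribution below is invented purely for illustration.
import heapq
from math import log2

probs = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Shannon entropy H = -sum p log2 p : the lower bound on bits per symbol.
H = -sum(p * log2(p) for p in probs.values())

def huffman_lengths(probs):
    """Return the codeword length of each symbol in a Huffman code."""
    # Heap items: (probability, tiebreak, {symbol: codeword_length})
    heap = [(p, i, {s: 0}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        p1, _, d1 = heapq.heappop(heap)
        p2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one bit deeper.
        merged = {s: length + 1 for s, length in {**d1, **d2}.items()}
        heapq.heappush(heap, (p1 + p2, count, merged))
        count += 1
    return heap[0][2]

lengths = huffman_lengths(probs)
avg_len = sum(probs[s] * length for s, length in lengths.items())

print(f"entropy        H = {H:.3f} bits/symbol")
print(f"Huffman length L = {avg_len:.3f} bits/symbol  (H <= L < H + 1)")
```

For this particular distribution the Huffman code hits the entropy exactly, because every probability is a power of two.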

In Jaynes' 1978 essay Where do we stand on Maximum Entropy? he wrote, about Shannon's notion of entropy as information:

In a communication process, the message Mᵢ is assigned probability pᵢ, and the entropy H = −∑ pᵢ log pᵢ is a measure of "information." But whose information? It seems at first that if information is being "sent," it must be possessed by the sender. But the sender knows perfectly well which message he wants to send; what could it possibly mean to speak of the probability that he will send message Mᵢ?

We take a step in the direction of making sense out of this if we suppose that H measures, not the information of the sender, but the ignorance of the receiver, that is removed by receipt of the message. Indeed, many subsequent commentators appear to adopt this interpretation. Shannon, however, proceeds to use H to determine the channel capacity C required to transmit the message at the given rate. But whether a channel can or cannot transmit a message M in time T obviously depends only upon properties of the message and the channel --- and not at all on the prior ignorance of the receiver! So this interpretation will not work either.

Agonizing over this, I was driven to conclude that the different messages considered must be the set of all those that will, or might be, sent over the channel during its useful life; and therefore Shannon's H measures the degree of ignorance of the communications engineer when he designs the technical equipment in the channel. Such a viewpoint would, to say the least, seem natural to an engineer employed at Bell Telephone Laboratories -- yet it is curious that nowhere does Shannon see fit to tell the reader explicitly whose state of knowledge he is considering, although the whole content of the theory depends crucially on this.

From E. T. Jaynes, "Where do we stand on Maximum Entropy?" (1978), p. 23.
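To put Jaynes' complaint in concrete terms, here is a small sketch (both message distributions are invented): H is computed from whatever probability assignment the engineer commits to, so a different state of knowledge about the traffic gives a different H, and a code designed under the wrong assignment pays a measurable penalty in expected length.

```python
# Sketch of Jaynes' point: H is a property of a probability assignment over the
# message ensemble, not of any single message. The engineer who designs the
# channel must commit to some assignment; a different state of knowledge gives
# a different H. Both distributions below are invented for illustration.
from math import log2

messages = ["M1", "M2", "M3", "M4"]

engineer = {"M1": 0.5, "M2": 0.25, "M3": 0.125, "M4": 0.125}  # design assumption
uniform  = {"M1": 0.25, "M2": 0.25, "M3": 0.25, "M4": 0.25}   # a different prior

def entropy(p):
    return -sum(p[m] * log2(p[m]) for m in messages)

def cross_entropy(true, assumed):
    # Expected bits/message if traffic follows `true` but codeword lengths were
    # chosen as -log2 assumed[m], i.e. the code was built for `assumed`.
    return -sum(true[m] * log2(assumed[m]) for m in messages)

print(f"H under the engineer's assignment: {entropy(engineer):.3f} bits")
print(f"H under the uniform assignment:    {entropy(uniform):.3f} bits")
print(f"expected length when the engineer's code meets uniform traffic: "
      f"{cross_entropy(uniform, engineer):.3f} bits")
```

The gap between the last two numbers is the Kullback–Leibler divergence between the two assignments, which is one way of quantifying exactly how much the engineer's prior knowledge matters.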
