
AI Canon
Click back a couple years and you'll find this page: https://news.ycombinator.com/from?site=a16z.com&next=2981684... with submissions like "DAOs, a Canon" https://news.ycombinator.com/item?id=29440901
Probably in the blundering sense of "exponential", meaning a lot. But what are some specific numbers? (such as publications)
Now that I'm spending time learning AI, it feels the same -- but the innovation pace feels at least 10x faster than the evolution of the cloud native ecosystem.
At this point, there's a reasonable degree of convergence around the core abstractions you should start with in the cloud-native world, and an article written today on this would probably be fine a year from now. I doubt this is the case in AI.
(Caveat: I've only been learning about the space for about 4 weeks, so maybe it's just me!)
https://a16z.com/2023/01/19/who-owns-the-generative-ai-platf...
Over the last year, we’ve met with dozens of startup founders and operators in large companies who deal directly with generative AI. We’ve observed that infrastructure vendors are likely the biggest winners in this market so far, capturing the majority of dollars flowing through the stack. Application companies are growing topline revenues very quickly but often struggle with retention, product differentiation, and gross margins. And most model providers, though responsible for the very existence of this market, haven’t yet achieved large commercial scale.
In other words, the companies creating the most value — i.e. training generative AI models and applying them in new apps — haven’t captured most of it.
What's the last investment A16Z was actually ahead of the curve on? I guess it isn't important, since from their position they don't rely on being ahead of the curve to make good investments; they make their investments good through their network and funding abilities.
CNBC: https://www.cnbc.com/2022/08/15/a16z-to-invest-in-adam-neuma...
Probably also canonical are Goodfellow's Deep Learning [2], Koller & Friedman's PGMs [3], the Krizhevsky ImageNet paper [4], the original GAN [5], and arguably also the AlphaGo paper [6] and the Atari DQN paper [7].
[1] https://aima.cs.berkeley.edu/
[2] https://www.deeplearningbook.org/
[3] https://www.amazon.com/Probabilistic-Graphical-Models-Princi...
[4] https://proceedings.neurips.cc/paper_files/paper/2012/file/c...
[5] https://arxiv.org/abs/1406.2661
[6] https://www.nature.com/articles/nature16961
[7] https://arxiv.org/abs/1312.5602
Geoff Hinton had been saying this well before 2017. I remember his talks at Google ~2013ish.
1. Quality of input data - for language models that are currently set up to be force-fed any incoming data rather than really trained on it (see 2.), this is the greatest gain you can get for your money. Models can't distinguish between truth and nonsense; they're forced to auto-complete toward the training data regardless of how stupid or sane it is (a rough sketch of this filtering idea follows below).
2. Evaluation of input data by the model itself - during training, self-assessing what is nonsense and what makes sense / is worth learning, based on the knowledge gathered so far, and dealing with the biases this introduces, etc.
Current training methods give something like first-order logic the same weight as any kind of nonsense; its only defense is quantity, not quality.
But there are many widely repeated things that are plainly wrong. To simplify the thought: if there weren't, there would be no further progress for humankind. We constantly re-examine assumptions and come up with new theories while leaving solid axioms untouched - why not teach this approach to LLMs, or hardcode it into them?
Those two aspects seem to be problems with large gains, yet nobody seems to be discussing them.
Align training towards common sense and the model's own judgement, not unconditional alignment with the input data.
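To make point 1 concrete, here is a minimal, purely hypothetical Python sketch of what filtering a corpus by an estimated quality score could look like. score_quality is a made-up toy heuristic standing in for whatever judgment model or heuristic you would actually use; this is not how any particular lab trains.

    # Hypothetical sketch of point 1: score each raw example and keep only
    # what clears a quality threshold, instead of force-feeding everything.
    from dataclasses import dataclass

    @dataclass
    class Example:
        text: str
        quality: float

    def score_quality(text: str) -> float:
        # Toy heuristic: reward reasonable length and sentence-like endings.
        # A real pipeline would use a learned quality/judgment model here.
        score = min(len(text) / 100.0, 0.8)
        if text.strip().endswith((".", "!", "?")):
            score += 0.2
        return score

    def build_training_set(raw_corpus, threshold=0.5):
        # Keep only examples whose estimated quality clears the threshold.
        examples = [Example(t, score_quality(t)) for t in raw_corpus]
        return [ex for ex in examples if ex.quality >= threshold]

    if __name__ == "__main__":
        corpus = [
            "Water boils at 100 degrees Celsius at standard pressure.",
            "asdf lol qwerty",
        ]
        for ex in build_training_set(corpus):
            print(f"{ex.quality:.2f}  {ex.text}")

In reality the scoring model would itself have to be learned (which is point 2), but the plumbing looks roughly the same.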
If fine-tuning works, why not start training with first principles - a dictionary, logic, base theories like sets and categories, an encyclopedia of facts (omitting historical facts, which are irrelevant at this stage), etc. - taking snapshots at each stage so others can fork their own training trees? Maybe even stop calling fine-tuning "fine-tuning" and just call them learning stages. Let researchers play with paths through those trees and evaluate them to find something more optimal, find optimal network sizes for each step, allow models to grow gradually in size, and so on.
To rephrase it a bit: we already accept that base models trained on large data work well when fine-tuned - so why couldn't base models trained on first principles keep being trained, recursively and efficiently, on concepts that depend on the previously learned first principles? Has anybody tried?
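For what it's worth, here is a minimal, purely hypothetical sketch of that "training tree" idea: train in stages, snapshot after each stage, and let anyone fork an intermediate snapshot. train_on is a placeholder for a real training loop; none of this is an existing framework's API.

    # Hypothetical sketch of staged training with forkable snapshots.
    class Snapshot:
        def __init__(self, name, weights, parent=None):
            self.name = name        # e.g. "init/dictionary/first_order_logic"
            self.weights = weights  # toy stand-in for model parameters
            self.parent = parent    # parent snapshot, so paths form a tree

    def train_on(weights, corpus_name):
        # Pretend update: a real implementation would run gradient steps here.
        updated = dict(weights)
        updated[corpus_name] = updated.get(corpus_name, 0) + 1
        return updated

    def run_stage(parent, corpus_name):
        # Fork from a snapshot and continue training on the next corpus.
        return Snapshot(f"{parent.name}/{corpus_name}",
                        train_on(parent.weights, corpus_name),
                        parent)

    # One path through the tree: dictionary -> logic -> set theory -> facts.
    path = [Snapshot("init", {})]
    for corpus in ["dictionary", "first_order_logic", "set_theory", "encyclopedia"]:
        path.append(run_stage(path[-1], corpus))
        print("checkpoint:", path[-1].name)

    # Someone else forks the logic-stage checkpoint and tries a different branch.
    alt = run_stage(path[2], "category_theory")
    print("forked checkpoint:", alt.name)

Evaluating many such paths (and per-stage model sizes) would then just be a search over this tree.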
Like everyone else, starting about a year and a half ago I have found it really difficult to stay up to date.
I try to dive deep on a narrow topic for several months and then move on.
I am just wrapping up a dive into GPT+LangChain+LlamaIndex applications. I am now preparing to drop most follows on social media for GPT+LangChain+LlamaIndex and try to find good people and companies to follow for LLM+Knowledge Graphs (something I tried 3 years ago, but the field was too new).
I find that when I want to dive into something new the best starting point is finding the right people who post links to the best new papers, etc.