I personally think that general consumers will never use LLMs in significant numbers. I think LLMs will exist in two distinct spaces: FOSS for devs and other technical people who want to run their own infra locally, and B2B for everything else.
The few big AI companies that manage to last will be selling access to their models at much higher prices, probably similar to current proprietary commercial software like VMware, SolidWorks, Veeam, Splunk, etc. Companies will pay hundreds, possibly thousands, of dollars per seat depending on the niche and the amount of usage.
Suppose a company developed an LLM trained and tuned specifically for legal work, and suppose it produced work at around 95% of the quality of a typical paralegal. If that company charged $6,000 a year per license to work on their platform, that’s expensive, but if you’re a small firm with, say, a dozen full-time lawyers, then for roughly the yearly cost of a single average paralegal you could have every lawyer using that software to do most of the work the paralegal would have done. I can see those kinds of applications happening more and more.
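To make that concrete, here's the back-of-the-envelope math. The $6,000/seat and dozen-lawyer figures are from the scenario above; the ~$65k fully loaded paralegal cost is my own rough assumption, not a sourced number:

```python
# Back-of-the-envelope: per-seat LLM licensing vs. one paralegal.
# All figures are illustrative assumptions, not sourced data.

license_per_seat = 6_000    # $/year, the hypothetical price above
lawyers = 12                # small-firm headcount from the example
paralegal_cost = 65_000     # assumed fully loaded $/year (salary + overhead)

total_licenses = license_per_seat * lawyers
print(f"Licenses for all {lawyers} lawyers: ${total_licenses:,}/yr")  # $72,000/yr
print(f"One paralegal (assumed):            ${paralegal_cost:,}/yr")  # $65,000/yr
```

At those numbers the firm pays roughly one paralegal's cost to put the tool in front of every lawyer, which is the whole pitch.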
This assumes, though, that LLMs will continue to improve at a significant rate well into the future (5-10 more years), which isn’t at all obvious; there is some evidence that progress is already starting to hit a ceiling.
There are other ways it might work, like if a compression method is discovered that reduces the necessary RAM and compute by 2-3 orders of magnitude. Models that are considered very large today (100-300 billion parameters at full precision) might then run effectively on a single 32GB GPU that costs a few thousand dollars.
The cost to run these models would then drop immensely, and a single small data center could serve enormous models with 1,000,000+ token context windows to tens of thousands of users at once.
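For a sense of scale, here's the rough weight-memory math behind that scenario. The bytes-per-parameter value is standard for fp16; everything else is a simplification that ignores KV cache and activations:

```python
# Rough weight-memory math for the compression scenario above.
# Weights only; real serving also needs KV cache and activations,
# so these are lower bounds.

GB = 1024**3

def weight_gb(params_billion, bytes_per_param):
    return params_billion * 1e9 * bytes_per_param / GB

for params in (100, 300):
    fp16 = weight_gb(params, 2.0)  # full-precision baseline
    print(f"{params}B params: ~{fp16:.0f} GB at fp16, "
          f"~{fp16 / 32:.0f}x too big for a 32GB GPU")
```

Weights alone need a bit over one order of magnitude of shrinkage to fit on that card; the extra headroom in a 2-3 order-of-magnitude reduction is what would make the huge contexts and heavy multi-user serving plausible on small hardware.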
But that cuts both ways, which is something any AI company is going to have to deal with. Once small free models get good enough to do the vast majority of a task, users are going to start weighing the costs and benefits, and the prospect of just buying a box and throwing one of those models on it for a few grand will be very appealing.
I think there may be a good market out there for “AI boxes”: compact computers designed to run a tuned LLM, set up with a little special sauce so the interface is user-friendly. Companies could sell these with support contracts to legal firms, indie dev studios, startups, small government agencies, etc.
Idk, it’s so up in the air right now, and everything is constantly changing so fast. It’s impossible to predict where things will be in 6 months, let alone 6 years from now.
> There are other ways it might work, like if a compression method is discovered that reduces the necessary RAM and compute by 2-3 orders of magnitude. Models that are considered very large today (100-300 billion parameters at full precision) might then run effectively on a single 32GB GPU that costs a few thousand dollars.
You might want to check in on how well distilled / quantized models are doing, compared to gigundo datacenter versions.
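For context, the quantization that's actually shipping today buys roughly 2-4x over fp16, nowhere near the 2-3 orders of magnitude hoped for above. A quick sketch (the bit-widths are standard quantization levels; the 70B size is just a common example, not any specific model):

```python
# Where current quantization lands: weight footprint of a 70B model
# at common precisions. Distillation can shrink further, with
# task-dependent quality loss.

GB = 1024**3
params = 70e9

for name, bytes_per_param in [("fp16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    size_gb = params * bytes_per_param / GB
    print(f"{name}: ~{size_gb:.0f} GB weights "
          f"(compression {2.0 / bytes_per_param:.0f}x vs fp16)")
```

So a 70B model at 4-bit is around 33 GB of weights, just over a single 32GB card, and that's only a 4x saving, still well under one order of magnitude.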