𞋴𝛂𝛋𝛆

  • 23 Posts
  • 182 Comments
Joined 3 years ago
Cake day: June 9th, 2023



  • Complex social hierarchy is a super important aspect to account for too. In the proprietary software realm, you infer confidence from the accumulated wealth hierarchy. In FOSS the hierarchy is not wealth but reputation, like in academia or the film industry. If some company in Oman makes a really great proprietary app, are you going to build your European startup on top of it? Likewise, if someone with no reputation in FOSS makes some killer app, the first question to ask is whether the project is going to anchor or support a stellar reputation. Maybe they are just showing off skills to land a job. If that is the case, they are just like startups that are only looking to get bought up quickly by some bigger fish. We are all conditioned to think of hoarded wealth as the only form of hierarchy, but that is primitive. If all the wealth were gone, humans would still be fundamentally complex social animals, and would always establish a complex hierarchy. FOSS is one of the spaces where the hierarchy is different.






  • llama.cpp is at the core of almost all offline, open weights model tooling. The server it creates is OpenAI API compatible. Oobabooga Textgen WebUI is more GUI oriented but is based on llama.cpp. Oobabooga has the setup for loading models with a workload split between the CPU and GPU, which makes it possible to run larger GGUF quantized models. llama.cpp has this feature, and Oobabooga exposes it. The model loading settings and softmax sampling settings take some trial and error to dial in well. It helps if you have a way of monitoring GPU memory usage in real time. For example, I use a script that appends GPU memory usage to my terminal window's title bar until inference time.

    Ollama is another common project people use for offline open weights models, and it also runs on top of llama.cpp. It is a lot easier to get started with in some instances, and several projects use Ollama as a baseline for “Hello World!” type stuff. It has pretty good model loading and softmax settings without any fuss, but it does this at the expense of only running on GPU or CPU, never both in a split workload. This may seem great at first, but if you never experience running much larger quantized models in the 30B-140B range, you are unlikely to have success or a positive experience overall. The much smaller models in the 4B-14B range are all that are likely to run fast enough on your hardware AND load completely into your GPU memory if you only have 8GB-24GB. Most of the newer models are actually Mixture of Experts architectures. This means it is like loading ~7 models initially, but then only running inference on two of them at any one time. All you need is the system memory or the DeepSpeed package (which uses the disk drive for the excess space required) to load these larger models. Larger quantized models are much smarter and more capable. You also need llama.cpp if you want to use function calling for agentic behaviors. Look into the agentic API and the pull request history in this area of llama.cpp before selecting which models to test in depth.

    Hugging Face is the go-to website for sharing and sourcing models. It is heavily integrated with GitHub, so it is probably just as toxic in the long term, but I do not know of a real FOSS alternative for that one. Hosting models is massive I/O for a server.
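The title bar monitoring mentioned above could look something like the following. This is only a minimal sketch, not the actual script: it assumes an NVIDIA card with `nvidia-smi` on the PATH and an xterm-compatible terminal, and every function name here is made up for illustration.

```python
import subprocess
import sys
import time


def parse_gpu_memory(csv_text):
    """Parse 'used, total' MiB pairs (one line per GPU) into a list of int tuples."""
    readings = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        readings.append((used, total))
    return readings


def format_title(readings):
    """Build a short label such as 'GPU0 1234/8192 MiB'."""
    return " | ".join(
        f"GPU{index} {used}/{total} MiB"
        for index, (used, total) in enumerate(readings)
    )


def title_escape(text):
    """Wrap text in the OSC 0 escape sequence that sets the window title."""
    return f"\033]0;{text}\007"


def query_nvidia_smi():
    """Ask nvidia-smi for used and total memory as bare CSV numbers in MiB."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=memory.used,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def main(interval_seconds=1.0):
    """Refresh the terminal title until interrupted with Ctrl+C."""
    while True:
        title = format_title(parse_gpu_memory(query_nvidia_smi()))
        sys.stdout.write(title_escape(title))
        sys.stdout.flush()
        time.sleep(interval_seconds)
```

Calling `main()` in a spare terminal tab (or in the background of the one you load models from) keeps the numbers visible while you adjust the loader settings.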

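For the CPU/GPU split described above, a back-of-the-envelope estimate helps pick a starting point before the trial and error. The sketch below is a rough approximation under loud assumptions: it counts weights only (no KV cache or context buffers), treats every layer as the same size, and the 2 GiB reserve is an arbitrary guess. The function names are hypothetical; the result is just a first value to try for llama.cpp's `--n-gpu-layers` option.

```python
import math


def weights_gib(params_billions, bits_per_weight):
    """Approximate size of the quantized weights in GiB (weights only)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def gpu_layers_that_fit(total_weights_gib, n_layers, vram_gib, reserve_gib=2.0):
    """Estimate how many equally sized layers fit in VRAM after a reserve.

    reserve_gib is an assumed allowance for context and other buffers.
    """
    per_layer_gib = total_weights_gib / n_layers
    usable_gib = max(vram_gib - reserve_gib, 0.0)
    return min(n_layers, math.floor(usable_gib / per_layer_gib))


# Example: a hypothetical 70B model at roughly 4.5 bits per weight with 80 layers,
# on a 24 GiB card. The rest of the layers would stay in system memory.
size = weights_gib(70, 4.5)
layers = gpu_layers_that_fit(size, n_layers=80, vram_gib=24.0)
print(f"~{size:.1f} GiB of weights, try about {layers} layers on the GPU")
```

Treat the output as a place to start, then watch real GPU memory usage and move the layer count up or down from there.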







  • TSMC is all that stops the invasion of Taiwan. If TSMC is not relevant even for a moment, China will invade to end their civil war. You can count on NK invading SK at the same time and taking Samsung.

    China already has domestic incentives in place for home grown GPUs. They will likely displace Nvidia entirely within 5-8 years.

    Ultimately, a unified architecture will win. The reason CPUs cannot handle the load of AI is the L2 to L1 cache bus throughput. Fixing it requires a major redesign, but it is a solvable problem. The real problem is that this kind of redesign takes the full 10 year hardware design cycle to create from scratch.

    AI is still not going away in the long term. The present world is just like the early days of the microprocessor. The 6502 was little more than a toy. It is still in all Western Digital hard drives. The fundamental architecture is still the same in all CPUs. It was the systems we built around them that made them useful. A base inference model is primitive. The AI that owns the future is agentic systems.






  • Do any of these types of units manage to play something like Super Metroid with perfect controls, to the point of reproducing the original ROM glitch key combos? Even the Switch's native emulation failed at that one when I tried, before I quit anything Nintendo related over their lack of ethics, theft, and extortionate business practices. I got irritated with emulation when I could not play according to the old archived guides for the game. The emulated controls couldn’t do the same original key combos even when I sent them programmatically with a microcontroller.



  • The US is not supporting Russia, but Iran is. Israel is atrocious, and it would not surprise me at all if they absolutely knew about Oct 7th well in advance. In either case, it was the result of their prejudice and the Palestinian concentration camps before Oct 7th that caused the initial attack. However, it was not entirely without cause, like with Russia in Ukraine. Everywhere is complicated. The USA is super polarized and in pretty bad shape, but it is not exporting suicide bombers. Is it better to target remotely with 10 million dollar munitions? No, but those are not targeting crowds of people as the primary goal