𞋴𝛂𝛋𝛆

  • 25 Posts
  • 192 Comments
Joined 3 years ago
cake
Cake day: June 9th, 2023

help-circle







  • Awesome. Now how would you strace/ptrace the active process correlated with the return packet?

    This is way past my pay grade in the territory of edge-of-abstract – understanding.

    See one of my problems is that the malicious software is running across Python, JavaScript, and a ton of dubious packages scattered throughout the machine. It is all interconnected and using unconventional operations. Right now I am just removing a package one and a time and seeing what breaks. I will likely miss how things are interconnected. I am not at all familiar with this type of thing, and learning as I go. The system used unshare, manually created no-label packets with all records obfuscated, used a hidden daemon function in systemd, and no-account to operate outside of namespaces.



  • I’m in the process of dismantling software I will never trust or update again and coming across all kinds of sketchy stuff. There is this Python program called Sentry_SDK that is very concerning. Along with several others. It appears to be packaged with most offline AI stuff and is some of the most authoritarian nonsense I have seen. I have air gapped the computer and do not have a package installed like prettier to maybe make the JavaScript readable, and it is enormous. There are many pages that are in the 10k lines plus range.

    I already found a place in the back end that is trying to send packets with major obfuscation. The process is preloaded as listening, with every measure taken to prevent discovery of its origin. So that is fun too. I will likely reformat and start over after I have had my fun and saved what I wish to save.


  • Assuming it is a quoted string for simplicity.
    ..."http://foo.bar/"...
    $ sed -i 's/\/.*\"/injection/g'

    That is flawed in practicality, but gets the point across and will result in http:injection. It would take more convoluted escapes to replace the ‘//’.

    I was thinking there has to be a way to use the address like a printf like situation. However someone tries to use an address, it just hits a local trip wire. Pass that to anything you don’t want to connect on the internet. It is super lazy and hacky, but I don’t really care. I use an external firewall device with DNS whitelist, so I block everything anyways. Flagging stuff just makes it easy to say something to others that might benefit.





  • Complex social hierarchy is a super important aspect to account for too. In the proprietary software realm, you infer confidence in the accumulated wealth hierarchy. In FOSS the hierarchy is not wealth, but reputation like in academia or the film industry. If some company in Oman makes some really great proprietary app, are you going to build your European startup over top of it? Likewise, if in FOSS someone with no reputation makes some killer app, the first question to ask is whether this is going to anchor or support a stellar reputation. Maybe they are just showing off skills to land a job. If that is the case, they are just like startups that are only looking to get bought up quickly by some bigger fish. We are all conditioned to think in terms of horded wealth as the only form of hierarchy, but that is primitive. If all the wealth was gone, humans are still fundamentally complex social animals, and will always establish a complex hierarchy. This is one of the spaces where it is different.






  • llama.cpp is at the core of almost all offline, open weights models. The server it creates is Open AI API compatible. Oobabooga Textgen WebUI is more user GUI oriented but based on llama.cpp. Oobabooga has the setup for loading models with a split workload between the CPU and GPU which makes larger gguf quantized models possible to run. Llama.cpp, has this feature, Oobabooga implements it. The model loading settings and softmax sampling settings take some trial and error to dial in well. It helps if you have a way of monitoring GPU memory usage in real time. Like I use a script that appends my terminal window title bar with GPU memory usage until inference time.

    Ollama is another common project people use for offline open weights models, and it also runs on top of llama.cpp. It is a lot easier to get started in some instances and several projects use Ollama as a baseline for “Hello World!” type stuff. It has pretty good model loading and softmax settings without any fuss, but it does this at the expense of only running on GPU or CPU but never both in a split workload. This may seem great at first, but if you never experience running much larger quantized models in the 30B-140B range, you are unlikely to have success or a positive experience overall. The much smaller models in the 4B-14B range are all that are likely to run fast enough on your hardware AND completely load in your GPU memory if you only have 8GB-24GB. Most of the newer models are actually Mixture of Experts architectures. This means it is like loading ~7 models initially, but then only inferencing two of them at any one time. All you need is the system memory or the Deepspeed package (uses disk drive for excess space required) to load these larger models. Larger quantized models are much much smarter and more capable. You also need llama.cpp if you want to use function calling for agentic behaviors. Look into the agentic API and pull history in this area of llama.cpp before selecting what models to test in depth.

    Huggingface is the goto website for sharing and sourcing models. That is heavily integrated with GitHub, so it is probably as toxic long term, but I do not know of a real FOSS alternative for that one. Hosting models is massive I/O for a server.