The distributed replacement for Sci-Hub

TheMachineStops@discuss.tchncs.de · 9 months ago


Lime Buzz (fae/she)@beehaw.org · 9 months ago

It had me up until ‘AI’.

TheMachineStops@discuss.tchncs.de · 9 months ago

Yeah, the AI thing is stupid; everyone suddenly wants to incorporate AI. Check out the Telegram bot, though: you can request research papers or books through the bots, and someone uploads them within a couple of hours.

hendrik@palaver.p3x.de · 9 months ago

If you do it right, you can have that AI replace the whole complicated pirating and downloading process. I think someone already came up with a paper-writing AI. You just give it the topic, and it fabricates a whole paper, including nice diagrams and pictures. 😅

Yeah, but that also made me worry. I wonder how AI and science mix. Supposedly, some researchers use AI, especially “Retrieval-Augmented Generation” (LLMs combined with information retrieval) and such. I’m not a scientist, but I haven’t had much luck with AI and factual information. It just makes a lot of stuff up, to the point where I’m better off without it.

Mirodir@discuss.tchncs.de · 9 months ago

AI can be good, but I’d argue letting an LLM autonomously write a paper is not one of the good uses. The risk of it writing factually wrong things is just too great.

To give you an example from astronomy: AI can help filter out “uninteresting” data, which encompasses a large majority of data coming in. It can also help by removing noise from imaging and by drastically speeding up lengthy physical simulations, at the cost of some accuracy.

None of those use cases use LLMs though.

hendrik@palaver.p3x.de · 9 months ago

Right, the public and journalists often lump everything together under the term “AI”, when there’s really a big difference between a domain-specific pattern-recognition task that machine learning can do with >99% accuracy… and an ill-suited use case where an LLM gets slapped on.

For example, I frequently disagree with people using LLMs for summarization. That seems to be something a lot of people like, and I think LLMs are particularly bad at it. All my results were riddled with inaccuracies; sometimes it would miss the whole point of the input text. And it would rarely summarize at all. It just picks a topic or paragraph here and there and writes a shorter version of that, missing what a summary is about: giving me the main points and the conclusion, cutting the details, and roughly outlining how the author got there. I think LLMs just can’t do it.

I like them for other purposes, though.

Mirodir@discuss.tchncs.de · 9 months ago

Re LLM summaries: I’ve noticed that too. In some of my classes, shortly after the ChatGPT boom, we were allowed to bring summaries along. I tried feeding it the input text and telling it to break it down into a sentence or two. Often it would just give a short summary of that topic but not actually use the concepts described in the original text.

Also, a minor nitpick, but be wary of the term “accuracy”. It is a terrible metric for most use cases, and when a company advertises its AI as having high accuracy, it is likely hiding something. For example, let’s say we wanted to develop a model that detects cancer in medical images. If our test set consists of 1% cancer images and 99% normal tissue, a model that just predicts “no cancer” every time trivially achieves 99% accuracy. Many of the more interesting problems have class imbalances far worse than this one.
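To make the class-imbalance point concrete, here is a minimal Python sketch, assuming a made-up test set of 1,000 images of which 10 (1%) show cancer. It scores the trivial “always predict no cancer” model and prints accuracy next to recall and precision, which come from the same confusion-matrix counts:

```python
# Hypothetical test set: 1,000 images, 10 of which show cancer (1% positives).
labels = [1] * 10 + [0] * 990  # 1 = cancer, 0 = normal tissue

# A useless "model" that predicts "no cancer" for every image.
predictions = [0] * len(labels)


def confusion_counts(labels, predictions):
    """Return (true positives, true negatives, false positives, false negatives)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    return tp, tn, fp, fn


tp, tn, fp, fn = confusion_counts(labels, predictions)
accuracy = (tp + tn) / len(labels)
# Recall: share of real cancer cases the model actually flags.
recall = tp / (tp + fn) if (tp + fn) else 0.0
# Precision is undefined when the model never predicts "cancer"; report 0.0.
precision = tp / (tp + fp) if (tp + fp) else 0.0

print(f"accuracy={accuracy:.2f} recall={recall:.2f} precision={precision:.2f}")
# prints: accuracy=0.99 recall=0.00 precision=0.00
```

The 99% accuracy comes entirely from the majority class, while recall (the fraction of actual cancer cases found) is zero. Precision is undefined here because the model never predicts cancer; the sketch simply reports it as 0.0 by convention.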

hendrik@palaver.p3x.de · 9 months ago

What’s the correct term in casual language? “Correctness”? But that has the same issue… I’m not a native speaker…

By the way, I forgot my main point. I think that paper generator was kind of a joke. At least the older one, which predates AI and uses “hand-written context-free grammar”:

SCIgen

And there are projects like Papergen and several others. But I think what I was referring to was the AI Scientist, which does everything from brainstorming research ideas to simulating experiments, writing reports, etc. That’s not meant to be taken seriously, in the sense that you’d publish the generated results. But it seems pretty creative to me to write a paper about an artificial scientist…
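For anyone curious how a grammar-based generator like SCIgen works in principle, here is a minimal Python sketch. The grammar rules are invented for illustration and are not SCIgen’s actual grammar: each nonterminal (uppercase key) maps to a list of alternative productions, and the generator picks one at random and recursively expands it until only plain words remain.

```python
import random

# Toy hand-written context-free grammar (invented for illustration, NOT SCIgen's).
# Keys are nonterminals; each value lists alternative productions.
GRAMMAR = {
    "TITLE": [["PREFIX", "TOPIC"], ["TOPIC"]],
    "PREFIX": [["Towards"], ["On"], ["Deconstructing"]],
    "TOPIC": [["the", "ADJ", "NOUN"], ["ADJ", "NOUN", "for", "NOUN"]],
    "ADJ": [["scalable"], ["robust"], ["lossless"]],
    "NOUN": [["hash tables"], ["consensus"], ["red-black trees"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a list of terminal words."""
    if symbol not in GRAMMAR:  # terminals are emitted as-is
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])  # pick one alternative at random
    words = []
    for part in production:
        words.extend(expand(part, rng))
    return words


def generate(start="TITLE", seed=None):
    """Generate one random string from the grammar; a fixed seed makes it reproducible."""
    rng = random.Random(seed)
    text = " ".join(expand(start, rng))
    return text[0].upper() + text[1:]  # capitalize the first letter like a title


if __name__ == "__main__":
    for seed in range(3):
        print(generate(seed=seed))
```

Because this toy grammar has no recursive rules, expansion always terminates; a generator with recursive rules would need a depth limit or weighted choices to avoid runaway output.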

Kissaki@lemmy.dbzer0.com · 9 months ago

If you do it right, you can have that AI replace all the complicated pirating and downloading process.

How so? I don’t see how that would work.

What are you trying to say about an AI fabricating a whole paper? It must have the same issue all trained statistical text-prediction “AI” has: hallucinations. Even if it’s extended with sources, unless you validate them, the paper’s claims are useless, because you can’t be sure a source even exists or says what the text claims.

There are use cases for AI, but if you are looking for papers with reasoned and documented information, AI is the worst thing you can use. It may look correct while being confidently incorrect, and then you are being misled.

This post is about scientific papers, not predicted generated text.

hendrik@palaver.p3x.de · 9 months ago

Yeah, I think my sarcasm got lost somewhere. I thought the word “fabricate”, especially in the context of facts, had that slight undertone. But I’m not a native English speaker, so maybe I’m wrong.

I’ve linked some paper generators elsewhere in this comment tree. They’re not supposed to come up with real scientific papers. One of them is an old joke (predating AI), the next one is itself the subject of research, and with the third one, I’m not so sure. Its intended use seems to be faking papers.

I also think “hallucinations” are a major hurdle when it comes to applying AI. It comes as no surprise to me that they happen… I mean, we’ve trained them on all kinds of data: Wikipedia, textbooks, but also fictional stories, novels, Reddit posts… And we want them to be creative. Except when we don’t. But there is (currently) no set-screw, no means of controlling when we want it to stick to the facts and when we want it to be creative, invent something, or write a science fiction novel… That certainly doesn’t help if the use case is writing factual text or helping a customer with an issue on their bill… and the chatbot decides to be extra creative or mimic an angry Reddit user. It’ll do it, because we’ve designed it that way.

I guess making AI more intelligent helps, so the chances of it inferring or guessing what to do get a bit better. But I think what we really want is some means of steering it more directly, and that would open up more use cases for AI. Currently, we don’t have any of that. So when it comes to factual stuff, AI is just very unreliable.

And if you ask me, ChatGPT, Claude, etc. aren’t even close to being smart enough to write a scientific paper. So that’d be yet another issue. I know people regularly claim it can pass some degree exam or is smarter than a student… But from my own experience, ChatGPT can’t even summarize a two-page newspaper article. All the results I’ve ever seen were riddled with inaccuracies, and most of the time they also missed the entire point of the article. I’ve rarely been happy with how it reworded my emails. I also let it write some hobby code for me. It did a great job at writing boilerplate code, some web design, etc., but failed miserably at the more complex things I really needed help with. How would that thing be able to do research on its own?

Don’t get me wrong, I think AI and LLMs are very useful. They can assist, retrieve documents… They excel at translating between languages. I also like chatbots for their creativity. You can just tell them to come up with 5 ideas concerning whatever you’re currently doing. But there are a lot of things they cannot do, and it’s probably going to stay that way for a while, until some major (hypothetical) breakthrough where we suddenly make them 10x as intelligent and/or get rid of hallucinations.


GitHub - ultranymous/stc: Distributed free search engine and AI tools that grant access to knowledge