Not sure if this is the right community but seems close enough.
Ideally I want a URL prefix that I can put any paywalled news article link behind and get the unpaywalled version back.
E.g.: https://somedomain/https://somenewssite/somenewsartle
I need it to work with https://pypi.org/project/newspaper4k/
Alternatively, if someone knows of another Python library that can automatically extract article text and images from just a link, that would also solve my problem.
12ft works, if you really need to. But in general, I just don’t read any publications that paywall their content. Mass media is all owned by one or two billionaires, if they need money they can get it from them.
Generally, 12ft.io works pretty well for me.
Looks like newspaper4k uses headless Chrome. You could try loading the Bypass Paywalls Clean extension and browsing the pages directly.
I regularly use it (in Firefox) without even thinking about it. Only notice when I send someone an article they can’t access.
It does not use headless Chrome; it just uses the Python requests library. Did you get got by an AI hallucination?
Source: I went digging in the source code.
No, just this example code from their site:
browser = p.chromium.launch(headless=True)
My mistake was not knowing where newspaper4k fits in the stack. They’re wrapping it with Playwright, which it seems you could do here.
Ahh, I see. I'm using newspaper4k to fetch articles directly. It seems the example you found uses it purely as a parser, with Playwright doing the HTML fetching. I might try that approach.
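For anyone following along, here is a rough sketch of that split: Playwright fetches the rendered HTML and newspaper4k only parses it. The function name `fetch_and_parse` is made up for illustration, and it assumes both `playwright` (with Chromium installed) and `newspaper4k` are available. It does nothing about paywalls by itself; it only shows where a different fetcher would plug in.

```python
def fetch_and_parse(url: str):
    """Render `url` in headless Chromium, then parse the HTML with newspaper4k.

    Hypothetical helper, not part of either library.
    """
    # Imported inside the function so this file still loads when the
    # optional dependencies (playwright, newspaper4k) are not installed.
    from playwright.sync_api import sync_playwright
    import newspaper

    # Step 1: let a real browser engine fetch and render the page.
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url, wait_until="domcontentloaded")
        html = page.content()
        browser.close()

    # Step 2: hand the HTML to newspaper4k so it skips its own HTTP request.
    article = newspaper.Article(url)
    article.download(input_html=html)
    article.parse()
    return article
```

After a call like `article = fetch_and_parse(some_url)`, the text should be in `article.text` and the image URLs in `article.top_image` and `article.images`.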
Most of the time archive.today gets the job done.
It also offers a URL scheme that returns the newest snapshot of a given URL: http://archive.is/newest/http://lemmy.dbzer0.com/c/piracy
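That URL pattern is easy to build in code. A minimal sketch, assuming the `/newest/` path behaves as in the link above; the helper name `newest_snapshot_url` is hypothetical, and the service may answer scripted requests with a CAPTCHA, so treat the result as a link to try rather than a guaranteed fetch.

```python
def newest_snapshot_url(article_url: str, host: str = "archive.is") -> str:
    """Build the 'newest snapshot' link for `article_url`.

    The pattern is copied from the example link in this thread; the
    `host` default is an assumption and can be swapped for a mirror.
    """
    return f"http://{host}/newest/{article_url}"


print(newest_snapshot_url("http://lemmy.dbzer0.com/c/piracy"))
```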
Why use one when you can use 6
Yeah, I've tried that. Only some of them work in a way that's easy to implement, but if the one I'm currently using goes down then I guess I'll have to bodge something together.
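If it comes to that, the bodge could be a simple fallback loop over prefix-style services. This is only a sketch: `fetch_with_fallback` is a made-up name, `https://somedomain/` is the placeholder from the original post rather than a real service, and `fetch` stands for whatever function you already use to download HTML.

```python
from typing import Callable, Optional, Sequence

# Prefixes to try in order. The first comes from the archive link in this
# thread; the second is a placeholder, not a real service.
PREFIXES = ("http://archive.is/newest/", "https://somedomain/")


def fetch_with_fallback(
    article_url: str,
    fetch: Callable[[str], str],
    prefixes: Sequence[str] = PREFIXES,
) -> Optional[str]:
    """Return HTML from the first prefix service that works, else None."""
    for prefix in prefixes:
        try:
            html = fetch(prefix + article_url)
        except Exception:
            # This service is down or refused the request; try the next one.
            continue
        if html:
            return html
    return None
```

The returned HTML can then go to newspaper4k through `download(input_html=...)`, so swapping services does not touch the parsing side.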