Scraping the Barrel

“Chatbots cannot think like humans: They do not actually understand what they say. They can mimic human speech because the artificial intelligence that powers them has ingested a gargantuan amount of text, mostly scraped from the internet.” So it’s worth asking which sites are being used as source material. NextDraft is one of the domains included in Google’s AI crawl, ranking 255,457th among the sites scraped. It’s just nice to be included. I guess. Or maybe not. In its investigation of sources, WaPo “found several media outlets that rank low on NewsGuard’s independent scale for trustworthiness: RT.com No. 65, the Russian state-backed propaganda site; breitbart.com No. 159, a well-known source for far-right news and opinion; and vdare.com No. 993, an anti-immigration site that has been associated with white supremacy.” But, like everything on the internet, it gets worse. “The Post found that the filters failed to remove some troubling content, including the white supremacist site stormfront.org No. 27,505, the anti-trans site kiwifarms.net No. 378,986, and 4chan.org No. 4,339,889, the anonymous message board known for organizing targeted harassment campaigns against individuals.” WaPo (Gift Article): Inside the secret list of websites that make AI like ChatGPT sound smart.

I figured I’d ask ChatGPT to chime in on the matter. “As an AI language model, my ability to sound smart is largely based on my programming and the quality of the language data that I have been trained on…However, it’s important to note that while I can provide intelligent responses, I am still a machine, and my responses are generated based on algorithms and statistical patterns. My responses may not always be perfect or reflective of human intelligence.” (It’s nice to end on a positive…)

This blurb is from the April 20, 2023, edition of NextDraft.