Scraping the Barrel

“Chatbots cannot think like humans: They do not actually understand what they say. They can mimic human speech because the artificial intelligence that powers them has ingested a gargantuan amount of text, mostly scraped from the internet.” So it’s worth asking which sites are being used as source material. NextDraft is one of the domains included in Google’s AI crawl, ranking 255,457th among the sites scraped. It’s just nice to be included. I guess. Or maybe not. In its investigation of sources, WaPo “found several media outlets that rank low on NewsGuard’s independent scale for trustworthiness: No. 65, the Russian state-backed propaganda site; No. 159, a well-known source for far-right news and opinion; and No. 993, an anti-immigration site that has been associated with white supremacy.” But, like everything on the internet, it gets worse. “The Post found that the filters failed to remove some troubling content, including the white supremacist site No. 27,505, the anti-trans site No. 378,986, and No. 4,339,889, the anonymous message board known for organizing targeted harassment campaigns against individuals.” WaPo (Gift Article): Inside the secret list of websites that make AI like ChatGPT sound smart.

I figured I’d ask ChatGPT to chime in on the matter. “As an AI language model, my ability to sound smart is largely based on my programming and the quality of the language data that I have been trained on… However, it’s important to note that while I can provide intelligent responses, I am still a machine, and my responses are generated based on algorithms and statistical patterns. My responses may not always be perfect or reflective of human intelligence.” (It’s nice to end on a positive…)