New Invitation to Offer Feedback: Don’t Trust AI to Cite Its Sources (from @AnnaRMills & me)

Estimated reading time: 1 minute, 7 seconds

I’ve been talking about this on Twitter for about a year: I am not as concerned about students “plagiarizing” by using AI as I am that text-based AI (just like image-generation AI) draws on data from sources that it never cites or links back to. What is worse, some of the tools that now claim to cite their sources don’t always cite the correct ones (unless they’re really obvious ones like Shakespeare); they can cite the wrong source entirely, or link to something that is real but does not actually say what the AI claims it says. So when we ask students to cite when they use AI, is this really enough? Because they’re citing the AI, but not the actual sources of information the AI used to get there.

Anna Mills and I have been discussing this and testing it for a while, and we’ve written a piece on it. Feedback is welcome. You can give feedback on my blogpost here, or on social media (we’ll post on Twitter, LinkedIn and Mastodon).

Here is the piece: Don’t Trust AI to Cite Its Sources – awaiting your feedback.

Featured image of a blindfolded robot with people and data floating around its head and behind it (signifying that it does not know where the data it gets is coming from), generated by DALL-E 3 via poe.com

5 thoughts on “New Invitation to Offer Feedback: Don’t Trust AI to Cite Its Sources (from @AnnaRMills & me)”

  1. My hunch is that the results of a generative AI query can never be traced back to specific sources, or maybe the answer is that you have to cite every single source an LLM was trained on. I go back to a quote in an article titled “Anthropic cracks open the black box to see how AI comes up with the stuff it says” (I don’t dare include a URL because that always flags my comments for spam), which cited research from Anthropic:

    “Researchers know how to build the AI, and they know how AIs work at a fundamental, technical level. But what they actually do involves manipulating more numbers, patterns and algorithmic steps than a human can process in a reasonable amount of time.

    For this reason, there’s no direct method by which researchers can trace an output to its source.”

    I think the way we think about search and content is stuck in an experience that is vastly different from what LLMs are spitting out now. An LLM isn’t going back to specific papers and sources; it is scouring everything it has ingested.

    The word-completion probability vectors used are completely decoupled from the original source. There is no connection from output to identifiable sources. Everything is based on mass statistics of the way language uses those phrases “Intentionally,” “Equitable,” “Hospitality” (a fascinating example for you to use, since I bet that outside of the work you researched and published they are not quite as common, hence the false results sprung from other mentions of “Hospitality”).

    That they can echo sources that look correct at first glance is quite impressive. But I would never expect them to tie results directly to citeable sources since they are not drawing directly from them.

    There is an illusion when we see cited URLs like in Bing and Perplexity: those are often real, but they are not the training data.

    I’d say, though I could very well be wrong, that GenAI is incapable of directly citing what it draws from.
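
    As a rough, hypothetical sketch of what that decoupling looks like (a toy illustration, not any real model’s code): generation samples from next-token probability vectors, and nothing in those vectors records which training documents produced the statistics, so there is nothing to point a citation at.

    ```python
    # Toy, assumed-for-illustration example (not any real model's code):
    # generation works from next-token probability vectors alone, and no
    # provenance for the underlying statistics is stored anywhere.
    import random

    # Made-up probabilities for illustration only.
    toy_next_token_probs = {
        ("intentionally", "equitable"): {"hospitality": 0.62, "access": 0.25, "design": 0.13},
        ("equitable", "hospitality"): {"framework": 0.40, "practices": 0.35, "research": 0.25},
    }

    def generate(context, steps, rng=random.Random(42)):
        """Extend a token sequence by sampling from the probability vectors.

        Note what is absent: nothing records which source texts produced
        these statistics, so there is nothing to cite at generation time.
        """
        tokens = list(context)
        for _ in range(steps):
            probs = toy_next_token_probs.get(tuple(tokens[-2:]))
            if probs is None:
                break
            choices, weights = zip(*probs.items())
            tokens.append(rng.choices(choices, weights=weights)[0])
        return tokens

    print(generate(["intentionally", "equitable"], steps=2))
    ```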

  2. As it is unlikely we can get reliable sources for words generated by GenAI for all the reasons stated in your article, we’ll need to develop some strategies to account for this flaw so we can continue to encourage good academic practices which nurture critical thinking and genuine human writing skills.

    Possible options include:
    * Ask GenAI to generate an answer to a question, if possible with the instruction to cite sources (hollow laugh).
    * Interrogate the accuracy of that answer – actively seek the sources for the assertions/ideas generated, via separate searches not using a modern GenAI tool (for example, use two different search engines on another browser).
    * While investigating those sources, write down the search terms used, then use these same search terms in the GenAI tool to discover whether it actually reveals some accurate sources which are comparable with the sources found via other tools.
    * Critique the accuracy of the GenAI output – is there evidence for the claims it makes?
    * Write about the findings in your own words (novel idea!). Place the full text of the AI-generated output at the start of the findings analysis, and cite when it was generated and which tool was used to generate it.
    * Use these steps as a way to alert students and academics about the flaws in using GenAI without critiquing its outputs or understanding how it generates sometimes inaccurate hallucinations.

    It is probably also worth acknowledging that in some cases, when mulling over many ideas and formulating new ones, humans don’t always make a note of every source or fleeting thought which might have influenced our thinking. We sometimes have to unpick what we think might have drawn us in a particular direction, then retrospectively seek the sources based on our memories (it could be a recollection emerging from years previously) and plausible influences.

    I see a role for educators in helping people develop awareness of GenAI’s shortcomings and learn techniques for interrogating and harnessing it safely and productively.

    This whole response was written by me; I don’t use GenAI tools yet, by personal choice, as I love formulating written responses myself. It is part of the joy of thinking, seeking, learning and sharing.

  3. Of relevance is “Generative AI Can’t Cite Its Sources” from The Atlantic (I am not adding a link because WordPress flags me for spam, but it is easily findable. It’s paywalled, but can be read via 12ft dot io).

    The flaw in our logic is that we think AI is pulling from sources as we know them, whole pieces of content, but it never does. It just produces something close enough to be “truthy,” as Stephen Colbert used to say. Even when a tool like Perplexity simulates this via “retrieval-augmented generation” (RAG), the process is fraught with problems:

    “First, a chatbot turns the user’s query into an internet search, perhaps via Google or Bing, and “retrieves” relevant content. Then the chatbot uses that content to “generate” its response. (ChatGPT currently relies on Bing for queries that use RAG.)

    Every step of this process is currently prone to error. Before a generative-AI program composes its response to a user’s query, it might struggle with a faulty internet search that doesn’t pull up relevant information… Even if a chatbot retrieves good information, today’s generative-AI programs are prone to twisting, ignoring, or misrepresenting data. Large language models are designed to write lucid, fluent prose by predicting words in a sequence, not to cross-reference information or create footnotes.”

    What we think of as sources and what AI constructs are not the same, so it seems like all results need to be source validated.
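
    A hedged sketch of that retrieve-then-generate flow might look like the following; the function names and the search/model calls are placeholders assumed for illustration, not any real product’s API. The point is visible in the structure itself: the “citations” returned are simply whatever retrieval found, and nothing in the generation step cross-references the answer against them.

    ```python
    # Hypothetical sketch of a RAG pipeline; web_search and llm_complete are
    # placeholder stubs, not real APIs.
    from dataclasses import dataclass

    @dataclass
    class Page:
        url: str
        text: str

    def web_search(query: str) -> list[Page]:
        # Placeholder for the "retrieve" step (e.g. a Bing or Google query).
        # This step can already fail by returning irrelevant pages.
        raise NotImplementedError("plug in a real search API here")

    def llm_complete(prompt: str) -> str:
        # Placeholder for the "generate" step: a model predicting words in
        # sequence, with no built-in cross-referencing or footnoting.
        raise NotImplementedError("plug in a real model call here")

    def rag_answer(user_query: str) -> tuple[str, list[str]]:
        pages = web_search(user_query)                # 1. retrieve
        context = "\n\n".join(p.text for p in pages)  # 2. stuff retrieved text into the prompt
        answer = llm_complete(                        # 3. generate
            f"Answer the question using the text below.\n\n{context}\n\nQ: {user_query}"
        )
        # The URLs surfaced as "citations" come from retrieval, not from the
        # model's training data, and nothing here checks the answer against them.
        return answer, [p.url for p in pages]
    ```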

  4. I found this piece very interesting. I can barely remember a time before I used Google to search, to cite, to build academic arguments and also to check them in the things I read. Google is a big frame of reference for how I write digitally with academic referencing.

    You have shown how AI seems to be improving (from a very bad start) at citing but is still not anywhere near trustworthy.

    I wonder if we put too much faith in Gen AI to improve. Google Search was stable for a very long time, although lately it may be getting worse: it is certainly more spammy and has more ads now, and the web has become more polluted with poor content. Indeed, we may have passed the Golden Age of Google, as Gen AI could pollute the web even more badly with synthetic text of poor quality. Google peaked early; maybe Gen AI did too.

    There is an assumption that Gen AI will get better at citing, but your piece seems to indicate that it may simply never be able to do this at all. Just as hallucinations are what make Gen AI so amazing and powerful, they are also its fatal flaw, one that cannot be designed out. If you take out the hallucinations, Gen AI won’t work.

    It looks from your work like we need humans in the loop, at least for the foreseeable future, to check the output, and those people need to be subject-matter experts.

  5. As an older student I thought AI would be the answer to my writing woes. But all for naught. I have asked questions about topics I know very well, and it really only gave me general answers even when I kept asking. So for me it’s more of a place to get general-knowledge questions answered about a topic I don’t know, but it seems weak if one needs to deep-dive into a subject.
