I was reading Zach Seward’s talk on journalism and AI (“AI news that’s fit to print”) on Tuesday, and I think he does a really nice job of summarizing the challenges (and highlighting some incredible opportunities) associated with turning to AI to produce quality content in news organizations. While I’ve only ever worked as an analyst, the work—finding the meaningful stories in what are typically overwhelming volumes of information (and misinformation and disinformation…)—is similar enough to be useful and informative when thinking about how it might intersect with artificial intelligences.
I was struck by Seward’s description of the common characteristics that successful AI journalism has:
That “vetted,” “rigorous,” and “transparent” made the list resonated. I need to read through some of the exemplars that he pointed to—especially Jaemark Tordecilla’s use of a custom GPT to make it easier to explore bureaucratic reports in search of corruption and graft—because I am interested to see how first-hand journalistic reporting is integrated with / used to highlight discrepancies with official reporting.
I think there’s something to Tordecilla’s construction of a task-specific LLM and custom GPT. Analysts spend a lot of time triaging domain-specific documents looking for the information and insights that allow them to craft clear and compelling stories. Some documents are so large that there is a significant (time) cost to using them (e.g., World Bank reports, national budgets, etc.). Tordecilla seems to demonstrate the power of smaller, more focused data sets for niche, expert-level analytic work.
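To make that pattern concrete, here is a minimal sketch (my own illustration in Python, not a reconstruction of Tordecilla’s actual build): carve a huge report into chunks, retrieve only the passages relevant to an analyst’s question, and hand that narrow context to a model. The `llm()` call at the end of the usage note is a hypothetical placeholder for whichever model or API you actually use.

```python
# Sketch of a task-specific document-triage assistant: retrieval first,
# generation second, so the model only ever sees a narrow, citable context.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity


def chunk(text: str, size: int = 1200, overlap: int = 200) -> list[str]:
    """Split a long document into overlapping character chunks."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def top_passages(question: str, chunks: list[str], k: int = 5) -> list[str]:
    """Rank chunks by TF-IDF similarity to the question and keep the top k."""
    vec = TfidfVectorizer(stop_words="english")
    matrix = vec.fit_transform(chunks + [question])
    scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
    best = scores.argsort()[::-1][:k]
    return [chunks[i] for i in best]


def build_prompt(question: str, passages: list[str]) -> str:
    """Assemble a narrow, citable context for the model."""
    context = "\n\n".join(f"[passage {i + 1}]\n{p}" for i, p in enumerate(passages))
    return (
        "Answer using ONLY the passages below. Cite passage numbers.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


# Usage sketch (report_text would come from, e.g., a parsed budget PDF):
# question = "Which line items grew fastest year over year?"
# passages = top_passages(question, chunk(report_text))
# answer = llm(build_prompt(question, passages))  # llm() is a stand-in for your model call
```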
If we view massive LLMs as high-performing grads of best-in-class liberal arts programs (at the Bachelor’s level), might purpose-built LLMs and custom GPTs be the equivalent of graduate schools…and a step toward more expert systems?
Full disclosure: the unpredictable, unknowable nature of LLMs continues to sit with me.
My bias is that I expect LLM-fueled generative AIs to have a degree of transparency and explainability that meets the standards to which I held myself (and my peers) as an analyst. After reading Will Douglas Heaven’s “Large language models can do jaw-dropping things. But nobody knows exactly why.”, I have more questions.
The passage that caught my eye had to do with grokking. Heaven reports:
[Researchers at OpenAI] found that in certain cases, models could seemingly fail to learn a task and then all of a sudden just get it, as if a lightbulb had switched on. This wasn’t how deep learning was supposed to work.
and
The largest models, and large language models in particular, seem to behave in ways textbook math says they shouldn’t. This highlights a remarkable fact about deep learning, the fundamental technology behind today’s AI boom: for all its runaway success, nobody knows exactly how—or why—it works.
As a recovering political analyst, I see AI as a necessary evolution for how we interact with and explore overwhelming volumes of information. Adam Jacobs, a former colleague at 1010data, had a great quote that I continue to think about: “…big data should be defined at any point in time as ‘data whose size forces us to look beyond the tried-and-true methods that are prevalent at that time.’”
Search is failing / has failed us as the tried-and-true method for discovering and exploring the overwhelming volumes of information relevant to our interests. Something needs to step into the breach. AI is the most promising candidate to replace search…though, for professional expert users, explainability is a commonly held professional standard: they know the source material and the gaps in their (and/or the community of interest’s) understanding of the problem being studied, and they likely have both a personal and a shared sense of the variables influencing their work.
If AI is going to—in some way, shape, or form—emerge as a replacement for search, it is going to need to be able to communicate the documents and data that underpin its generated content, as well as the confidence that it has in the source material it uses (and the gaps in the source material that it has access to).
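Purely as an illustration of what that might look like in practice (the names and fields below are my own assumptions, not an existing standard or API), here is a sketch of a response structure that treats sources, confidence, and known gaps as first-class parts of the generated answer rather than an afterthought:

```python
# A hypothetical "show your work" answer type: every generated answer carries
# its own provenance, a stated confidence in that provenance, and known gaps.

from dataclasses import dataclass, field


@dataclass
class SourcedAnswer:
    """A generated answer that carries its own provenance."""
    answer: str                                           # the generated text
    sources: list[str] = field(default_factory=list)      # documents/passages relied on
    confidence: float = 0.0                               # stated confidence in the sources, 0-1
    known_gaps: list[str] = field(default_factory=list)   # what the source set does not cover


def render(result: SourcedAnswer) -> str:
    """Format an answer the way an analyst would expect to read it."""
    cites = "\n".join(f"  - {s}" for s in result.sources) or "  - (none)"
    gaps = "\n".join(f"  - {g}" for g in result.known_gaps) or "  - (none stated)"
    return (
        f"{result.answer}\n\nSources:\n{cites}\n"
        f"Confidence in sources: {result.confidence:.0%}\nKnown gaps:\n{gaps}"
    )
```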
Thinking more broadly, though, I think grokking has potentially interesting implications for AI governance: what if the first step toward AI governance was feeding one or more AIs the governance documents (e.g., laws, regulations, standards, ethics, codes of conduct, etc.) for hundreds or thousands of domains, along with documents that offer the histories of those documents (what concerns they were intended to address; when and why the governance mechanisms were seen as successful; when and why they failed; when and why they were revised; etc.), and then asking the AI:
People are concerned about how organizations in the public and private sectors and people might misuse or abuse AI to do illegal or unethical things. We’ve fed you 1,000 governance documents and 1,000,000 documents commenting, critiquing, or editorializing on those documents or the ethics associated with governance. We’ve also given you 1,000,000 documents that people have written about AI’s development, trajectory, and people’s aspirations, expectations, concerns, and fears about the technology. We need you to write 1,000 generations of AI governance policy that use these documents. The goal of AI governance should be ensuring that AI is ethical, trustworthy, and equitable while still allowing for innovation and its evolution. You may e-mail any of the authors of the papers that we’ve given you with up to ten clarifying questions concerning their work, their work as it relates to other documents that you’ve processed, or their field of study. At the end of your work, we will submit your recommendation to 1,000 people around the world for their comments and thoughts, which we will then ask you to process and incorporate into another 100 generations of AI governance policy.
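Just to make the shape of that thought experiment concrete, here is a deliberately naive sketch of the loop it describes; `draft_policy()` and `gather_feedback()` are hypothetical stand-ins for an LLM call and a human review process, not real systems:

```python
# A naive rendering of the governance thought experiment: many unsupervised
# drafting generations, one round of broad human feedback, then more
# generations that incorporate that feedback.

def draft_policy(corpus: list[str], prior_draft: str | None, feedback: list[str]) -> str:
    """Hypothetical: ask a model for the next generation of governance policy."""
    raise NotImplementedError("stand-in for an LLM call")


def gather_feedback(draft: str, reviewers: int = 1000) -> list[str]:
    """Hypothetical: collect comments from human reviewers around the world."""
    raise NotImplementedError("stand-in for a human review process")


def governance_experiment(corpus: list[str]) -> str:
    draft = None
    # 1,000 generations grounded in the source corpus
    for _ in range(1000):
        draft = draft_policy(corpus, draft, feedback=[])
    # human comments, then 100 more generations that incorporate them
    comments = gather_feedback(draft)
    for _ in range(100):
        draft = draft_policy(corpus, draft, feedback=comments)
    return draft
```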
This methodology is imperfect by design: there would need to be a study looking at the biases built into the models, the prompt would need more work, there would be a fair amount of work in document selection (and in how those documents are assessed in terms of quality and bias), etc. etc. My thinking is that we are all trying to grok a fair, equitable, and practical model for AI governance. We have crowds of people thinking about it. Why not turn to AI to help us with the grokking? Imagine if we could start discussion and debate with the 1,001st generation of governance rather than slogging our way through the first ten generations. Who would we, as a civil society, trust to compile the source documents and commentaries? Who would we trust to train the model and draft the prompt? Who would we turn to to review, comment on, and critique that thousandth generation of draft AI governance policy?