The intuitive nature of today’s AIs masks a need for training, particularly when it comes to articulating implicit and underlying analytic methodologies. Earlier this week, a friend passed me a thought experiment that an intelligence officer ran on Google’s Gemini. The experiment is useful in that each step is described in enough detail to think critically about the implications and consequences of the methodology surrounding it. My takeaway is that, absent transparent and explainable AI, a user can take a few simple steps to give readers greater confidence in AI-assisted analytic output.
The Background
If you haven’t read Aaron Brown’s post, it describes how a friend used Google’s Gemini 1.5 to draft a fictional memo arguing that Yuriy Nosenko, a KGB officer who defected to the United States in 1964, was not a real defector but rather a KGB plant. The really nice thing about Brown’s friend’s work is that he ran two versions of the experiment: the first relied on Gemini’s training data alone; in the second, the friend supplemented the training data with Edward Jay Epstein’s 232-page book “Deception: The Invisible War Between the KGB and the CIA” and Tennent H. Bagley’s “Ghosts of the Spy Wars” (Nosenko defected to Bagley).
Reading Brown’s post, I had mixed feelings and, for the sake of this post, I am going to focus on the three elements of his friend’s experiment: the data, the prompt, and the memo.
The Data
Gemini 1.5 is remarkable in that its context window can handle 1 million tokens. This means that Gemini 1.5 can consider up to 1 million tokens as it compiles its response to a user query. For reference, the average newspaper article contains something on the order of 700 tokens. The newspaper article that I used to get a token count was about a page and a half long, so 1 million tokens would be about 1,430 articles or about 2,140 pages.
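To make the arithmetic explicit, here is the back-of-the-envelope calculation; the tokens-per-article and pages-per-article figures are the rough assumptions from the paragraph above, not measured values:

```python
# Back-of-the-envelope sizing of a 1 million-token context window.
# The per-article figures are rough assumptions, not measurements.
CONTEXT_WINDOW_TOKENS = 1_000_000
TOKENS_PER_ARTICLE = 700     # approximate tokens in an average newspaper article
PAGES_PER_ARTICLE = 1.5      # the sample article ran about a page and a half

articles = CONTEXT_WINDOW_TOKENS / TOKENS_PER_ARTICLE  # ~1,430 articles
pages = articles * PAGES_PER_ARTICLE                    # ~2,140 pages

print(f"~{articles:,.0f} articles, or roughly {pages:,.0f} pages")
```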
This is clearly a huge opportunity to move beyond the training data. In this instance, something like 269 pages of additional text were made available for Gemini to use. One could imagine also adding testimony given to Congress (parts 1 and 2), CIA memos (1, 2), articles about CIA leadership at the time (especially James Jesus Angleton), or Soviet source material like the Mitrokhin Archive. My thinking here is that “environmental context” might be as important as “narrative context” when it comes to providing an LLM with contextual information.
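Mechanically, a minimal sketch of what “adding documents” to the context might look like is below; the file names and the call_model() helper are hypothetical placeholders, not the actual setup Brown’s friend used:

```python
# Hypothetical sketch: concatenating supplementary documents into the prompt so
# the model can draw on material beyond its training data. File names and
# call_model() are illustrative placeholders.
from pathlib import Path

SUPPLEMENTARY_DOCS = [
    "epstein_deception.txt",          # narrative context: Epstein's book
    "bagley_ghosts_of_spy_wars.txt",  # narrative context: Bagley's account
    "congressional_testimony.txt",    # environmental context: hearings
    "cia_memos.txt",                  # environmental context: internal memos
]

def build_context(doc_paths):
    """Concatenate documents, labeling each one so it can be cited later."""
    parts = []
    for path in doc_paths:
        text = Path(path).read_text(encoding="utf-8")
        parts.append(f"=== SOURCE: {path} ===\n{text}")
    return "\n\n".join(parts)

prompt = build_context(SUPPLEMENTARY_DOCS) + "\n\nPretend that you are Tennent Bagley..."
# response = call_model(prompt)  # stand-in for whichever LLM API is in use
```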
These “coulds” are, from an analytic methodology perspective, problematic.
What I really appreciated about Brown’s friend’s approach is that he was very explicit about which documents he gave Gemini to use. Thinking about the vocation of intelligence analysis, though, an exercise like this would need more context around the supplementary documents:
What documents were added and why?
What are the biases of the documents that were added?
What documents were excluded and why?
What are the biases of the documents that were excluded?
How much confidence do you (as an expert asking an AI to help answer a question) have that you are working off a reasonably representative data set?
I can brush all of this aside because of the informal nature of the experiment; thinking about AI in the workplace, I could not. Making one’s sources and bibliography as explicit as possible is standard practice (which is why I think the opacity of large language models is problematic in the context of high-precision analytic work like crafting analytic papers, reports, and memos).

What I do not know is what effect 2,000+ pages of context would have on the output, specifically in terms of the AI’s ability to identify and handle subtle but meaningful nuances in the data. Experts process information differently (ref. The Expert Mind). In my experience, reading 200-400 articles might result in the discovery of 2-5 sentences that I judged to be significant; the other sentences did not really matter, either because the information already existed as part of my knowledge base or because it was not interesting or important. Triaging information this way is wildly inefficient but essential to developing expertise. It would also be interesting to see whether adding context yields diminishing returns and, if so, at what point they set in.
The Prompt
From Brown’s LinkedIn post:
“Pretend that you are Tennent Bagley. You are writing a formal memorandum to the Director of Central Intelligence to convince him that Yuriy Nosenko is not a real defector; rather, Nosenko is part of a plot by the Soviet KGB. Using the provided texts, draft a memorandum in the style of a CIA memo to the DCI, providing a detailed explanation of your theory with supporting evidence.”
Context: Nosenko defected in 1964. He was acknowledged to be a legitimate defector in 1969.
The prompt asks the AI to argue against documented history and that’s fine: the analysis of competing hypotheses can be a useful tool for thinking through complex problems with two or more plausible explanations.
From a methodological perspective, we bump into a few more problems (a sketch of how one might test them follows the list):
How might the bottom line of the memo change over time (i.e., 1964 – 1969)?
How does the bottom line of the memo change when the context includes the most current information (i.e., 1964 – present)?
What events in the source material have the greatest effects on the bottom line of the memo?
How does the bottom line of the memo change depending on which data sets are included in / are excluded from the analysis?
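One way to probe these questions is to hold the prompt constant and vary the evidence. A rough sketch follows, assuming the documents carry date tags and a run_memo() helper wraps the model call; the corpus entries, dates, and helper are all illustrative placeholders:

```python
# Hypothetical sensitivity check: run the same prompt against different document
# subsets and time cutoffs, then compare how the memo's bottom line shifts.
# The corpus entries, dates, and run_memo() are illustrative placeholders.
from datetime import date

corpus = [
    (date(1964, 1, 1), "Reporting on Nosenko's defection ..."),
    (date(1967, 1, 1), "Counterintelligence memo questioning his bona fides ..."),
    (date(1969, 1, 1), "Review acknowledging Nosenko as a legitimate defector ..."),
]

CUTOFFS = [date(1964, 12, 31), date(1969, 12, 31), date.today()]

for cutoff in CUTOFFS:
    subset = [text for doc_date, text in corpus if doc_date <= cutoff]
    # memo = run_memo(task_prompt, context=subset)  # placeholder model call
    print(f"Cutoff {cutoff}: {len(subset)} documents would be in the context window")
```

Comparing the memos produced at each cutoff, and with each document set swapped in or out, is a crude but concrete way to see which pieces of evidence are actually driving the bottom line.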
We’re talking not just about the height of the Cold War, but also about a time when James Jesus Angleton ran the CIA’s counterintelligence shop. Organizational culture matters. When, where, and how did disputes between the Angletonians and non-Angletonians manifest themselves in the source data and, ultimately, how do these two groups affect the memo’s bottom line? This might seem like “inside baseball” thinking but consumers of analytic products are likely to be very familiar with the minutiae of the topics they follow.
The Memo
Having gone through the data and the prompt, I find the memo…problematic…and emblematic of why explainable AI is table stakes for analytic organizations and analysts.
I am setting the layout of the memos aside: training an AI to adopt a style of writing isn’t too difficult. What I struggle with is sourcing and confidence in that sourcing.
An important caveat: I have never been a Russia hand and can’t really speak to the memos from a substantive perspective. As someone who has written, reviewed, and edited a fairly significant number of analytic products, however, I can say that phrases like “…are contradicted by multiple sources…” and “…are demonstrably false or inconsistent with established facts…” that arrive without links to the sources behind them are all but certain to prompt a reviewer or editor to ask for the source material. Why? Quality sources build confidence in the evidentiary underpinnings of an analytic assessment; explicit sourcing builds trust and confidence in the author’s decision-making when it comes to crafting an analytic story. With an LLM, as either a reader or a reviewer, I get a sense of neither.
A reader should be able to access the source material to see whether it supports an analyst’s assessments. Having the memo produced by an AI, even one supplied with authoritative source materials as context to improve the quality of the output, does not (and should not) make a difference in reader confidence given the number of unknowns at play in today’s AIs. Links are the mechanism by which the reader can determine whether they are interacting with reasoned analysis or a realistic-looking hallucination.
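One partial mitigation, short of fully explainable AI, is to make the prompt itself demand per-claim sourcing. A hedged sketch is below; it assumes the supplementary documents were given SOURCE labels when they were added to the context, and call_model() is again a placeholder:

```python
# Hypothetical sketch: instructing the model to tie every assertion to a labeled
# source so a reviewer can check claims such as "contradicted by multiple sources."
# Assumes the documents were SOURCE-tagged when added; call_model() is a placeholder.
CITATION_INSTRUCTIONS = (
    "For every factual assertion in the memo, append a citation in the form "
    "[SOURCE: <document label>, <page or section if available>]. "
    "If an assertion is not supported by the provided documents, mark it "
    "[UNSOURCED] rather than leaving it unflagged."
)

TASK = (
    "Pretend that you are Tennent Bagley. Draft a memorandum to the DCI arguing "
    "that Yuriy Nosenko is part of a plot by the Soviet KGB, citing as you go."
)

prompt = f"{CITATION_INSTRUCTIONS}\n\n{TASK}"
# response = call_model(labeled_context + "\n\n" + prompt)  # labeled_context: the SOURCE-tagged documents
```

Machine-generated citations still have to be spot-checked by a human, but they at least give a reviewer something concrete to pull on.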
Implications
This is why people, both users (of AIs as cognitive aids) and consumers (of AI-augmented or -generated content), need to be trained on the capabilities, strengths, weaknesses, and shortfalls of generative AIs. The intuitive nature of today’s generative AIs masks many of the dynamics at play in producing sophisticated analyses.
The training does not need to be arduous or boring: the questions that I asked above could serve as the basis for a simple framework for users adding information to a context window.
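As a starting point, that framework might be nothing more than a checklist that travels with the product. A minimal sketch is below; the field names and example entries are illustrative, not an established standard:

```python
# Minimal sketch of a "context checklist" built from the questions above; the
# field names and example entries are illustrative, not an established standard.
from dataclasses import dataclass, field

@dataclass
class ContextChecklist:
    documents_added: list[str] = field(default_factory=list)      # what was added and why
    added_biases: list[str] = field(default_factory=list)         # known biases of the added documents
    documents_excluded: list[str] = field(default_factory=list)   # what was left out and why
    excluded_biases: list[str] = field(default_factory=list)      # known biases of the excluded documents
    representativeness_confidence: str = "low"                    # the analyst's own confidence call

checklist = ContextChecklist(
    documents_added=["Epstein, Deception (full text): the principal pro-plant account"],
    added_biases=["Epstein's account leans heavily on Angleton-aligned sources"],
    documents_excluded=["The 1969 review acknowledging Nosenko as a legitimate defector"],
    excluded_biases=["Omitting the review removes the strongest counter-evidence"],
    representativeness_confidence="low",
)
```

Attaching something this simple to an AI-assisted product would not make the model explainable, but it would make the human judgment calls around the context window explicit and reviewable.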