Terms – A list of 25 words/phrases determined to be markers for content related to surveillance.
These terms were chosen as a representative list of words and phrases whose presence indicates relevant content in a document. We also considered time period-specific terminology (e.g., Edgar Hoover) and whether a word has alternative meanings that would return irrelevant results. For example, “intercept” on its own was excluded from our list because it would also retrieve football-related documents.
To generate this list, we performed a term-frequency analysis of the text of a series of reports published and recommended by the Congressional Research Service (CRS) as relevant to surveillance, originally for use in a nationwide high school debate competition. Each researcher voted on the top 260 terms from this corpus, along with potentially relevant outliers, and the list was then narrowed to 25 terms through a group process that weighed relevance, frequency, ambiguity, and time period.
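The term-frequency step can be illustrated with a minimal Python sketch. It is not the tooling we used: the tokenizer, the stop-word list, and the function names `term_frequencies` and `top_terms` are assumptions made for illustration, and only the cutoff of 260 comes from the process described above.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the original analysis may have filtered differently.
STOP_WORDS = {"the", "of", "and", "a", "an", "to", "in", "for", "is", "on",
              "that", "by", "with", "as", "or"}


def term_frequencies(documents):
    """Count how often each non-stop-word token appears across all documents."""
    counts = Counter()
    for text in documents:
        # Lowercase the text and keep alphabetic tokens (with optional apostrophe).
        tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts


def top_terms(documents, n=260):
    """Return the n most frequent terms, most frequent first."""
    return [term for term, _ in term_frequencies(documents).most_common(n)]
```

A ranked list like this is only a starting point for discussion. Narrowing it to 25 terms was a matter of group judgment, not computation.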
Each of our sources was searched for each of these terms, and the results were sorted into groups by term. This process varied slightly from source to source; for a breakdown, see the Sources page.
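As a rough sketch of the search-and-group step, the Python below checks each document against compound queries and collects matching document IDs under each query. The whole-word, case-insensitive matching rule and the names `matches` and `group_by_term` are assumptions. Each source's own search interface behaved differently, so this is an illustration, not a record of what was run.

```python
import re
from collections import defaultdict

# Two compound queries from the term list; every word in a tuple must be present.
QUERIES = [("employee", "monitoring"), ("intercept", "mail")]


def matches(query, text):
    """True if every word of the query occurs in text as a whole word (case-insensitive)."""
    return all(
        re.search(r"\b" + re.escape(word) + r"\b", text, re.IGNORECASE)
        for word in query
    )


def group_by_term(documents, queries=QUERIES):
    """Map each query, written as 'a AND b', to the IDs of the documents that match it."""
    groups = defaultdict(list)
    for doc_id, text in documents.items():
        for query in queries:
            if matches(query, text):
                groups[" AND ".join(query)].append(doc_id)
    return dict(groups)
```

Under this rule, a document that mentions “intercept” without “mail” matches neither query, which is the filtering effect the compound terms are meant to have.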
|“employee” AND “monitoring”|
|“intercept” AND “mail”|