Session Summaries by Jelena Kalezic

written by: Student JK — October 8, 2024

Summary Session 2, 25 September 2024

In the second session of the course, we explored the evolving role of digital tools and data in historical research. We began by discussing the distinction between digital history and history in the digital age. While not all historians are “digital historians,” many engage with digital sources in their work. We also examined how digital history allows for new methods of analyzing and interpreting primary sources, which are now available as electronic data. We delved into the concept of data, tracing its origin back to the Latin “datum” and its transformation into a key term in modern languages, signifying information that can be measured or analyzed. We also learned about metadata, which is essential for organizing and retrieving information in the digital world. Metadata provides context, helping historians assess the reliability and relevance of sources. One interesting aspect was the discussion of “data as capta,” emphasizing the human role in actively shaping and interpreting data, rather than seeing it as merely given. Additionally, we explored the research phases (heuristics, analysis, hermeneutics, dissemination, and preservation) and how digital tools enhance these processes. Through tools like the Google Books Ngram Viewer and Tropy, we saw practical applications of these concepts, which aid historians in managing, analyzing, and presenting research. Overall, the session emphasized the importance of critical engagement with digital sources, as well as the potential and challenges posed by the digital transformation of historical research.
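To make the idea of metadata more concrete for myself, here is a minimal sketch of what a descriptive record for a digitized photograph could look like. The field names loosely follow Dublin Core-style conventions and all the values are invented for illustration, not taken from a real catalogue.

```python
# Hypothetical metadata record for a digitized photograph (invented values,
# field names loosely inspired by Dublin Core).
photo_metadata = {
    "title": "Market day on the main square",
    "creator": "Unknown photographer",
    "date": "1923-05-14",
    "format": "Glass plate negative, scanned at 600 dpi",
    "source": "Private family collection",
    "rights": "Copyright status unclear",
}

def uncertain_fields(record):
    """List the fields a historian would still need to clarify before trusting the source."""
    return [key for key, value in record.items()
            if "unknown" in value.lower() or "unclear" in value.lower()]

print(uncertain_fields(photo_metadata))  # ['creator', 'rights']
```

Even this toy record shows why metadata matters: the gaps in it (unknown creator, unclear rights) are exactly what a historian has to weigh before relying on the source.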

Summary Session 3, 02 October 2024

In the third session of the course, we presented and analyzed various online archiving platforms. My group presented on crowdsourced born-digital archives, specifically focusing on the September 11 Digital Archive. This platform, created with support from the Alfred P. Sloan Foundation and others, gathers personal accounts, art, audio, and more. However, one significant issue we encountered was the lack of information on some sources, which made it difficult to assess their reliability for historical research. For example, many art pieces were credited to unknown authors, creating confusion about ownership and authenticity. Several other groups also faced similar challenges regarding access and copyright. For instance, the presentation on the Luxembourg Web Archive highlighted copyright restrictions that limited the ability to archive all relevant websites. Similarly, the family and personal archives group encountered broken links and copyright issues, making it challenging to share these archives effectively. The presentations revealed that while archiving digital content is valuable for preserving history, it also presents significant issues, particularly around access, copyright, and source reliability. The session was useful in showing how diverse digital platforms approach these challenges differently, and it highlighted the ongoing work needed to make digital archives more comprehensive and accessible. Overall, this session helped me understand the complexities involved in digital archiving and the importance of critical evaluation when using online platforms for historical research.

Summary Session 4, 09 October 2024

In our course on machine learning and historical media, we explored how the Impresso project links data, people, and disciplines to study old newspapers. We learned that newspapers are a rich source of information, reflecting the ideas and beliefs of past societies. From newspapers, we can get an idea of what people wore, how they sold food, and how they lived. These newspapers have been digitized, and we can now use machines to search through them easily. During the demo, we used the Impresso website to find out how often certain words appeared in historical newspapers. For example, we searched for words like “Atomkraft” and “nucléaire,” and we saw how often they were mentioned in different countries over time. We also learned about “tokens,” the small units, such as words, that are counted to measure how often something appears in the data. At the end, we did a hands-on activity in groups. My group, Group 3, worked on an Ngrams project. We searched for the term “Plan Marshall” and found over 68,000 mentions across 43,617 articles, with most appearing between 1948 and 1951. We also discovered that the language filters only worked with single tokens like “Marshall.” Finally, we learned how to use GitHub to submit our homework. Overall, what I took away from this session is that the hands-on project showed us how machine learning can uncover patterns in historical newspapers and help us better understand how information was shared in the past.
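To understand what counting “tokens” means in practice, I wrote a small sketch of the underlying idea. This is not Impresso’s actual code, and the three mini “articles” are invented stand-ins for digitized newspaper texts.

```python
import re
from collections import Counter

# Invented stand-ins for digitized newspaper articles: (year, text) pairs.
articles = [
    (1948, "Le plan Marshall est discuté au parlement."),
    (1949, "Der Marshall-Plan und die Atomkraft in Europa."),
    (1951, "Nouveaux crédits du plan Marshall pour la reconstruction."),
]

def tokenize(text):
    """Split a text into lowercase word tokens, the small units that get counted."""
    return re.findall(r"\w+", text.lower())

def token_frequency_by_year(articles, token):
    """Count how often a single token appears per year, like a simple n-gram query."""
    counts = Counter()
    for year, text in articles:
        counts[year] += tokenize(text).count(token.lower())
    return dict(counts)

print(token_frequency_by_year(articles, "Marshall"))  # {1948: 1, 1949: 1, 1951: 1}
```

The sketch also hints at why our language filter only accepted single tokens: a query like “Plan Marshall” spans two tokens, while the counting itself happens one token at a time.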

Summary Session 5, 16 October 2024

In our fifth course session, we began by covering the use of maps and GIS as tools for communication and historical analysis. Then each group did a hands-on activity, exploring a different map-based site and creating a story map about the information it had gathered. My group worked with Atlascine maps, and we chose the “Bubbe’s Life Story Interview” as a way to see how the map could enhance storytelling. Later, we also created our own story map about Atlascine. In our story map, we used basic tools and focused on answering our four questions about Atlascine with simple text and pictures. For the pictures, we took screenshots of the Atlascine website to give a visual illustration of our work. We didn’t use other tools because we were short on time due to technical problems with a demo video that didn’t work. However, we still gained useful knowledge about the story-map platform that we can apply later. I personally liked the program: it feels like a mix of a Word document and a PowerPoint presentation, but better, with many tools. Most importantly, we can customize the tools as we want and design our own map choreography with audio. For comparison, I looked at another group’s work on the “John Snow Map.” They used very interesting tools in their story map, like an image of the original John Snow map. As you scrolled down their text, you could first see the recorded deaths, then the locations of the water pumps would appear, and further down, the contamination hotspot on the map was revealed. This group created an amazing logical structure.

To answer the question of what I gained from this session: certainly an amazing online tool that I can and will use later for my own projects, and a powerful way of presenting a historical event by using maps as a historical source.

Summary Session 6, 23 October 2024

In the sixth session of the course, we explored the fundamentals of social network analysis through the practical example of a wedding guest list. This exercise introduced us to mapping social relationships by identifying connections among family members and friends, helping us understand social structures in a simple, visual way. We identified individuals as “nodes,” representing people connected through various social ties, and used “attributes” like age, hobbies, or relationships to guide seating arrangements and determine connections within tables. We examined the roles of “brokers” and “breakers” in social networks: brokers connect different groups by passing information between them, while breakers control the flow of information, deciding whether to spread or stop it. The course taught us about two main types of networks: “affiliation networks,” where ties are relatively stable (such as shared memberships), and “interaction networks,” where ties are constantly in motion. We also learned that networks can link a single type of node (“unipartite”) or two different types (“bipartite”), depending on what is being connected. Additionally, we reviewed the importance of network visualization for orientation and categorization, understanding measures like diameter, density, and betweenness centrality. At the end, we did a hands-on activity in groups, using Palladio, a digital tool for visualizing and analyzing data, to create and analyze our networks. My group applied the wedding example, experimenting with various attributes and connections to see how they affect the overall network structure.
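To revisit the measures we discussed, I sketched a toy version of a wedding network in Python with the networkx library. The guests and ties are invented, and this only illustrates the same measures Palladio visualizes; it is not a reproduction of our actual group data.

```python
import networkx as nx

# Invented wedding-guest network: nodes are guests, edges mean "know each other".
G = nx.Graph()
G.add_edges_from([
    ("Bride", "Groom"), ("Bride", "Aunt"), ("Bride", "Friend A"),
    ("Groom", "Friend B"), ("Friend A", "Friend B"),
    ("Aunt", "Uncle"), ("Friend B", "Colleague"),
])

print("Density:", nx.density(G))                     # share of possible ties that actually exist
print("Diameter:", nx.diameter(G))                   # longest shortest path between two guests
print("Betweenness:", nx.betweenness_centrality(G))  # who sits "between" groups, i.e. the brokers
```

Running this on the toy data shows, for example, that the Bride and Friend B score highest on betweenness, which matches the intuition that they bridge the family side and the friends side of the party.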

This course taught me how Social Network Analysis clarifies complex social systems, and I especially enjoyed the hands-on activity with Palladio, which deepened my understanding of network visualization.

Summary Session 7, 30 October 2024

Summary:

In the seventh session of the course, we explored the groundbreaking role of artificial intelligence (AI) in the European Parliament’s digital archives, guided by Ludovic Delepine and Marco Amabilino. Beginning with a video, we learned how AI is reshaping access to over a million documents covering the years 1952 to 1994, honoring the “30-year rule” that makes documents over three decades old available to the public. These archives are not just a record of history but a trusted source, vital for researchers and the public. A turning point came in 2007, when Google’s push toward open-source software brought serious advancements in document processing, a shift that deep learning later amplified. This allowed for unprecedented precision in extracting and interpreting historical data. Google’s 2017 breakthrough went further, transforming text into numerical data and enabling AI to identify patterns and word associations more effectively. Karen Spärck Jones, a visionary in AI and language processing, was a key figure in this journey. Her pioneering work, especially her 1972 “inverse document frequency” weighting, became foundational in shaping modern search tools. It allows users to retrieve relevant documents by giving more weight to rare, distinctive terms than to terms that occur everywhere, making historical data far more accessible. The Parliament’s new tools, like the “Ask the EP Archives” feature, echo recent innovations like ChatGPT but provide responses strictly based on verified archives. Unlike ChatGPT, which can sometimes generate unsupported answers, “Ask the EP Archives” is designed solely to respond based on factual documents within the archive, enhancing educational research and transparency while fostering trust in these invaluable historical records.
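Because the inverse document frequency idea stayed with me, I sketched the textbook formula on a tiny invented “archive” of three documents. This is the standard IDF weight as usually taught, not the Parliament’s actual implementation.

```python
import math

# Tiny invented "archive" of three documents.
documents = [
    "the parliament debated the coal and steel community",
    "the parliament adopted a resolution on nuclear energy",
    "a written question to the commission about the coal imports",
]

def idf(term, documents):
    """Inverse document frequency: terms that appear in few documents get a high weight."""
    containing = sum(1 for doc in documents if term in doc.split())
    return math.log(len(documents) / containing) if containing else 0.0

print(round(idf("the", documents), 2))      # 0.0, appears everywhere, so it carries no specificity
print(round(idf("nuclear", documents), 2))  # 1.1, rare, so it is a good retrieval term
```

Seeing the two numbers side by side made the idea click for me: a search tool built on this weighting ranks documents by their distinctive vocabulary rather than by ubiquitous filler words.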

I found the course on AI in Parliaments fascinating, especially in how AI can revive historical archives and improve accessibility across languages. However, understanding deep learning’s role in metadata classification was challenging.

European Parliament Archives:

The European Parliament Archives strives to democratize knowledge about the history of the European Parliament by making a collection of historical documents accessible to the public. This invaluable resource offers insights into European legislative history from 1952 to 1994, benefiting both researchers and citizens. With tools for keyword searches, topic modeling, and data visualizations, users can deeply explore parliamentary topics and trends. Furthermore, it enriches the understanding of parliamentary developments for educational and civil institutions worldwide. The European Parliament Archives includes collections from the European Coal and Steel Community, the Ad Hoc Assembly, and the European Parliament itself. These collections feature various document types, including motions, resolutions, written questions, and oral interventions. Interactive tools like visualizations and topic models empower users to navigate legislative topics and trends, making the study of European parliamentary history engaging and insightful.

To begin analyzing content, users can open the Dashboard, which offers a variety of tools. Upon selecting the Dashboard, four options are available. The first is the EP Archives Overview Dashboard, where users can view the number of selected documents, their languages, and a circle graph displaying the percentage and count of available documents in each language. Users can also see the types of documents, categorized by year and date of creation.

The second option is the EP Archives Content Analysis Dashboard, which presents the number of relevant documents in the archives. It groups documents by dominant topics and provides a diagram of documents by year. One particularly interesting feature is the word cloud of Eurovoc labels, where clicking on a word reveals how frequently that specific term is mentioned across all documents in the archive.

The third option is the EP Archives Received Requests Dashboard, which shares similarities with the previous dashboard but focuses on the total number of requests, requests by organization, and requests by year. A unique feature here is the world map that displays requests for specific documents by country, highlighting which organizations, such as research laboratories and institutes, have made requests in each nation.

The fourth dashboard offers tools that allow users to summarize documents by extracting key sentences to create concise summaries. I find this tool amazing because it quickly helps users grasp essential points in lengthy texts. Additionally, it includes a privacy warning, ensuring that users can trust their information remains private, even when dealing with confidential or personal documents. Importantly, almost every dashboard allows users to select controls and specify their interests, such as fonds, series, dossiers, and titles. However, I often lose track of the documents I was previously viewing because I forget to save my selections. The option to see the hierarchy view is incredibly helpful, enabling me to easily find all the documents I was working on without wasting time.
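Out of curiosity, I sketched how such extractive summarization can work in principle: score each sentence by how frequent its words are in the whole text and keep the top-scoring sentences. This is only a generic frequency-based illustration with an invented example text, not the dashboard’s actual method.

```python
import re
from collections import Counter

def extractive_summary(text, n_sentences=1):
    """Keep the sentences whose words are most frequent in the whole text, in original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    freq = Counter(re.findall(r"\w+", text.lower()))

    def score(sentence):
        return sum(freq[w] for w in re.findall(r"\w+", sentence.lower()))

    top = set(sorted(sentences, key=score, reverse=True)[:n_sentences])
    return " ".join(s for s in sentences if s in top)

report = ("The committee met to discuss coal production. "
          "Coal production rose sharply in the previous quarter. "
          "Members also exchanged greetings.")
print(extractive_summary(report))  # keeps the sentence about coal production rising
```

Even this crude version shows the appeal of the approach: because the summary reuses sentences verbatim from the source, nothing is paraphrased or invented, which suits an archive that must stay faithful to its documents.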

For me personally, my favorite tool is “Ask the EP Archives”. This is a remarkable tool that brings history to life, offering answers grounded in European Parliament documents. Unlike general-purpose AI such as ChatGPT, it is trustworthy, respecting our curiosity with genuine, accurate responses. Its dedication to transparency even extends to admitting when it does not have an answer, which is refreshing for a change. And with multilingual support, it opens doors for people across different languages to access Europe’s history authentically. For me, this is not just a tool but a gateway to understanding our shared past.

However, I did find a minor issue with the Dashboard. At the bottom, where we can click on a row to open an attached document, I noticed that when I maximize the table, there is a button for “Menu options” that lets us choose between “Export to CSV” and “Export to Excel.” When I click on either option, though, nothing happens; only a notification appears stating, “Working on your CSV file,” which is confusing, and I wasn’t sure what to make of this message. Eventually, I discovered that the Dataset info section explains how to export data, which I believe should also be included in the notification. This addition would save a lot of time for users, especially those unfamiliar with the program.

Questions:

  1. What measures are in place to ensure the trustworthiness of the AI-generated summaries and classifications of documents?
  2. How user-friendly are the current dashboards for individuals who may not have a technical background?
  3. How might future generations of researchers benefit from the advancements in AI technology in archival work?
  4. What is the significance of the “30-year rule” for document access?
  5. What is the main challenge when managing a large number of documents?