Session Summaries by Jo Devaquet
[Data, Metadata & Tool 2 ‘Tropy’], [26.09.2024]
The beginning of the course dealt with the definition of a “digital historian”. The question of when a historian becomes a digital historian was raised, and the class was confronted with various viewpoints on it. In the broadest sense, almost everyone today is a digital historian; in a narrower sense, the term applies only to those who actively work with digital tools and not just with digital sources.

Afterward, the term “data” was explained in more detail: data are information used for reasoning, arguing, or calculating. In a humanistic sense, one can also understand data as capta, i.e. as something actively taken rather than simply given. It was shown that the significance of the term has increased in modern times, especially since digitization. Finally, we saw how data can be handled, from its creation to the modification of its properties and materiality.

When talking about digital data, one automatically talks about metadata. For this reason, the class was given an overview of what is understood by metadata and how to find it. Metadata is information about information. It helps organize and structure large amounts of data and differentiate between multiple versions of similar-looking data. Metadata is also used, especially by historians, to make assumptions about the given information: it helps to assess, for example, whether a source is linked to a desired time period or whether it originates from a transparent source. Tropy ties in exactly here. It is a program used to annotate, comment on, and transcribe collected data, and it thus helps organize, hierarchize, and structure it. This works, for example, by creating projects, adding keywords, and editing metadata.
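To make the idea of item-level metadata more concrete, here is a minimal Python sketch of the kind of record Tropy manages. The field names loosely follow Dublin Core, and all values are invented examples, not data from the course.

```python
# A sketch of item-level metadata for a digitized source; field names
# loosely follow Dublin Core, and all values are invented examples.
item_metadata = {
    "title": "Letter from the municipal archive",
    "creator": "Unknown clerk",
    "date": "1923-05-14",
    "source": "Luxembourg City Archives",    # provenance: is it transparent?
    "rights": "Public domain",
    "tags": ["correspondence", "interwar"],  # keywords for grouping items
}

# Such a record lets a historian assess an item without opening the
# image itself, e.g. checking whether it falls in a desired period:
if item_metadata["date"].startswith("192"):
    print("Item from the 1920s:", item_metadata["title"])
```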
---
title: Session Summaries by Jo Devaquet
abstract: 'Summary 2'
authors:
- Jo Devaquet-0191486141
date: 2024-10-02
---
[Web Archives], [03.10.2024]
The lecture of 2 October 2024 consisted of different group assignments. The task assigned to my group concerned the dialogue created by the different people, historians, and bots working on a Wikipedia page. More precisely, we had to analyse what had been edited on the Wikipedia page about the September 11 attacks by using its edit history. During our analysis we found that most of the changes made by users were minor: corrections of grammar mistakes, additions or updates of links, or the integration of a different picture or a source from another medium. Among those small edits were also more significant ones, such as the provision of new information or the deletion of redundant or unsuitable text passages. Noticeable changes included commentaries that had nothing to do with the article (a kind of vandalism) and removals that did not seem appropriate, as the deleted information still appeared adequate. The interventions by bots mostly consisted of creating hyperlinks to other Wikipedia pages or correcting obvious grammar mistakes.

The topics of the other groups were more directly linked to actual internet archives. They mostly worked on problems with the sources of internet archives, gave an insight into the Luxembourg Web Archive, presented the old website of Luxembourg with the help of the Wayback Machine and compared it to today’s version, or analysed a YouTube video, the September 11 Digital Archive, and the disappearance of the website “Mijn Museum – De Beukel”.
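As a side note, the edit history we browsed by hand can also be queried programmatically. The following is a minimal Python sketch using the public MediaWiki API; the page title and the selection of revision fields are my own choices, not something shown in class.

```python
import requests

# Query the MediaWiki API for the most recent revisions of a page.
API_URL = "https://en.wikipedia.org/w/api.php"
params = {
    "action": "query",
    "prop": "revisions",
    "titles": "September 11 attacks",
    "rvprop": "timestamp|user|comment|size",  # who edited, when, and why
    "rvlimit": 20,
    "format": "json",
}
data = requests.get(API_URL, params=params, timeout=30).json()

page = next(iter(data["query"]["pages"].values()))
for rev in page["revisions"]:
    # Edit summaries often reveal whether a change was a typo fix,
    # a link update, a bot run, or reverted vandalism.
    print(rev["timestamp"], rev["user"], "-", rev.get("comment", ""))
```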
---
title: Session Summaries by Jo Devaquet
abstract: 'Summary 3'
authors:
- Jo Devaquet-0191486141
date: 2024-10-09
---
[Online Newspaper Archives & Tool 3 ‘Impresso’], [10.10.2024]
The fourth course was divided into three parts. The first part introduced us to the importance of newspapers in historical research and the need to preserve newspaper archives. To summarise, the importance of newspaper archives lies in the fact that they can be used as primary sources in a variety of ways: newspapers offer a rich and varied spectrum of information about the past. This part not only emphasised the value of newspapers but also familiarised us with a corpus tool called Impresso. Impresso is a platform that archives Luxembourgish and Swiss newspapers. It improves accessibility for historical research by segmenting the newspapers, and it crosses national borders by storing newspapers in different languages from both Luxembourg and Switzerland.

The second part allowed us to apply the theoretical knowledge we had acquired. When searching a large corpus for striking elements, you quickly notice patterns and trends. In my group we analysed the frequency per million of the words ‘coal’ and ‘steel’ to check how it changed during industrialisation and after the creation of the ECSC. We tried to exclude OCR errors with the topics option, and we also used the snippets to check whether a peak in frequency was caused by frequent word use within a limited number of scanned sources.

The third part of the course was a very quick exploration of the GitHub website. We were shown how to create a personal file and where to upload our summaries.
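For clarity, the ‘frequency per million’ measure that Impresso plots can be reproduced by hand. Below is a short Python sketch with invented counts; the real figures come from Impresso’s interface, not from this code.

```python
# Relative frequency per million tokens, computed on invented counts.
yearly_counts = {
    # year: (occurrences of "coal", total OCR'd tokens that year)
    1880: (420, 3_100_000),
    1900: (910, 4_800_000),
    1955: (1_340, 6_200_000),  # after the creation of the ECSC (1952)
}

for year, (hits, tokens) in sorted(yearly_counts.items()):
    per_million = hits / tokens * 1_000_000
    print(f"{year}: {per_million:.1f} occurrences per million tokens")
```

Normalizing by corpus size is what makes years comparable: a year with more digitized pages would otherwise always show more hits.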
---
title: Session Summaries by Jo Devaquet
abstract: 'Summary 4'
authors:
- Jo Devaquet-0191486141
date: 2024-10-17
---
[Maps and Tool 4 ‘Story Maps’], [16.10.2024]
The course “Maps and Tool 4 ‘Story Maps’” was divided into two parts. The first part was a brief introduction to maps as a tool. Ms. Schmid explained the role of maps as historical sources, gave us a brief insight into the history of maps, showed the different types of maps that exist (topographical and thematic), and introduced us to the term GIS (Geographic Information Systems). In this part, we learned, for example, that Google Maps was created only in 2005, while the earliest maps date back to at least the 6th century BCE. During the introduction to GIS, we saw how these systems combine different layers of geographical data.

The second part consisted of group work. Our group was assigned the “WORLD ATLAS OF TRAVEL INDUSTRY,” a map depicting the world of 1860, created by Martin Jan Månsson. It is a massive world map that incorporates extensive information about travel and trade goods. For example, it contains facts about goods, slavery, trading animals, trading habits of various nations, events related to the development of trade, and much more. Additionally, it includes two smaller integrated maps, a diagram, and a list of popular trade goods and their origins. Most of the sources used by the cartographer were written by English and American travelers, industrialists, and researchers of the 19th century. The map itself is a collection of stories told by contemporaries, including European, Russian, and American captains and travelers. The John Snow map, which the group next to ours worked on, deals with the 1854 Broad Street cholera outbreak. It marks deaths in red and water pumps in blue across different layers. By calculating the area with the highest number of deaths, it was determined that the Broadwick Street water pump was the “hotspot” of the outbreak.
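The reasoning behind the John Snow map can be illustrated with a few lines of Python: assign every recorded death to its nearest pump and count per pump. The coordinates below are invented stand-ins, not the historical data.

```python
import math

# Invented coordinates: two pumps and a handful of death locations.
pumps = {"Broadwick Street": (0.0, 0.0), "Other pump": (5.0, 5.0)}
deaths = [(0.3, -0.2), (0.8, 0.5), (4.6, 5.2), (0.1, 0.9)]

# Assign each death to the nearest pump, mimicking the spatial
# clustering one reads off the map's layered display.
counts = {name: 0 for name in pumps}
for death in deaths:
    nearest = min(pumps, key=lambda name: math.dist(death, pumps[name]))
    counts[nearest] += 1

print(counts)  # the pump with the most nearby deaths is the suspected hotspot
```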
---
title: Session Summaries by Jo Devaquet
abstract: 'Summary 5'
authors:
- Jo Devaquet-0191486141
date: 2024-10-24
---
[Networks & Tool 5 “Palladio”/ “Vistorian], [23.10.2024]
In the course on 23 October 2024, entitled ‘Networks & Tool 5 “Palladio”/ “Vistorian”’, we were given an introduction to network theory based on a fictitious representation of a social network. Mr Düring created a wedding scenario to make everything clearer. The individual people represented the nodes, and the edges were the existing relationships between the people. These could be more or less intense, and either one-sided or reciprocal. We were also made aware that social networks can have a temporal dimension.
Someone who has many connections in the social network has a so-called high degree, while a node with only a single connection has a degree of one. There is also the so-called broker, who represents the bridge between two networks. This gives them an influential position, as they determine what is transferred from one network to the other. If two nodes are connected via a third element of a different kind, such as an event, this is referred to as a bipartite network, and the type of connection is an affiliation. In the case of a direct connection, we speak of a unipartite network, and the connection is called an interaction. People with many relations find it easier to establish further relations, so networks can create specific dynamics based on their composition.
The term network can be applied to many concepts. We looked at some examples of networks, including Facebook friendships, a street network, and the network of the communist resistance in Cologne; there are many more that are not related to anything social. The terms diameter, density, and degree centrality, along with their various types, were also explained, although all of this would be too extensive to cover in this summary.
Afterwards, we did group work in which we created a fictitious network ourselves using Palladio.
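The same concepts can also be tried out in code. The following is a minimal sketch of a wedding-style network in Python with the networkx library (my own choice of tool; in class we used Palladio’s web interface, and all names and ties below are invented).

```python
import networkx as nx

# A small fictitious wedding network: people are nodes, ties are edges.
G = nx.Graph()
G.add_edges_from([
    ("Bride", "Groom"), ("Bride", "Maid of honour"),
    ("Groom", "Best man"), ("Best man", "Photographer"),
    ("Photographer", "Caterer"),  # the photographer bridges two groups
])

print("Degree:", dict(G.degree()))               # many ties = high degree
print("Density:", nx.density(G))                 # share of possible edges present
print("Diameter:", nx.diameter(G))               # longest shortest path
print("Brokers:", nx.betweenness_centrality(G))  # bridge positions score highest
```

Running this shows the nodes in the middle of the chain (the groom and the best man) with the highest betweenness scores, matching the idea of the broker who controls what passes between parts of the network.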
---
title: Session Summaries by Jo Devaquet
abstract: 'Summary 6'
authors:
- Jo Devaquet-0191486141
date: 2024-10-31
---
[Hands on History: EU Parliament Archives], [30.10.2024]
In the lecture from October 30, 2024, titled “Hands on History: EU Parliament Archives,” Ludovic Delepine and Marco Amabilino discussed their work on the historical archives of the European Parliament, which are now available online and searchable with the help of AI. The project started within the archives; the aim was to make millions of documents more visible and accessible. As an introduction, they spoke about Edgar F. Codd, who devised the relational model of databases, laying the foundation for modern data systems. They then explained the role of HP and Google, which made breakthroughs in optical character recognition (OCR) engines for various operating systems. The importance of Karen Spärck Jones was also highlighted: she worked on term frequency-inverse document frequency (TF-IDF), a weighting scheme that underpins most modern search engines.

On the website of the European Parliament’s Historical Archives, users can search for documents by metadata; in the dashboard, they can filter by type, language, and year. There is also a function called “Ask the EP Archives,” which allows users to ask questions and receive immediate, text-based answers. The goal was to ensure these answers are grounded in documents from the European Parliament, drastically reducing the risk of AI “hallucinations.”

At the end of the presentation, there was a Q&A session. Ludovic Delepine and Marco Amabilino addressed various questions, including on the trustworthiness of AI in archival practices, emphasizing that it will always require critical evaluation by the user, whether they are a jurist, a political scientist, or a historian. It was also clarified that fake news circulates completely independently of AI, so one should not place too much blame on AI. In addition, they answered IT-specific questions asked by professors and postdoctoral researchers.
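Since TF-IDF came up as the backbone of modern search, here is a hand-computed Python sketch of the idea (tf × log(N/df)) on a three-document toy “archive”; the actual scoring used by the EP Archives search engine is not something we were shown.

```python
import math

# A toy "archive" of three one-line documents.
docs = [
    "parliament debate on coal imports",
    "parliament session minutes",
    "parliament report on steel production",
]
tokenized = [d.split() for d in docs]
N = len(tokenized)

def tfidf(term: str, doc: list[str]) -> float:
    tf = doc.count(term) / len(doc)         # term frequency in the document
    df = sum(term in d for d in tokenized)  # number of documents containing it
    return tf * math.log(N / df)            # rare terms get a higher weight

# "parliament" appears in every document, so it scores 0 as a search
# term; "coal" appears in only one document, so it is distinctive.
print("parliament:", tfidf("parliament", tokenized[0]))
print("coal:      ", tfidf("coal", tokenized[0]))
```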
---
title: Session Summaries by Jo Devaquet
abstract: 'Summary 7'
authors:
- Jo Devaquet-0191486141
date: 2024-11-07
---
[DH Theory: Criticisms; Transparency; Reproducibility/Documentation], [07.11.2024]
The course from 6 November 2024, titled “DH Theory: Criticisms; Transparency; Reproducibility/Documentation,” was largely a review of the article by Hoekstra and Koolen. In class, we addressed critical questions in historical and data-driven research, with a particular focus on evaluating truth, interpretation, and forgery in historical scholarship. One example used was Holocaust denial by David Irving and the court case with Deborah Lipstadt. Lipstadt’s book Denying the Holocaust critiques Irving as a prominent Holocaust denier who manipulates historical evidence to fit his ideological agenda. The ensuing trial highlighted Irving’s misuse of sources, with historians like Richard Evans systematically proving falsification by analyzing archival records, speeches, and publications.

For data-driven research, the article by Hoekstra and Koolen outlines five key points to ensure methodological transparency and replicability. The first point is ‘Selection’. This involves carefully choosing data, such as actors, time periods, relationships, and sources; in the course exercise, we had to decide whom to include or exclude in our network (e.g., interactions with the wedding staff). The second point is ‘Modeling’, which is about structuring data conceptually. In the course example, this included defining what constitutes nodes and edges, tracking attributes, and incorporating temporal aspects. The third point is ‘Normalization’, which involves standardizing data for consistency. In the course example, this referred to standardizing name spellings, date formats, and relationship types, ensuring comparability across datasets. The fourth point is ‘Linking’, which is about establishing connections between different data elements, such as merging networks from various sources or cross-referencing events across datasets. The final point is ‘Classification’, which involves grouping data meaningfully. In the course, this included categorizing relationships by type (e.g., personal or professional) or assigning weights to network edges.

The goal is to document each step transparently, allowing others to replicate the analysis. Adopting the FAIR principles – ensuring that data is findable, accessible, interoperable, and reusable – supports broader goals of openness and credibility in data-driven research.
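To make one of the five points concrete, here is a small Python sketch of the ‘Normalization’ step: standardizing name spellings and date formats so that records from different sources become comparable. The mapping table and the records are invented examples, not course data.

```python
from datetime import datetime

# Invented lookup table mapping spelling variants to a canonical name.
NAME_VARIANTS = {"J. Smith": "John Smith", "Smith, John": "John Smith"}

def normalize_record(name: str, raw_date: str, date_format: str):
    """Return a canonical name and an ISO-8601 date for one raw record."""
    canonical = NAME_VARIANTS.get(name, name)
    iso_date = datetime.strptime(raw_date, date_format).date().isoformat()
    return canonical, iso_date

# Two records that look different but describe the same person and day;
# after normalization they can be linked and classified consistently.
print(normalize_record("J. Smith", "24/10/2024", "%d/%m/%Y"))
print(normalize_record("Smith, John", "2024-10-24", "%Y-%m-%d"))
```

Documenting a table like NAME_VARIANTS is itself part of the transparency Hoekstra and Koolen ask for: a reader can see exactly which variants were merged.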