Session Summaries by Alex Diedenhofen

October 11, 2024

Data, Metadata and Tropy, 25/09/2024

In a first step, the course dealt with the terms data and metadata and presented their differences and their uses. I think the video shown in the course made it possible to quickly and easily visualise the differences and thus grasp the basic concepts needed for the rest of the course.

Data refers to ‘facts’ or ‘pieces of information’. An organised collection of data can therefore be referred to as a dataset. Metadata, on the other hand, is simply data that describes other data: it provides information about the information. An example would be taking 100 different newspaper articles and organising them according to, for example, the language in which each article was written, the period of publication or the name of the newspaper. Another example of data is research data: factual records that are used for the validation of research findings. What counts as such a record varies depending on the research; it can be numerical values, text records, or images and sounds.
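
To make the distinction concrete, here is a minimal sketch in Python with invented newspaper records (not an example from the course): the article text would be the data, while fields such as language, newspaper and year are metadata that can be used to organise the dataset.

```python
# Each dictionary holds metadata *about* an article; the article text itself
# would be the underlying data. All records below are invented for illustration.
articles = [
    {"title": "Budget debate", "language": "fr", "newspaper": "Journal A", "year": 1921},
    {"title": "Stahlwerke",    "language": "de", "newspaper": "Journal B", "year": 1921},
    {"title": "New tramline",  "language": "en", "newspaper": "Journal A", "year": 1935},
]

# Organising the dataset by one metadata field, e.g. the language of the article.
by_language = {}
for article in articles:
    by_language.setdefault(article["language"], []).append(article["title"])

print(by_language)
# {'fr': ['Budget debate'], 'de': ['Stahlwerke'], 'en': ['New tramline']}
```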

In a second step, the Tropy software was presented. I cannot say much about Tropy from this course, only that it is software that allows you to organise photos of research materials. Tropy was only covered in the last 10 minutes, in fast forward, which meant that I personally did not yet fully understand how to use it.

In future courses where new tools are introduced, more time should be spent explaining the tool to the students, rather than holding a purely theoretical session and not leaving enough time at the end to explain the software in more detail.

Web Archives, 02/10/2024

The course held on 2 October addressed the subject of web archives. To work through the different themes of this topic, we were divided into six groups, each tasked with a specific exercise related to web archives. Each group then presented its findings to the class.

While the other groups dealt with topics such as the stakes of archiving the web, the Luxembourg Web Archive, archiving luxembourg.lu, the fluidity of the web, and crowdsourced born-digital archives, my group dealt with family and personal archives on the web.

In my group’s assignment, we looked at a web archive created by a private individual, Roy Simons, dedicated to his grandfather. He created this web archive using the free website service Webklik. Such privately created websites, however, have the problem that they can disappear over time, for example because of copyright issues. After some time, his website was taken over by the web hosting company Weebly, which also took over the content. Everything Roy Simons had put on the web was deleted, and the new site that replaced it is no longer a personal family archive but a collection on a central theme in which no people or relatives are named.

In a second step, we searched for ourselves on the internet and checked whether we were archived on the web. We didn’t find much about ourselves, but we did notice some photos of us on the web that we didn’t know about, mainly from magazines.

I personally liked the way this course was organised. Instead of a purely theoretical course, we were able to discuss a topic in small groups, and the experimentation with the Wayback Machine was also entertaining. In addition, the lecturer’s interventions with additional information after each presentation were clear and easier to absorb than in a purely theoretical lesson.

Online Newspaper Archives and Impresso, 09/10/2024

During the course on 9 October, we worked with the Impresso tool. Impresso is a project that started as a collaboration between Switzerland and Luxembourg; today several partners from other countries are also involved. The first step was a general introduction to Impresso and its functions. The aim of Impresso is to facilitate research on the basis of more than 70 digitised newspapers. This allows you to find, analyse, interpret and compare different data. These possibilities are provided by a keyword search engine in which you can try out many different filters. OCR mistakes are also taken into account: variant spellings can be added to a query as synonyms.
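
To illustrate the principle of such a filtered keyword search, here is a small sketch over invented article records. It only mimics the idea of combining keywords (including spelling variants) with filters; it is not Impresso’s actual search engine or API.

```python
# Toy faceted keyword search; the records and spellings are invented.
articles = [
    {"text": "Düdelingen feierte gestern ...",    "newspaper": "Journal B", "year": 1942, "language": "de"},
    {"text": "Dudelange célèbre son centenaire",  "newspaper": "Journal A", "year": 1907, "language": "fr"},
]

def search(records, keywords, year_range=None, language=None):
    """Return records containing any keyword (e.g. OCR or spelling variants)
    that also satisfy the optional year and language filters."""
    hits = []
    for rec in records:
        if not any(kw.lower() in rec["text"].lower() for kw in keywords):
            continue
        if year_range and not (year_range[0] <= rec["year"] <= year_range[1]):
            continue
        if language and rec["language"] != language:
            continue
        hits.append(rec)
    return hits

# Treat the German spelling as a "synonym" of the French search term.
print(search(articles, ["Dudelange", "Düdelingen"], year_range=(1940, 1944)))
```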

In a second step, we were divided into different groups and asked to research a topic using one of Impresso’s functions. As group 7, we used the Ngram function of Impresso to search for our home villages, Lintgen and Dudelange. The Ngram function indicates how often a given term appears per year in the newspapers integrated into the project. In our results, we immediately noticed that Dudelange did not appear in any newspaper from 1941 to 1944. We therefore entered the German spelling, Düdelingen, and found a high incidence of that spelling. This reflects the Second World War and the German occupation of Luxembourg. From 1950 onwards there are generally few hits for towns in the newspapers, because of copyright restrictions on more recent issues.
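
The idea behind such an Ngram view can be sketched as a simple count of how many articles mention a term in each year of publication. This is a toy example with invented records, not Impresso’s implementation.

```python
from collections import Counter

# Invented article records standing in for digitised newspaper issues.
articles = [
    {"year": 1907, "text": "Dudelange célèbre son centenaire"},
    {"year": 1942, "text": "Düdelingen feierte gestern"},
    {"year": 1943, "text": "Aus Düdelingen wird berichtet"},
]

def yearly_frequency(records, term):
    """Count, per year, how many articles mention the given term."""
    counts = Counter()
    for rec in records:
        if term.lower() in rec["text"].lower():
            counts[rec["year"]] += 1
    return dict(sorted(counts.items()))

print(yearly_frequency(articles, "Dudelange"))   # {1907: 1}
print(yearly_frequency(articles, "Düdelingen"))  # {1942: 1, 1943: 1}
```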

At the end of the course, we were shown how to upload our weekly summaries to the GitHub platform using the Markdown markup language.

Story Maps, 16/10/2024

My group worked with the John Snow Map. This map is a good example for showing how the different layers of a GIS work. We began our story map by briefly introducing its creator, John Snow. In a second step, we presented the map he had created and added the different layers step by step, each of which brought new information. For each layer, we asked ourselves why the new layer was important, what information it conveyed and, where symbols were used, what they represented.

Another group created a story map on the topic of ‘Preserving Society Hill’. Their story map starts with a locator map showing that Society Hill is a neighbourhood in the city of Philadelphia, followed by a historical map layered on top of a current map. In the next step, they answered a few questions in bullet points about what the map represents, who created it, where the data on the map comes from and what story can be told with it.

From this project you can learn that a single historical map can open up several new and old histories. You can follow the development of the town over time and trace the context and history of the houses and their inhabitants. On the historical map you can see that the vast majority of the buildings were built close together, are of small dimensions and are arranged geometrically within individual squares. Within some squares, however, larger houses and buildings can also be recognised, set at a certain distance from the smaller ones. A social demarcation is therefore visible: the smaller buildings were probably built by people from the lower or working class, whereas the larger buildings may have belonged to the upper class. Judging by their size, however, these larger buildings could also have been public or company buildings.

Data, Networks and Palladio, 23/10/2024

As I was unable to attend the session on Palladio and the creation of networks because of illness, I will base this summary of the session on my own experience of using Palladio.

Before I started looking at the Palladio software, I read the article ‘From Hermeneutics to Data to Networks: Data Extraction and Network Visualisation of Historical Sources’ to get an overview of the software and its capabilities. When I ran into initial difficulties with Palladio, this article and its integrated step-by-step tutorial helped me to better understand and handle the software.

My first idea for a Palladio network was to use films, actors and their roles in particular films, i.e. leading role, supporting role or cameo appearance. As it turned out, however, this was more difficult than expected, so I decided to organise my network around my closest family circle.

Creating the different tables with nodes, relationships and attributes was the biggest challenge for me. It is not clear to me whether this was due to having missed the session or to my insufficient knowledge of creating such tables. After some experimentation, however, I managed to create an Excel sheet with some coherence. After a while, using Palladio seemed relatively easy. The ability to experiment with different filters and facets, and thus capture different perspectives on the construction of the network, was easy to understand and interesting to follow. Furthermore, the Palladio Calculator is a promising addition for recognising, with precise figures, the different degrees of node importance and the betweenness centrality of the nodes.
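
The same node and edge tables can also be checked outside Palladio. Below is a minimal sketch using the networkx library to compute the degree and betweenness centrality values mentioned above; the family members and relationship labels are invented placeholders, not my actual data.

```python
import networkx as nx

# A minimal family network: each tuple is an edge between two people,
# with the relationship stored as an edge attribute. Names are placeholders.
edges = [
    ("Alex",   "Mother",      {"relation": "child-parent"}),
    ("Alex",   "Father",      {"relation": "child-parent"}),
    ("Mother", "Father",      {"relation": "married"}),
    ("Mother", "Grandmother", {"relation": "child-parent"}),
    ("Father", "Uncle",       {"relation": "sibling"}),
]

G = nx.Graph()
G.add_edges_from(edges)

# Degree: how many direct relations each node has.
print(dict(G.degree()))

# Betweenness centrality: how often a node lies on shortest paths between
# other nodes, i.e. how strongly it connects the rest of the network.
print(nx.betweenness_centrality(G))
```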

EU Parliament Archives, 30/10/2024

Review of the dashboard:

The digital archive of the European Parliament provides public access to documents from 1952 to 1994. The archive dashboard makes it possible to search for relevant documents by organising the historical records with specific filters. The first thing I noticed while browsing the archive was that search queries are not saved anywhere and you cannot return to previous searches, probably because of the anonymity of the platform. With regard to the search function, however, it should be noted that a variety of filters is offered to narrow down a specific group of documents. The search is greatly simplified by filters for language, date, type of document, or the parliamentary body to which the document belongs. One function that I find very useful is a kind of chatbot called ‘Ask the EP Archives’, which answers questions based on the documents available in the digital archive. For another course, for example, I was able to use this function to obtain archive documents on my topic of Euroscepticism.

As you can read in my question 5 below, I have to be honest and say that I really do not understand how to use the ‘content-analysis’ dashboard. I would like to see either a clearer interface or a step-by-step explanation of how to use the digital archive, instead of just a video explaining the dashboard. Besides adding, in the coming years, further documents that may not yet be published because of the 30-year rule, it would certainly also be useful to be able to search for image or audio sources in this digital archive.

Summary of the lecture and discussion:

The lecture was given by Ludovic Delepine and Marco Amabilino from the Archives of the European Parliament. This archive holds documents dating back to 1952 and makes them accessible to the general public in a simplified way through a digital archive. However, it must be mentioned that the available documents only extend to 1994, because of the data protection rule that documents may only be published after 30 years. One interesting point was their explanation of how text is extracted from scanned documents: an AI based on deep learning is used for this. In addition, the digital archive can suggest several thematically similar documents based on the tokens in a document. The introduction to the presentation was interesting; as it progressed, however, I increasingly lost the thread, which I attribute to the extensive use of technical terms and my limited prior knowledge of the subject area. The few questions that did not come from us students were also very specific, and I did not fully grasp their content. As a result, the final discussion was too difficult and too incomprehensible for me to follow.
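
As an illustration of how token-based similarity between documents can work in principle, here is a small sketch using TF-IDF vectors and cosine similarity from scikit-learn. The document snippets are invented, and this is only one common approach, not the Parliament’s actual system.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented snippets standing in for archive documents.
documents = [
    "Resolution on the common agricultural policy and farm prices",
    "Report on agricultural prices and market organisation",
    "Debate on the election of the President of the Parliament",
]

# Turn each document into a vector of token weights (TF-IDF),
# then compare documents by the cosine of the angle between their vectors.
vectors = TfidfVectorizer().fit_transform(documents)
similarity = cosine_similarity(vectors)

# Documents 0 and 1 share agricultural vocabulary, so their score should be highest.
print(similarity.round(2))
```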

Prepared questions:

  1. While reading the article “Archives and AI”, I came across the Records Continuum model. Could you explain it, as its function was not clear to me?
  2. What future prospects do you see for AI in archive work, and in which areas do you see the greatest benefits or challenges?
  3. Are there any particular difficulties or advantages that should be considered when processing historical archives from different eras, such as medieval or early modern documents compared to those from the 20th century (European Holocaust Research Infrastructure – EHRI)?
  4. Could it be a disadvantage for future generations if they work primarily with AI and immerse themselves less in historical research and source criticism?
  5. I am having problems with the “content-analysis” dashboard. I don’t quite understand how the intertopic distance visualisation works and what information it is meant to provide.

DH Theory: Criticisms; Transparency; Reproducibility/Documentation, 06/11/2024

This week’s course was about DH theory. At the beginning, the question ‘How do we know what is true?’ was asked. This was based on the example of David Irving, who was found to have deliberately spread false information in his works and who denied the Holocaust. Such an example raises the question of how it is possible to prove that a historian deliberately falsifies information.

Bad historical scholarship can be exposed on the basis of written records. For data-driven research there is a corresponding method: by documenting each stage of the work, the research can be traced and it can be shown whether it is correct. This method consists of five steps.

Selection is the process of choosing which data to include in or exclude from your analysis. Modelling refers to how you structure and represent your data conceptually. Normalisation standardises the values and formats of the data so that equivalent entries are recorded consistently. Linking establishes connections between different data elements or sources. And lastly, classification groups the data into meaningful categories.
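
As a toy illustration of the normalisation and linking steps (with invented records, not the exercise from the course): the same person appears under different spellings in two sources, so the names are first standardised and the sources are then linked on the normalised form.

```python
# Two invented sources mentioning the same person under different spellings.
letters  = [{"author": "Schmit, Jean", "year": "1921"}]
registry = [{"name": "Jean SCHMIT", "birthplace": "Dudelange"}]

def normalise_name(name):
    """Normalisation: put names into a single 'first last' lower-case form."""
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    return name.lower()

# Linking: connect records from the two sources via the normalised name.
linked = []
for letter in letters:
    for person in registry:
        if normalise_name(letter["author"]) == normalise_name(person["name"]):
            linked.append({**letter, **person})

print(linked)
# [{'author': 'Schmit, Jean', 'year': '1921', 'name': 'Jean SCHMIT', 'birthplace': 'Dudelange'}]
```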

Based on these five steps, we should then document the network we created a few weeks ago, with the aim of enabling someone else to reproduce the same data later on.

In a further step, the concept of FAIR data was introduced. It states that research data must be findable, accessible, interoperable and reusable. The data must therefore not disappear (F), must be accessible (A), must be able to be combined with other datasets (I), and must be documented in such a way that it can generally be reused (R).