Session Summaries by Alex Diedenhofen
Data, Metadata and Tropy, 25/09/2024
In a first step, the course dealt with the terms ‘data’ and ‘metadata’ and presented their differences and uses. I think the video shown in the course made it possible to quickly and easily visualise the differences and thus grasp the basic information needed for the rest of the course.
Data refers to ‘facts’ or ‘pieces of information’, and an organised collection of data can be referred to as a dataset. Metadata, on the other hand, is simply data that describes other data: it provides information about the information. An example of this is having 100 different newspaper articles and organising them according to, for example, the language in which each article was written, the period of publication or the name of the newspaper. Another example of data is research data: factual records that are used for the validation of research findings. What counts as such factual records varies depending on the research; they can be numerical values, text records, or images and sounds.
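To make the distinction concrete, here is a minimal sketch in Python; the article text, the metadata fields and their values are all invented for illustration:

```python
# A minimal sketch of the data/metadata distinction, using the
# newspaper-article example from above. All values are illustrative.

article_text = "Luxembourg celebrates its national holiday..."  # the data itself

article_metadata = {                    # data that describes the data
    "language": "en",
    "newspaper": "Luxemburger Wort",    # hypothetical source
    "published": "1950-06-23",
}

# A dataset is then simply an organised collection of such records.
dataset = [
    {"text": article_text, "metadata": article_metadata},
    # ... more articles
]

# Metadata is what makes the dataset organisable, e.g. grouping by language:
by_language = {}
for record in dataset:
    by_language.setdefault(record["metadata"]["language"], []).append(record)
```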
In a second step, the Tropy software was presented. I cannot say much about Tropy from this course, only that it is software for organising different photos of research materials. Tropy was only covered in the last 10 minutes, in fast forward, which meant that I personally did not yet fully understand how to use it.
In future courses where new tools are introduced, more time should be spent on explaining the tool to the students instead of having a purely theoretical course and not having enough time at the end to explain the software in more detail.
Web Archives, 02/10/2024
The course, held on 2 October, addressed the subject of web archives. To work through the different themes of this topic, we were divided into six groups, each tasked with a specific exercise related to web archives. These groups then presented their findings to the class.
While the other groups dealt with topics such as the stakes of archiving the web, the Luxembourg Web Archive, archiving luxembourg.lu, the fluidity of the web and crowdsourced born-digital archives, my group dealt with the topic of family and personal archives on the web.
In my group’s assignment, we looked at a web archive created by a private individual, Roy Simons, about his grandfather. He created this web archive using the free website service Webklik. However, such privately created websites have the problem that they can disappear over time, for example due to copyright problems. After some time, his web archive was taken over by the web hosting company Weebly, and the content went with it: everything Roy Simons had put on the web was deleted, and the new site that was created is no longer a personal family archive but a collection on a central theme in which no individuals or relatives are named.
In a second step, we searched for ourselves on the internet and checked whether we were archived on the web. We didn’t find much about ourselves, but we did notice some photos of us on the web that we didn’t know about, mainly from magazines.
I personally liked the way this course was organised. Instead of a purely theoretical course, we were able to discuss a topic in small groups, and experimenting with the Wayback Machine was also entertaining. In addition, the lecturer’s interventions with additional information after each presentation were clear and easier to absorb than in a purely theoretical lesson.
Online Newspaper Archives and Impresso, 09/10/2024
During the course on 9 October, we worked with the Impresso tool. Impresso is a project that was initially a collaboration between Switzerland and Luxembourg, but today several other partners from other countries are also involved. The first step was a general introduction to Impresso and its functions. The aim of Impresso is to facilitate research by means of digitised newspapers from over 70 titles. This allows you to find, analyse, interpret or compare different data. These possibilities are provided by a keyword search engine with many different filters to try out. OCR mistakes are also taken into account: misspellings produced by OCR can be added to a search as variants of the search term.
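As a rough illustration of how such a filtered keyword search works in principle (this is a generic sketch, not the actual Impresso interface or API; the records and their fields are invented):

```python
# A generic sketch of a filtered keyword search with spelling variants.
# This is NOT the Impresso API; the record fields mirror the filters
# mentioned above (language, date, title) purely for illustration.

records = [
    {"title": "Luxemburger Wort", "language": "de", "year": 1941,
     "text": "... Düdelingen ..."},
    {"title": "Tageblatt", "language": "de", "year": 1950,
     "text": "... Dudelange ..."},
]

def search(records, keyword, variants=(), language=None, year_range=None):
    """Return records containing the keyword or any OCR/spelling variant."""
    terms = {keyword, *variants}
    hits = []
    for r in records:
        if language and r["language"] != language:
            continue
        if year_range and not (year_range[0] <= r["year"] <= year_range[1]):
            continue
        if any(t in r["text"] for t in terms):
            hits.append(r)
    return hits

# Treat a German spelling (or an OCR misreading) as a variant of the term:
print(search(records, "Dudelange", variants=("Düdelingen",)))
```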
In a second step, we were divided into different groups and asked to research a topic with one of Impresso’s functions. As group 7, we used the Ngram function of Impresso to search for our home towns, Lintgen and Dudelange. The Ngram function indicates how often a certain term appears per year in the newspapers integrated into the project. In our results, we immediately noticed that Dudelange was not to be found in any newspaper from 1941 to 1944. We therefore entered the German spelling, Düdelingen, and found a high incidence of this spelling. This reflects the Second World War and the German occupation of Luxembourg. From 1950 onwards, one generally finds few hits for towns in the periodicals, because fewer newspapers from that period are included due to copyright problems.
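The logic behind such an Ngram count can be sketched in a few lines of Python; the miniature corpus below is a made-up stand-in for the project’s digitised newspapers:

```python
# A minimal sketch of what an Ngram-style search computes: the number
# of occurrences of a term per publication year. The corpus is invented.

from collections import Counter

corpus = [
    (1940, "Dudelange steel production rises"),
    (1942, "Düdelingen im Kriegsjahr"),
    (1942, "Neues aus Düdelingen"),
    (1948, "Dudelange rebuilds"),
]

def yearly_frequency(corpus, term):
    counts = Counter()
    for year, text in corpus:
        counts[year] += text.lower().count(term.lower())
    return dict(sorted(counts.items()))

print(yearly_frequency(corpus, "Düdelingen"))  # {1940: 0, 1942: 2, 1948: 0}
```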
At the end of the course, we were shown how to upload our weekly summaries to the GitHub platform using a markup language called Markdown.
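For illustration, a summary file in this format might start like the snippet below; the wording is invented, and only the Markdown syntax (headings with #, emphasis, lists) is the point:

```markdown
# Session Summaries

## Online Newspaper Archives and Impresso, 09/10/2024

During this session we worked with the **Impresso** tool and:

- explored the keyword search and its filters
- used the Ngram function on our home towns
```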
Story Maps, 16/10/2024
My group worked with the John Snow map. This map is a good example for demonstrating how the different layers of a GIS work. Our approach to creating the story map started by briefly introducing its creator, John Snow. In a second step, we presented the map he had created and added the different layers step by step, each of which brought new information. For each layer of the map, we asked ourselves why the new layer was important, what information it conveyed and, where symbols were used, what they represented.
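The layering principle can be sketched with a few plotting calls, where each call adds one layer of information on top of the last; the coordinates below are invented, and matplotlib stands in for a full GIS:

```python
# A sketch of the layering idea behind the John Snow map, assuming two
# hypothetical datasets (death locations and pump locations).

import matplotlib.pyplot as plt

deaths = [(0.2, 0.4), (0.25, 0.38), (0.3, 0.45)]   # made-up coordinates
pumps  = [(0.22, 0.41), (0.7, 0.8)]                # made-up coordinates

fig, ax = plt.subplots()
# Layer 1: the base map would be drawn here (e.g. a street map of Soho).
# Layer 2: cholera deaths as small dots.
ax.scatter(*zip(*deaths), s=10, c="black", label="cholera deaths")
# Layer 3: water pumps as larger symbols.
ax.scatter(*zip(*pumps), s=80, c="red", marker="^", label="water pumps")
ax.legend()
plt.show()
```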
Another group created a story map on the topic of ‘Preserving Society Hill’. Their story map starts with a locator map to show that Society Hill is a neighbourhood in the city of Philadelphia, followed by a historical map layered on top of a current one. In the next step, they answered a few questions in bullet points about what the map represents, who created it, where its data comes from and what story can be told with it. This project shows that a single historical map can tell several stories, old and new. You can follow the development of the town over time and trace the context and history of the houses and their inhabitants. On the historical map you can see that the vast majority of the buildings were built close together, are of small dimensions and sit in regular squares. In some squares, however, larger houses and buildings can also be recognised, set at a certain distance from the smaller ones. A social demarcation is therefore visible between buildings probably erected by people from the lower or working class and the larger buildings, which may have belonged to the upper class. Judging by their size, however, these larger buildings could also have been public or company buildings.
Data, Networks and Palladio, 23/10/2024
Since I was unable to attend the session on Palladio and the creation of networks due to illness, I will base this summary on my own experience of using Palladio.
Before I started looking at the Palladio software, I read the article ‘From Hermeneutics to Data to Networks: Data Extraction and Network Visualisation of Historical Sources’ to get an overview of the software and its capabilities. When I ran into initial difficulties with Palladio, this article and its integrated step-by-step tutorial helped me to better understand and handle the software.
My first idea for a Palladio network was to use films, actors and their roles in particular films, i.e. leading role, supporting role or cameo appearance. As it turned out, however, this was more difficult than expected, so I decided to organise my network around my closest family circle instead.
Creating the different tables with nodes, relationships and attributes was the biggest challenge for me. It is not clear to me whether this was due to having missed the course or to my insufficient knowledge of how to create such tables. However, after some experimentation, I managed to create an Excel sheet with some coherence. After a while, using Palladio seemed relatively easy. Experimenting with different filters and facets, and thus capturing different perspectives on the construction of the network, was easy to understand and interesting to follow. Furthermore, the Palladio calculator is a promising addition for quantifying, with precise figures, the varying importance of nodes and their betweenness centrality.
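To make this concrete, here is a small sketch of the kind of edge table behind such a network and of the betweenness centrality figure the calculator reports, using the networkx library and invented family names (not my actual data):

```python
# A minimal sketch of a family network built from an edge table with a
# relationship attribute, plus the betweenness centrality of each node.

import networkx as nx

# Edge table: (source, target, relationship attribute)
edges = [
    ("Anna", "Ben",  "sibling"),
    ("Anna", "Carl", "parent"),
    ("Ben",  "Carl", "parent"),
    ("Carl", "Dora", "spouse"),
]

G = nx.Graph()
for source, target, relation in edges:
    G.add_edge(source, target, relation=relation)

# Betweenness centrality: how often a node lies on the shortest paths
# between other nodes (here Carl bridges Dora and the children).
print(nx.betweenness_centrality(G))
```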
EU Parliament Archives, 30/10/2024
Review of the dashboard:
The digital archive of the European Parliament provides public access to documents from 1952 to 1994. The archive dashboard makes it possible to search for relevant documents by organising historical records with specific filters. The first thing I noticed while browsing the European Parliament’s digital archive was that no search queries are saved anywhere, so you cannot go back to previous searches. This is probably due to the anonymity of the platform. With regard to the search function, however, it should be noted that a variety of filters are offered to find a specific group of documents. The search is therefore greatly simplified by functions for filtering by language, date, type of document or the parliamentary body to which the document belongs. One function that I find very useful is a kind of chatbot called ‘Ask the EP Archives’, which answers questions based on the documents available in the digital archive. For another course, for example, I was able to use this function to obtain archive documents concerning my topic of Euroscepticism. As you can read in my question 5 below, I have to be honest and say that I really don’t understand how to use the ‘content-analysis’ dashboard. I would like to see either a clearer interface or a step-by-step explanation of how to use the digital archive, instead of just a video explaining this dashboard. In addition to adding further years of documents as they become publishable under the 30-year rule, it would certainly also be useful in the future to be able to search for picture or audio sources in this digital archive.
Summary of the lecture and discussion:
The lecture was given by Ludovic Delepine and Marco Amabilino from the Archives of the European Parliament. This archive holds documents going back to 1952 and makes them accessible to the general public in a simplified way through a digital archive. However, it must be mentioned that the available documents only go up to 1994, due to the data protection rule that documents may only be published after 30 years. One interesting point was their explanation of how text is extracted from scanned documents: an AI trained with deep learning is used to extract these texts. In addition, the digital archive can suggest several thematically similar documents based on the tokens in a document. The introduction to the presentation was interesting; as the presentation progressed, however, I increasingly lost the thread, which I attribute to the extensive use of technical terms and my limited prior knowledge of the subject area. The few questions that were not asked by us students were also very specific, and I did not fully grasp their content. As a result, it was too difficult for me to follow what the final discussion was about.
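Since the archive’s actual method was not detailed in the lecture, here is only a generic sketch of how token-based document similarity can work, using TF-IDF vectors and cosine similarity on invented document titles:

```python
# A sketch of token-based document similarity: documents sharing rarer
# tokens score higher. This is an assumed, generic approach, not the
# European Parliament archive's actual implementation.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Resolution on the common agricultural policy",
    "Report on agricultural policy reform",
    "Debate on fisheries quotas",
]

vectors = TfidfVectorizer().fit_transform(documents)
# Pairwise similarity: documents 0 and 1 share the policy vocabulary,
# so their score is higher than either's score with document 2.
print(cosine_similarity(vectors))
```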
Prepared questions:
- In the article “Archives and AI”, the Records Continuum model is presented. Can you explain it, as its function was not clear to me?
- What future prospects do you see for AI in archive work, and in which areas do you see the greatest benefits or challenges?
- Are there any particular difficulties or advantages that should be considered when processing historical archives from different eras, such as medieval or early modern documents compared to those from the 20th century (European Holocaust Research Infrastructure – EHRI)?
- Could it be a disadvantage for future generations if they work primarily with AI and immerse themselves less in historical research and source criticism?
- I am having problems with the “content-analysis” dashboard. I don’t quite understand how the intertopic distance visualisation works and what information it is meant to provide.
DH Theory: Criticisms; Transparency; Reproducibility/Documentation, 06/11/2024
This week’s course was about DH theory. At the beginning, the question ‘How do we know what is true?’ was asked. This was based on the example of David Irving, who was found to have deliberately spread false information in his works and who also denied the Holocaust. Such an example raises the question of how it is possible to prove that a historian deliberately falsifies information.
Bad historical scholarship can be proven on the basis of written records. For data-driven research, there is a corresponding method: it makes it possible to trace how the research was done and thus to verify whether it is correct. This method consists of five steps.
Selection is the process of choosing which data to include in or exclude from your analysis. Modelling refers to how you structure and represent your data conceptually. Normalisation standardises data values into consistent formats. Linking establishes connections between different data elements or sources. And lastly, classification groups data into meaningful categories.
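As an example of the normalisation step, the sketch below brings the same date, written three different ways, into one consistent format; the input values are invented:

```python
# Normalisation: mapping differently formatted values of the same kind
# onto one standard representation so they can later be linked and
# classified consistently.

from datetime import datetime

raw_dates = ["23/06/1950", "1950-06-23", "23.06.1950"]

def normalise_date(value):
    """Try several known input formats and return an ISO 8601 date."""
    for fmt in ("%d/%m/%Y", "%Y-%m-%d", "%d.%m.%Y"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {value}")

print([normalise_date(d) for d in raw_dates])  # '1950-06-23' three times
```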
Based on these five steps, we were then to document the network we created a few weeks ago, with the aim of enabling someone else to reproduce the same data later on.
A further concept is FAIR data, which states that research data must be easily findable, accessible, interoperable and reusable. The data must therefore not disappear and must remain findable (F), must be accessible (A), must be possible to connect with other datasets (I) and must generally be documented well enough to be reused, for example in combination with other datasets (R).
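A dataset description in the spirit of FAIR might look like the sketch below; every identifier, URL and licence value is a placeholder, not a real record:

```python
# A sketch of a dataset description where each field supports one of
# the four FAIR principles. All values are invented placeholders.

dataset_description = {
    # Findable: a persistent identifier and descriptive metadata
    "identifier": "doi:10.0000/example-dataset",
    "title": "Family network extracted from historical sources",
    # Accessible: where and how the data can be retrieved
    "access_url": "https://example.org/datasets/family-network.csv",
    # Interoperable: an open, standard format others can connect to
    "format": "text/csv",
    # Reusable: a clear licence and provenance
    "license": "CC-BY-4.0",
    "provenance": "Compiled from civil records, 2024",
}
```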
Scalable Reading and Voyant Tools, 04/12/2024
As part of the course on distant reading, our assignment was exercise number 3. This exercise involved transferring the comments from a YouTube video titled “Luxembourg: Poverty in Europe’s wealthiest country” into the Voyant Tools interface. The work was divided into three main parts, each exploring different features and possibilities offered by this tool.

The first part of the exercise required us to explore three specific features of the interface: the Cirrus word cloud, the list of terms, and the links view. First, the Cirrus tool projects the most common words as a word cloud, with size and colour varying based on frequency. We could also adjust the number of words displayed to better visualise trends. Second, using the list of terms, it was possible to count how many times a specific word appeared in the comments, with positive terms, like “rich”, highlighted in green and negative terms, like “poor”, in red. Third, the links feature displayed a network of the most frequent words, distinguishing primary terms in blue and their associated terms in orange, with the same adjustment option available. For instance, the word “people” (in blue) was linked to terms like “government” or “Luxembourg” (in orange). These features allowed us to visualise the general opinions expressed in the comments and understand the overall direction of the video’s main subject.

In the second part, we explored two other features of Voyant Tools: contexts and collocations. First, the contexts tool displays the sentences or passages where a specific term was used, enabling us, for example, to distinguish positive comments from negative ones and analyse opinions in more detail. Second, the collocations tool shows how often certain words appear together in the same context. For example, the words “poverty” and “Europe” appeared together nine times in the comments, highlighting a significant connection between these two words. These analyses helped us understand the frequent and less frequent relationships between terms and identify recurring themes in the comments.

Finally, in the third part, we reflected on potential applications of this interface in other domains and proposed three ideas. First, for archival research, the interface could be used to analyse historical archives. For instance, one could identify how often a specific subject, such as the Luxembourgish communist diplomat René Blum, is mentioned in a corpus of archives and examine the contexts in which he is referenced, such as “communism” or the “Soviet Union”. Second, Voyant Tools could be employed to analyse newspaper articles, particularly the words most used by a specific journalist or author in a publication, in order to conduct critical discourse analysis. Such critical discourse analysis could also be applied to topics like World War II, the Cold War or the Crusades, enabling the user to analyse how opposing parties describe such events. Third, the tool could track the evolution of an author’s work over time. For instance, it would be interesting to analyse how the style or themes of a writer who lived in Germany changed before, during and after World War II.

This exercise thus allowed us to explore the many capabilities of Voyant Tools in text analysis and to consider its practical applications in various contexts.
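The collocation count described above can be imitated in a few lines of Python; the comments below are invented stand-ins for the real YouTube comments, and the context window is simplified to a whole comment:

```python
# A sketch of a collocation count: how often two words occur in the
# same context (here, the same comment). The comments are invented.

from itertools import combinations
from collections import Counter

comments = [
    "poverty exists even in europe s wealthiest country",
    "the government should address poverty in europe",
    "luxembourg is rich but poverty is hidden",
]

pair_counts = Counter()
for comment in comments:
    words = set(comment.split())
    for pair in combinations(sorted(words), 2):
        pair_counts[pair] += 1

print(pair_counts[("europe", "poverty")])  # 2: they co-occur in two comments
```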
Dissemination (Part I) & Case Studies ‘Minett Stories’, ‘Historesch Gesinn’, 11/12/2024
Joëlla van Donkersgoed led the course session, which focused on the concept of public history. Public history is an initiative and an opportunity to make history accessible to the general public. The session began with an introduction to the guest presenter and an outline of her academic career and experience in the field of public history.
As part of the presentation of her projects, the guest speaker presented the ‘Historesch/Esch’ project, among others. The residents of the region were invited to actively participate in the creation of a mural that depicts the history of the region in a personal way, contributing their own stories and photos to complete the overall picture.
Another project is ‘HistoreschGesinn’, an online platform that offers the opportunity to familiarise yourself with individual public history projects or even to help expand them by submitting your own ideas.
The third project presented was ‘Esch in 25 Objects’, which gives residents the opportunity to publish their stories or views on the history of the town. The project team organised meetings with residents in cafés to provide a trustworthy environment in which to share their stories comfortably. Finally, she introduced us to a project led by Véronique Faber about the Schueberfouer and its transnational history.
In a final step, we as a class tried to create our own concept for a public history project by thinking about five key concepts: Who, Where, Conception, Collection and Execution.
Dissemination (Part II) & Case Studies ‘Journal of Digital History’; data papers, 18/12/2024
The last session, titled “Dissemination of Scientific Results II”, focused on the subject of scientific publishing.
Scientific publishing is defined as “the subfield of publishing which distributes academic research and scholarship”. The aims of scholarly publishing are varied, including the objectives of “making visible given research” and distinguishing researchers. However, for a scholarly publication to be considered scientific, it must meet specific criteria: the content must be reliable and reusable.
Another key concept is that of the “peer review”. This is a process in which a paper written by an individual is read and endorsed by another specialist in the field. In the context of scholarly publications, this process is of immense importance, as it serves as a testament to the paper’s value and its recognition within academic circles.
Scientific publishing, as a subdomain of the publishing industry, is characterised by its own set of challenges. The pressure to publish research consistently, the monopolisation of the market and the accessibility of papers are some of the issues individuals face in this domain. The larger publishers have the capacity to dictate subscription formats, which creates greater inequality and significantly restricts accessibility. This has led to the emergence of the Open Access movement, whose objective is to provide free, full access to papers, as well as the possibility to download them.
As a last step, we examined in groups the structure and other important aspects of the article “Dialects of Discord. Changing vocabularies in the Dutch Cruise Missile discussion”.