Session Summaries by Sabrina Morais
Data & Metadata & Tool 2 ‘Tropy’, 25.09.2024
Before the course on “Data and metadata”, we also had to watch a video and read a text so that we could get an impression of the topic and gain a basic understanding. In the lecture we briefly saw what digital history is. Data are facts or pieces of information, and they come in many different types; data are digital things that we collect, store, manipulate, examine, interpret, preserve and create/produce. We also learned what research data are: factual records used as primary sources for scientific research. Metadata are information about information and are used to find, organise, identify, contextualise and evaluate data. They allow us to filter objects by characteristics, tags and types, and they allow the researcher to organise data sets. Through the examples and videos it was easier to understand data and metadata: whenever a term was explained, there were examples to help us visualise it, which was ideal. I also hadn’t realised that we produce data and metadata in our everyday lives, for example by posting a tweet, where the text, hashtags, time, date, etc. would be the metadata. Historians and researchers use data and metadata because they improve the accessibility, organisation/classification, quality and accuracy/reliability of research. Unfortunately, the introduction to Tropy was very short and rapid. But I understood that Tropy allows us to organise, comment on and describe photographs of research material; it does not, however, keep track of references or create bibliographies (it is not a citation manager), edit photo files with advanced tools, or publish the photos online.
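To make the tweet example concrete for myself, here is a minimal sketch (with invented posts and field names) of how such fields can be stored as a structured record and then filtered by a metadata field rather than by the content itself:

```python
# Invented example records: each post carries metadata fields (author, date, hashtags)
# that can be used to filter and organise a collection of posts.
tweets = [
    {"text": "Visited the archives today!", "author": "historian_a",
     "date": "2024-09-25", "hashtags": ["#digitalhistory"]},
    {"text": "New OCR model released.", "author": "dh_lab",
     "date": "2024-09-20", "hashtags": ["#OCR", "#digitalhistory"]},
]

# Filtering by a metadata field (the hashtags) instead of reading every text:
tagged = [t for t in tweets if "#digitalhistory" in t["hashtags"]]
for t in tagged:
    print(t["date"], t["author"], t["text"])
```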
Web Archives, 02.10.2024
In the course on web archives we got new impressions of different web archives, including how they work, what their problems might be and how useful they can be for historians and researchers. There were 7 different assignments. For each assignment we had to form groups of 3-4 students and had 30 minutes to answer the questions and divide the group work. At the end, each group had to present its results to the class. I worked with two other students on “Fluidity of the web” on Ranke.2, specifically on historical narratives in constant motion, using the Wikipedia entry about the September 11 attacks. At the beginning we worked individually and then presented our findings to each other. We had to analyse the revision history for editing changes. These were mostly grammar/spelling corrections, the addition of images, text and paragraphs, the removal of tags, the correction of links and the restructuring of the article. In this editing history, some bots were also found; they make rapid, repetitive semantic or semi-automatic edits, correcting spelling errors, removing vandalism and fixing formatting errors. Bots cannot recognise when databases include incomplete information or mistakes. Bots are created by humans, which means they depend on the assistance of humans, so a bot can adopt biases from its editors. Bots do not focus on contextualisation, critical thinking or ethical considerations. The collaboration between humans and bots changes how history is captured, produced and understood, so questions of reliability, objectivity and authenticity can arise. Historical narratives on Wikipedia are in constant flux: the way historians analyse and write about the past can change over time, and there are new discoveries, new historical research about a topic, and different interpretations and perspectives. Wikipedia is publicly accessible, which means that anyone can edit an article, so incorrect information or even vandalism can occur. The aim of this exercise was to understand how collaborative technologies influence the moving nature of web content.
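We did the revision analysis by hand in the browser, but the same history can also be retrieved programmatically. The following is only a minimal sketch using the public MediaWiki API (the page title, field selection and the crude "username ends in bot" check are illustrative assumptions, not what we did in class):

```python
# Minimal sketch: fetch recent revisions of the "September 11 attacks" article
# via the public MediaWiki API and flag usernames that look like bot accounts.
import requests

params = {
    "action": "query",
    "prop": "revisions",
    "titles": "September 11 attacks",
    "rvprop": "timestamp|user|comment",
    "rvlimit": 20,
    "format": "json",
}
response = requests.get("https://en.wikipedia.org/w/api.php", params=params)
page = next(iter(response.json()["query"]["pages"].values()))

for rev in page["revisions"]:
    user = rev.get("user", "")
    # Heuristic only: many (not all) automated accounts have names ending in "bot".
    flag = "BOT?" if user.lower().endswith("bot") else "    "
    print(flag, rev["timestamp"], user, rev.get("comment", ""))
```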
Online Newspaper Archives & Tool 3 ‘Impresso’, 09.10.2024
Before the course about “Impresso”, we had to read an article to get an insight into the functions, the design process, the creation, the capabilities of the user interface and the ideas behind Impresso. The first Impresso project was a Swiss-Luxembourgish newspaper collection that was digitised and used for historical research. Impresso has various filter, discovery and search functions. This digital historical newspaper archive should enable researchers and historians to analyse, consult, use and interpret digitised newspapers in order to gain knowledge about the past. Historians play a major role in the creation and development of the Impresso project, as they explain how the project can be improved. In a quick walkthrough we were also shown the various functions such as Ngram, Filter, Inspect & Compare, Newspapers and Text Reuse. To cope with OCR errors, Impresso works with “word embeddings”, which means that similar words, different languages and different spelling variants are taken into account in searches. The second Impresso project integrates and enriches newspaper and radio sources in a single semantic space. We tried Inspect & Compare and used “NATO-Doppelbeschluss” for query A (99 results) and “Helmut Schmidt” for query B (6,190 results), which gave 24 common results. When we tried the newspaper filter and clicked on five different newspapers, no results came up for query A. When we experimented with the different topics in query B, there were no common results. What was interesting was that there were no common results in 1979 between the two queries, even though Helmut Schmidt played an important role in the implementation of the NATO Double-Track Decision. One also wonders what political orientation the newspapers had, as this influences how news is presented and interpreted.
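I did not implement Impresso's actual embedding pipeline; the toy sketch below only illustrates the underlying idea of grouping spelling variants (including OCR-damaged forms) by similarity, here with character trigrams standing in for trained word embeddings:

```python
# Toy illustration (not Impresso's actual method): compare words by the overlap of
# their character trigrams, so an OCR-damaged form like "Schrnidt" still scores
# close to "Schmidt".

def trigrams(word):
    padded = f"  {word.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}

def similarity(a, b):
    """Jaccard overlap of character trigrams, between 0 and 1."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)

query = "Schmidt"
for word in ["Schmidt", "Schrnidt", "Schmitt", "Doppelbeschluss"]:
    print(f"{word:16s} {similarity(query, word):.2f}")
```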
Maps & Tool 4 ‘StoryMaps’, 16.10.2024
A critical attitude should always be taken towards maps, because the creation of a map is subjective. The W-questions and the how-question should always be in our minds. My group chose the “World Atlas of travel industry, 1860” by Martin Jan Mansson, an enormous historical map of travel and trade routes in the 19th century, created mainly from the records of American and English travellers, explorers and industrialists. The perspective was therefore mainly Western and European. There was no mention in the map that Mansson used African, Arab or Asian journals, so their perspective on the trade and travel routes remains unknown. Was this intentional or unintentional? The trade routes were also linked to the colonial period, in which Americans and Europeans were very active, and in the “world map of selected goods” you can also see where trade goods came from and how they were exchanged between different continents. We mainly used the “text and image” option to create our StoryMap. The “John Snow Maps” StoryMap also used images and text, but it looks more interactive because you can see how the map changes when you switch between different layer options. Snow’s map was rendered on a GIS model, which supports spatial understanding and has tremendous storytelling power, but can also be easily manipulated. Maps always depend on how the cartographer/geographer decides to create them and what his subject, interest and purpose are. His work is subjective and can be economically, politically, culturally, socially and environmentally influenced. A map can also be overwhelming if it contains a lot of data, as with Mansson’s map. Snow’s map, or rather its GIS model, is clearer and not as cluttered. This work has taught me how space can be characterised by the different priorities of the creator and how this can be historically related (colonialism, trade networks, the spatial distribution of cholera cases).
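As a small aside on the GIS idea, here is a minimal sketch of how point data in the style of Snow's cholera map can be placed on an interactive map with the folium library; the coordinates are only rough illustrations of the Broad Street pump area in Soho, not Snow's actual dataset:

```python
# Minimal GIS-style sketch with folium: one marker for the pump and a few
# illustrative (invented) case locations, saved as an interactive HTML map.
import folium

pump = (51.5132, -0.1367)  # approximate location of the Broad Street pump
cases = [(51.5135, -0.1370), (51.5130, -0.1360), (51.5128, -0.1372)]  # invented points

m = folium.Map(location=pump, zoom_start=17)
folium.Marker(pump, tooltip="Broad Street pump (approx.)").add_to(m)
for lat, lon in cases:
    folium.CircleMarker((lat, lon), radius=5, color="red", fill=True,
                        tooltip="cholera case (illustrative)").add_to(m)
m.save("snow_map.html")  # open in a browser to explore interactively
```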
Networks & Tool 5 ‘Palladio’, 23.10.2024
The professors used a wedding invitation to show us how networks function and how nodes and relationships (ties) are created. This explanation was helpful for understanding how networks work and how we create them, and that we have to think about many different aspects and attributes. The professor also taught us the meaning of “reciprocity”, “network boundary”, “broker” (a powerful position in the network, with control over the spread of information/rumours), “affiliations” and “interactions” (directed information from one node to another node). In the course we also got a definition from James Clyde Mitchell and learned that networks are very flexible, have different types of ties, and that social network analysis (SNA) helps us to collect, analyse, store and visualise data. We also learned how historians can use social network analysis to study the interactions and relationships of historical figures, which Professor During did: he used SNA to understand how Jews and helpers of Jews were connected during the Nazi regime through support networks for persecuted Jews. SNA can thus be used as a method for working with historical sources. What I found quite difficult to understand was the concept of degree/betweenness/closeness centrality and of bipartite/unipartite graphs. By reading “From Hermeneutics to Data to Networks (…)” we also understood how Palladio works. We also had to create our own wedding invitation in an Excel sheet and use this data to experiment with Palladio and its different options such as graph, filters, table, image and map. The attributes were gender, age and interests of the guests, and the relations were their sympathy and relationships.
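Because the centrality measures were the hardest part for me, here is a small sketch with the networkx library (the guest list is invented, not our actual Palladio data) that makes the three measures and the bipartite idea concrete:

```python
# Invented wedding-guest network: compute the three centrality measures from class.
import networkx as nx

G = nx.Graph()
# Edges are "knows/likes" ties between guests.
G.add_edges_from([
    ("Bride", "Groom"), ("Bride", "Anna"), ("Bride", "Ben"),
    ("Groom", "Carl"), ("Anna", "Ben"), ("Carl", "Dora"),
])

print("degree:     ", nx.degree_centrality(G))       # how many direct ties a guest has
print("betweenness:", nx.betweenness_centrality(G))  # how often a guest lies on shortest paths (a "broker")
print("closeness:  ", nx.closeness_centrality(G))    # how close a guest is to everyone else

# A bipartite graph, by contrast, links two different node types,
# e.g. guests to interests instead of guests to guests:
B = nx.Graph()
B.add_edges_from([("Anna", "hiking"), ("Ben", "hiking"), ("Carl", "jazz")])
```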
Hands on history: EU Parliament Archives, 30.10.2024
Ludovic Délépine and Marco Amabilino presented “Bringing the EP Archives to a wider audience” on 30 October 2024, where they discussed their efforts to make the EP Archives more accessible and visible. Initially, the documents were only accessible on-site or via email. They started by creating a simple concept with visualisation and simple search capabilities, but soon realised that metadata search was not enough. They began extracting text from images using optical character recognition engines and deep learning mechanisms. Then they tried to improve the search capabilities with similarity search, to find other documents with similar content. The contributions of Edgar F. Codd, Google, Mikolov and ChatGPT were mentioned, along with Sparck Jones’ role in the development of “term frequency-inverse document frequency” (tf-idf), a measure for finding the words that are relevant to a document in relation to their frequency across all documents. The EU Parliament adopted this method and developed the AI-based deep learning application called “Ask the EP Archive”. It allows users to ask questions in different languages and receive answers in the respective language, and it gives the user the archive documents on which the answer is based, in order to avoid hallucinations, distortions and errors. The AI is controlled by “constitutional AI” to promote trustworthiness, reduce harm and provide useful answers. During the discussion, limitations were addressed, such as concerns about AI’s impact on archival practices, challenges with languages such as Romanian (if the system has a problem, it switches back to English), AI hallucinations, as it is always possible to get incorrect and misleading answers, and the Archives’ document restrictions (only documents from 1952-1994) due to the 30-year retention period. They also said that AI should be seen as a “bridge to understand history”, and that the system was not developed to reproduce the context of a document or to replace archivists. There were also IT questions, but since I don’t work in the IT industry, it was a bit difficult to understand or relate to them. ==> I have sent you my review of the dashboard and my prepared questions by email.
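The tf-idf measure mentioned above is easiest to see in a small sketch; the three “documents” below are invented, and scikit-learn’s TfidfVectorizer only stands in for whatever implementation the EP Archives actually use:

```python
# Minimal tf-idf sketch: words that are frequent in one document but rare across the
# collection get the highest scores, which is why they are good search terms.
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "report on the parliament budget debate",
    "minutes of the parliament environment committee",
    "budget amendments proposed by the committee",
]

vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform(docs)
terms = vectorizer.get_feature_names_out()

for i, doc in enumerate(docs):
    row = matrix[i].toarray()[0]
    top = sorted(zip(terms, row), key=lambda t: t[1], reverse=True)[:3]
    print(doc, "->", [(w, round(s, 2)) for w, s in top])
```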
DH Theory – criticism, transparency, reproducibility/documentation, 06.11.2024
Our course “DH Theory – criticism, transparency, reproducibility/documentation” started with the question “How do we know that something is true?”; that is, there is always a danger of falsification or distortion of events in history. This was the case with David Irving, who twisted and exaggerated the narrative of Hitler and the Holocaust: he claimed there was no connection between Hitler and the Holocaust, or denied the Holocaust altogether. Eventually, other historians, such as Deborah Lipstadt, criticised Irving, and Irving took them to court, where they had to prove whether the accusations were true. Irving’s own works were used to prove his distortions and falsifications. This was used to explain how we can make our network (data-driven research) trustworthy. We had to use the different methods of “data scopes”: selection (choosing data), modelling (structuring/representing data), normalisation (standardising data), linking (connecting data across different datasets/sources) and classification (grouping data into meaningful categories), so that others could reproduce the data of our imaginary wedding invitation. In our case this was difficult because all the relationships and attributes were invented, which made it hard to define and classify the data, and there is always a risk of mismatching data. It is easier if you have personal experience with the data or have studied them in detail and can therefore identify the attributes and relationships more easily. Last but not least came the FAIR principles, which mean that data should be findable, accessible (copyright), interoperable (i.e. connectable with other existing datasets) and reusable.
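To picture the normalisation and classification steps, here is a tiny sketch with invented guest records; pandas is just one possible tool, not the one prescribed in class:

```python
# Invented guest records with inconsistent spellings and date formats are
# standardised (normalisation) and grouped into age categories (classification),
# so that someone else could reproduce the same cleaned dataset.
import pandas as pd

guests = pd.DataFrame({
    "name": ["Anna Meier", "anna meier", "Ben K."],
    "invited_on": ["2024-10-01", "01.10.2024", "2024/10/02"],
    "age": ["34", "34", "29"],
})

guests["name"] = guests["name"].str.title().str.strip()   # one spelling per person
guests["invited_on"] = [pd.to_datetime(d, dayfirst=True)  # one date format
                        for d in guests["invited_on"]]
guests["age"] = guests["age"].astype(int)                  # numbers, not strings
guests["age_group"] = pd.cut(guests["age"], bins=[0, 30, 60, 120],
                             labels=["<30", "30-60", ">60"])  # classification
print(guests.drop_duplicates())  # the two "Anna Meier" rows now collapse into one
```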