Session Summaries by Emilie Neves
Summary Session 2 “From Sources to Data”, 25 September 2024
During this seminar we talked about data, metadata and research data, and got a brief introduction to the Tropy software. In today’s digital age, all historians have to use digital resources at some point in their work, but that doesn’t make every historian a digital historian. Digital history is historical research based on primary sources accessible as electronic data; these can be both digitised sources and born-digital sources. Data is what can serve as the basis for analysis and research: data is information. Data can be digital elements that we create, collect, store, manipulate, analyse, interpret, preserve and make available. There are different research phases: first we search, localise and gather research materials (heuristics); then these need to be analysed and interpreted; then organised and published; and the materials also need to be preserved (archived). Research data are descriptive elements of reality: qualitative data is gained by observation and quantitative data by measurement. Research data is collected, structured and organised so that we can obtain information and gain knowledge. Metadata is information about the data. Metadata can be very useful to find, identify and organise data, evaluate information and provide context. We can use systems to generate metadata. There are two types of metadata: embedded (automatically generated by a machine) and enriched (added by a creator or user). Tropy can be used to organise and archive digital pictures all in one place in a project, and to add metadata to them for better organisation and identification of the data.
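To make the difference between the two types of metadata a bit more concrete, here is a minimal sketch (assuming Python with the Pillow library and a hypothetical image file name) of how the embedded metadata written automatically by a camera or scanner can be read; the enriched metadata we add in Tropy has no equivalent inside the file itself.

```python
# A minimal sketch: reading the embedded metadata of a photo with Pillow.
# Assumes Pillow is installed; "scan_001.jpg" is a hypothetical file name.
from PIL import Image, ExifTags

img = Image.open("scan_001.jpg")
exif = img.getexif()  # embedded metadata written automatically by the camera/scanner

for tag_id, value in exif.items():
    tag_name = ExifTags.TAGS.get(tag_id, tag_id)  # translate numeric tag ids to names
    print(f"{tag_name}: {value}")

# Enriched metadata (title, creator, archive reference, ...) is not in the file itself;
# in Tropy it is added by hand and stored alongside the image in the project.
```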
Summary Session 3 “Web archives”, 02 October 2024
In this course we looked at multiple web archives. For this session we were divided into small groups, each with an assignment that they later had to present to the others. The first assignment focused on the challenges of archiving the web, based on a text by Chris Stokel-Walker. The Internet Archive, a non-profit, uses bots to make copies of websites in order to preserve them for future research. However, issues like copyright and the impossibility of archiving everything remain problematic. The second assignment looked at the Luxembourg Web Archive managed by the BNL. The BNL’s goal is to preserve all Luxembourg-related websites, but it cannot archive them all and must prioritise some. This archive is only accessible within the BNL’s network, and it captures the selected sites four times per year. For the third assignment, the group had to use the Wayback Machine to compare the 1996 and current versions of the website luxembourg.lu. The Wayback Machine, unlike the BNL, relies on user submissions to capture snapshots of sites. The fifth assignment was about the fluidity of the web; the exercise focused on Wikipedia, where information can be easily edited by anyone, so we must be careful about possible fake content. The sixth assignment was about the publishing of family and personal archives on the web. It made us reflect on the digital traces we leave behind, particularly through the publication of personal and family archives online. Finally, assignment seven examined the “September 11 Digital Archive,” a project to preserve multimedia about 9/11. The site was created as early as 2002. While it has interesting data, the lack of metadata makes the archive difficult for historians to use.
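As a side note on the Wayback Machine, it can also be queried programmatically. The sketch below uses the public “availability” endpoint of archive.org, assuming Python with the requests library; the URL and date are only illustrative.

```python
# A minimal sketch of asking the Wayback Machine for an archived snapshot of a site.
# Assumes the requests library; the URL and timestamp are just examples.
import requests

params = {"url": "luxembourg.lu", "timestamp": "19961101"}  # capture closest to Nov. 1996
resp = requests.get("https://archive.org/wayback/available", params=params, timeout=30)
data = resp.json()

snapshot = data.get("archived_snapshots", {}).get("closest")
if snapshot:
    print(snapshot["url"], snapshot["timestamp"])  # link to the archived capture
else:
    print("No capture found for this URL/date.")
```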
Summary Session 4 “Impresso”, 10 October 2024
During this session we learned about Impresso. There are two Impresso projects. The first is called ‘Media Monitoring of the Past (2017–2020)’; it is the project we talked about the most during the session. The second is called ‘Impresso II. Media Monitoring of the Past – Beyond Borders (2023–2027)’; this one was only briefly mentioned. We learned that Impresso was originally a Swiss-Luxembourgish collaboration; in the second project other countries are being incorporated as well. Impresso I digitised newspapers from Luxembourg and Switzerland from the 17th century to the 21st century. Sadly, the newspaper collections are quite unbalanced in size: some years have many more newspapers than others, especially for the older newspapers, since many of them disappeared before being digitised. For recent years, copyright is an issue. Since the site has collections in French, German, Luxembourgish and English, the results of the ‘NGRAMS’ tool (used to compare the appearance of terms per year, for example ‘grève’ and ‘Streik’) can sometimes be misleading, because the number of collected newspapers in the different languages can vary greatly per year.
We were divided into groups to explore the different functionalities of Impresso. My group chose to try out the ‘NGRAMS’ tool: we compared the results for two terms for witchcraft, ‘sorcellerie’ and ‘Hexerei’. We applied the language filter to see only Luxembourgish articles, which showed us 8 results around the early 20th century. We also tried the country filter to see how many articles were published here, when, and in which language (German or French).
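Conceptually, the ‘NGRAMS’ tool counts how often a term appears per year in the collection. The sketch below is not Impresso’s actual code or API, just a toy Python illustration with an invented three-record corpus of how such counts could be produced, and of why normalisation per year and language matters.

```python
# A toy sketch of what an n-gram comparison does conceptually:
# count how often each term appears per year and compare the curves.
# The corpus is a hypothetical list of (year, language, text) records.
from collections import Counter

corpus = [
    (1905, "fr", "proces pour sorcellerie au village"),
    (1906, "de", "ein fall von hexerei vor gericht"),
    (1906, "lb", "hexerei zu letzebuerg"),
]
terms = ["sorcellerie", "hexerei"]

counts = {t: Counter() for t in terms}
for year, lang, text in corpus:
    for t in terms:
        counts[t][year] += text.lower().split().count(t)

for t in terms:
    print(t, dict(counts[t]))
# Normalising by the number of newspapers per year and per language would be needed
# to avoid the imbalance problem mentioned above.
```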
At the end of the session, we were very quickly introduced to GitHub and how it works, and to making our own files with the markup language ‘Markdown’.
Summary (Reflection) Session 5 “Maps/StoryMaps”, 16 October 2024
In this session about maps, we got a brief introduction to what maps are, what types of maps there are, and how and why to use them in historical research. Then we were divided into groups to explore different sites with maps and create a story map about them. My group worked on the site ATLASCINE, which combines audiovisual storytelling with transcriptions and a map. We kept our story map quite simple and used the ‘récit’ (story) mode: we worked with plain text and explained ATLASCINE by answering four questions. We also inserted some screenshots into our text to have a visual representation. Inserting images is much simpler than in Word; adding a caption is also simpler, because the option appears automatically in StoryMaps, whereas in Word it doesn’t. Even though we only used the text and image options for this exercise, there are many more available. There are classic options like the insertion of text, buttons (which the reader can click to be taken to a site via a URL), a separator, a table and code. Multimedia content can also be inserted: images like the ones we used, collections of interlinked images, and even videos and audio. With the ‘Balayer’ (swipe) option we can make the comparison of two maps easier for readers. There is also a timeline option, which helps illustrate a series of events. Then there is immersive content, like map tours, as well as the option to insert various maps. All in all, this site is very useful and quite easy to use. The group that worked on the ‘John Snow Map’ used a few options different from my group’s: they used text and images, but the map image changes as you scroll down the text. This is a good function because it always shows the same map with different information highlighted on it, which makes it easier to understand.
Summary Session 6 “Networks”, 23 October 2024
In this session we learned about networks and network analysis. We started with the example of a wedding; this example helped us understand how networks work and the different terminology. Networks are really useful for visualising relationships between different things and can make us realise things we wouldn’t have seen otherwise, like the importance of a person in a network. The links between people can also explain social behaviours: people can stop or start doing something they normally do or don’t do because of the specific network they are in at the moment. When visualising a network, we might expect to see certain things, but sometimes we don’t see them, and then we need to ask ourselves why that is the case. Everything can be modelled as a network. We learned what an ego network is: a network with someone or something at its centre. A network boundary defines the limits of the network, who or what is included and who or what isn’t; we can’t put every single piece of information we know into the network. We learned what attributes are and how we can group the different nodes of our network. A broker is a particularly powerful node in a network: it has the power to stop or spread something because it is the link between two parts of the network. There is also the difference between affiliation (nothing moves, for example two nodes are part of the same organisation) and interaction (something moves, like information). The diameter is the longest of the shortest paths between any two nodes in the network. We learned some other terms as well (density, unidirectional, centrality, etc.). At the end of the session, we tried to create our own network in small groups using an Excel sheet and Palladio. My group (Sébastien, Jelena and me) created one about our marriage.
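As a small illustration of the terms from this session, here is a minimal sketch using the networkx library and an invented wedding-guest network (not the one my group built in Palladio); it computes density, diameter and two centrality measures, where a high betweenness value points to a broker.

```python
# A minimal sketch of the network measures from the session, using networkx
# and a small invented wedding-guest network.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("Bride", "Groom"), ("Bride", "Mother"), ("Groom", "Best man"),
    ("Best man", "Friend A"), ("Friend A", "Friend B"), ("Mother", "Friend B"),
])

print("Density:", nx.density(G))                      # how many of the possible links exist
print("Diameter:", nx.diameter(G))                    # longest of the shortest paths
print("Degree centrality:", nx.degree_centrality(G))  # who has the most direct links
print("Betweenness:", nx.betweenness_centrality(G))   # high values point to brokers
```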
Summary Session 7 “EP archives” and critical assessment of dashboards, 30 October 2024
In this session, members of the EU Parliament archives presented what the archives are and how we can use the ‘archibot’ to find documents more easily. The goals of the EU Parliament archives are to manage and preserve the documents arriving at or created in the EU Parliament. There are many different types of documents, and there are millions of them. Another goal of this archive is to make the documents accessible to the public as easily as possible: all public institutions have to make their documents public once they are 30 years old. The arrival of AI was a major shift in how the archive is managed and navigated. The EP archives team created a globally accessible website to navigate the archives and find documents, based on conversational AI. While working with AI, it was very important for the team to keep total control and avoid any leaks; people need to know where the documents come from. The EP archives have a system that allows users to make a query, and the AI answers based solely on the documents available in the archives at that moment. Queries and answers can be made in 125 languages. The website doesn’t allow the AI to ‘learn’ from people’s queries or from outside sources; control and transparency are very important. They also explained the process of extracting text from digitised documents so the AI could use that text. I don’t really know how to access this functionality, which would be very useful for asking questions and getting an answer based on sources that are also listed; it seems more reliable than ChatGPT.
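The following is not the EP archives’ actual system, only a minimal sketch of the underlying principle of answering “solely based on the documents available in the archives”: a query is matched against a closed collection and the best-scoring documents are returned as the sources. It assumes Python with scikit-learn and uses invented document titles.

```python
# A minimal sketch of retrieval restricted to a closed collection (TF-IDF ranking).
# Assumes scikit-learn; documents and query are invented examples.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "Resolution on budgetary procedure, 1983.",
    "Report on agricultural policy reform, 1988.",
    "Minutes of the plenary session, July 1979.",
]
query = "budget procedure"

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(documents)           # index the closed collection
scores = cosine_similarity(vectorizer.transform([query]), doc_matrix).ravel()

# Return the best-matching documents as the "sources" an answer would be based on.
for idx in scores.argsort()[::-1]:
    print(round(float(scores[idx]), 3), documents[idx])
```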
They also explained how the dashboards and navigation on the website work, and the different filters. I had tried the dashboards before the session and have to admit I didn’t understand much of it on my own. The EP archives overview dashboard is the first thing we see. It shows different graphics that illustrate the percentage of documents in the different languages, the percentage of the different document types, and the quantity of documents available per year. These graphics are a good and easy way to visualise information that can help us understand the results of our research better (some years have far fewer documents available). With the controls we can choose to visualise the graphics for specific fonds, series, dossiers, etc. With the ‘content-analysis’ dashboard we can visualise information by topic or content of the documents by clicking on one of the terms shown in the word cloud. There are again different graphics and a list of documents about the content we selected in the word cloud (we click on the title and get access to the document). Then we have the ‘archives-requests’ dashboard. I think it shows the number of requests by country, maybe where the people who requested documents are from; I don’t really understand this dashboard. The last dashboard is ‘tools’, which contains two tools. One is the Eurovoc Tagger option, where a person can select a document and then find related documents in the archives. We can select the language we want to find documents in (French, English, German). The minimum confidence setting is for selecting how similar we want the documents to be; if we select 0, it is the same. The other tool in this dashboard is the EP archives summariser. This tool can be useful for extracting the important information from documents when we have to analyse a large quantity of them.
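The kind of aggregation behind the overview dashboard can be sketched very simply. The example below assumes pandas and an invented table of document records, and just reproduces the three statistics mentioned above (share per language, share per document type, count per year).

```python
# A minimal sketch of overview-dashboard style aggregation, with invented records.
import pandas as pd

df = pd.DataFrame({
    "year":     [1979, 1983, 1983, 1988, 1988, 1988],
    "language": ["fr", "en", "fr", "de", "en", "fr"],
    "doc_type": ["minutes", "resolution", "report", "report", "resolution", "minutes"],
})

print(df["language"].value_counts(normalize=True) * 100)  # % of documents per language
print(df["doc_type"].value_counts(normalize=True) * 100)  # % per document type
print(df.groupby("year").size())                          # quantity of documents per year
```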
I had prepared some questions before the session; the majority were answered during it.
• How do you prevent potential errors from AI, and how do you check whether it made mistakes?
• How might AI impact the accessibility and organisation of archival materials?
• What are some ethical concerns related to AI in the field of archives?
• Will you add other functionalities to the EP archives dashboard, and if yes, which ones?
• How do we access the archibot?
Summary Session 8 “DH Theory”, 6 November 2024
In this session we discussed how to know what is true and what is fake: how do we tell interpretation apart from the deliberate forgery of sources? We saw the example of the ‘historian’/publicist David Irving, who was proved to have forged and invented sources to support his arguments. For example, he denied that Hitler planned the extermination of the Jews, arguing that it wasn’t Hitler’s doing; after that he became a denier of the whole Holocaust. He bent the interpretation of existing sources to align with his opinions and political agenda. This led to a lawsuit in which a group of historians had to prove that Irving was forging and misinterpreting sources. It took two years and retracing every step Irving took (consulting the same sources, going through his speeches and personal archives, etc.). This was used as an introduction to how we can make our own networks and databases trustworthy. For that we need to make our work retraceable and make sure it is F.A.I.R.: Findable (the data used needs to be findable after the project ends), Accessible (copyright), Interoperable (it should be possible to link it to another network) and Reusable (built on technical standards that will be supported for a long time). For networks we need to document the process of making them: the decisions we made (who or what is excluded, the types of relationships, etc.), how we structured the data (what constitutes a node, etc.), how we established the connections, and how we grouped the data into meaningful categories. We also need to standardise the data and make it consistent (name spellings, date formats, etc.). All of this needs to be documented.
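As a small illustration of the standardisation step, here is a minimal sketch assuming pandas and an invented table: name spellings are mapped to one canonical form via an explicit dictionary (which itself should be documented), and dates are parsed into a single representation, with unparseable values left empty rather than guessed.

```python
# A minimal sketch of standardising name spellings and date formats with pandas.
import pandas as pd

# Invented example records with inconsistent spellings and formats.
df = pd.DataFrame({
    "name": ["Müller, J.", "J. Muller", "MULLER J"],
    "date": ["03/10/1924", "1924-10-03", "3 Oct 1924"],
})

# One canonical spelling per person, recorded in an explicit, documented mapping.
name_map = {"J. Muller": "Müller, J.", "MULLER J": "Müller, J."}
df["name"] = df["name"].replace(name_map)

# One canonical date representation; values that cannot be parsed become NaT, not guesses.
df["date"] = df["date"].apply(lambda d: pd.to_datetime(d, dayfirst=True, errors="coerce"))

print(df)
```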