Session summaries by Sébastien Bonhomme

written by Sébastien Bonhomme — October 8, 2024

Summary of the 25th of September 2024 session of Introduction to Digital History

During the Introduction to Digital History lesson of the 25th of September 2024, we were introduced to a few key elements of digital history, namely who is and who isn't a digital historian, and what data and the related metadata are. A digital historian, for example, isn't just any historian who uses digital tools, even though all historians today are confronted with digitised sources. Data, initially a Latin loanword, became a proper English word in the late 18th century and meant "the admitted elements of a problem". In our context, data are things/information we create, store, manipulate, analyse and interpret. On that matter, data used as sources for a historian's work become "research data". Metadata, on the other hand, is, in short, the data about the data, the information around the information. Thus, for example, a picture file is the data, but its name and its dates of creation and upload are metadata. There are two types of metadata: on one side embedded metadata, which the machine creates automatically, and on the other, enriched metadata, which is created by the user/the historian. We were also introduced to Tropy, a program that enables one to save, adequately organise and describe with tags the image files historians use as research data. In other terms, it is like Zotero in many ways, but for images, although some key features of Zotero, like the generation of citations, are notably absent.
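The distinction between data, embedded metadata and enriched metadata can be illustrated with a minimal Python sketch. All field names and values below are invented for illustration and do not reflect Tropy's actual data model:

```python
# The photograph itself is the data (the research source).
photo = {"file": "market_square_1910.jpg"}

# Embedded metadata: created automatically by the machine (camera/scanner).
embedded = {
    "date_created": "2024-03-14T10:22:05",
    "resolution": (4000, 3000),
    "file_size_bytes": 2_483_112,
}

# Enriched metadata: added deliberately by the user/historian, e.g. as tags.
enriched = {
    "title": "Market square, circa 1910",
    "tags": ["postcard", "urban history", "Luxembourg"],
    "archive_reference": "ANLux-FD-087",  # invented reference number
}

def describe(photo, embedded, enriched):
    """Combine the data and both metadata layers into one catalogue record."""
    return {"data": photo, **embedded, **enriched}

record = describe(photo, embedded, enriched)
```

The point of the sketch is only that the two metadata layers have different origins: the embedded fields exist before the historian touches the file, while the enriched fields carry the historian's own interpretive work.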

Summary of the 2nd of October 2024 session of Introduction to Digital History

During the Introduction to Digital History lesson of the 2nd of October 2024, we were acquainted with the concept of web archives in their vast array of forms, as digital vaults key to storing a variety of data in all their forms. The lesson took, for this occasion, a more participative approach, as we were divided into 6 groups to each present a different example and its aspects. A first group presented the Internet Archive, a non-profit organization founded in 1996 which aims to preserve documents, files, songs, movies and websites on the Internet. The Internet Archive must regularly face lawsuits by companies and labels accusing it of copyright infringement, as many books are also made available on the archive. A second group presented a sort of Luxembourgish counterpart to the Internet Archive, the Luxembourg Web Archive, a project by the BnL which aims to preserve, among others, Luxembourgish websites, of which it takes several snapshots every year. A third group presented the "Wayback Machine", affiliated with the Internet Archive, a website that lets one see how a specific site looked years ago, provided a version of it had been saved. A fourth group analysed the "revision history" of a Wikipedia article, showing a variety of details and changes made over time, some useful, and some cases of vandalism. The fifth group presented archives on a more personal level, the family, with images, but also by presenting how they themselves had shared data and information regarding family online, as is frequently, almost subconsciously, done by many people. The final group presented the September 11th Digital Archive, an archive with the goal of preserving data and a variety of media, partially contributed by private individuals, related to the September 11th, 2001, terrorist attack on the Twin Towers.

Summary of the 9th of October 2024 session of Introduction to Digital History

During the lesson of the 9th of October, we were first familiarised with the "Impresso" project and app, an initially Swiss-Luxembourgish collaboration, now incorporating partners from other countries, that, via a current corpus of over 70 Swiss and Luxembourgish newspapers, aims at significantly improving the way researchers in the field of history consult, find, use and then interpret data from the selected newspapers. This is made possible with an enhanced keyword-search mechanism that can, with added parameters, account for OCR mistakes as well as synonyms. Furthermore, other Impresso tools, like Ngram to give an example, permit seeing the occurrences of certain keywords over time. We were then divided into groups and had the task of exploring specific aspects of Impresso on a given subject. As group 4, we analysed the information given by Impresso's Ngram tool about sorcery as a chosen topic, with the keywords "sorcellerie" and "Hexerei" giving 6,884 results; further refined with the language filter "Luxembourgish", this nonetheless gave us 8 results, concentrated around the very early 20th century. We then used the country filter to see specifically the results from Luxembourg alone. Care must be taken with such results, as biases arise very easily and a researcher can end up seeing only what they want or expect to see. Lastly, we were introduced to GitHub, a social platform aimed at programmers, which we will use to submit our weekly summaries written in a markup language called "Markdown".
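The idea behind an Ngram view, counting how often keywords occur per year across a corpus, can be sketched in a few lines of Python. The articles below are invented stand-ins; Impresso's real corpus, search parameters and interface are far richer:

```python
from collections import Counter

# Toy corpus: (year, text) pairs standing in for newspaper articles.
articles = [
    (1903, "Un procès pour sorcellerie agite le village"),
    (1903, "La sorcellerie au tribunal"),
    (1921, "Hexerei und Aberglaube in der Presse"),
    (1954, "Rückblick: Hexerei im frühen 20. Jahrhundert"),
]

def ngram_counts(articles, keywords):
    """Count how often any of the keywords appears, grouped by year."""
    counts = Counter()
    for year, text in articles:
        lowered = text.lower()
        counts[year] += sum(lowered.count(k.lower()) for k in keywords)
    return dict(counts)

freq = ngram_counts(articles, ["sorcellerie", "Hexerei"])
```

Plotting such per-year counts over time is essentially what the Ngram tool displays, which is also why OCR mistakes matter: a misrecognised keyword simply never gets counted.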

Reflection on the 16th of October 2024 session of Introduction to Digital History

During this lesson, we discussed the use of digital maps in history and were acquainted with the website "StoryMaps". Subsequently, we were put into groups and each had to create a StoryMap around a selected project that uses digital maps to highlight a specific topic. Our assigned project, "Atlascine", has the purpose of collecting and mapping audiovisual media, specifically oral accounts of a variety of events in people's personal histories. We experienced technical difficulties with the introduction video on Atlascine, whose sound didn't work initially, which caused a delay. For our StoryMap, we collected the information the site offered to answer the four questions of our group work: what the goal of the project is, how the maps support the oral accounts, whether they actually help, and what we can take from this. For our attempt at a StoryMap, we added our collected answers and their respective questions in order and included three screenshots taken from Atlascine, each with a caption underneath, showcasing how the relevant parts of an oral account are highlighted via a circle on the map, how one can select between coloured themes, but also certain limitations, like the highlighted part at times not jumping to the matching text passage when clicked. In essence, it looks like a standard article on a generic website. The group who did "Preserving Society Hill" proceeded similarly with their set of questions, although answering in keywords instead of full text; most importantly, beyond an image, they added an interactive map that shows the neighbourhood on a local map, highlighted in a blue square, which, when clicked on, shows us on a small globe the exact location of the area in a much wider geographical context.

Summary of the 23rd of October 2024 session of Introduction to Digital History

For the 23rd of October lesson, we touched upon the subject of networks and their visualisation, and how they could be utilised efficiently by historians to illustrate information given by a set of sources, allowing the recognition of patterns or the prominence of a certain figure/aspect that would otherwise remain obscure. We were introduced to this concept of networks via the example of a wedding and the planning of the different guests attending, their attributes (here, for example, coworkers, singles, friends etc.) and how they are grouped in that network. We also met other occurrences and notions around these social networks, like brokers, to give an example, and the general boundaries of a network. We used a site called Palladio that allows the visualisation of networks from spreadsheet files such as those created in Excel. We were later tasked with creating our own network to visualise via Excel and Palladio. Initially, I had the idea of creating one related to rock and heavy metal bands and how they could be linked with each other, but due to time constraints and not being sure how to make it work, I banded together with Emilie and Jelena to work on a new wedding guest list, to have a more familiar basis, as the topic matter wasn't always easy to follow.
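The kind of spreadsheet Palladio ingests is essentially an edge list: one row per tie between two guests. A minimal Python sketch of turning such a list into a network, and of spotting a potential "broker" as the best-connected guest, might look like this (names and ties are invented; real broker detection uses betweenness rather than a simple tie count):

```python
from collections import defaultdict

# A wedding guest list as an edge list, one pair per tie between guests.
edges = [
    ("Anna", "Ben"),     # coworkers
    ("Anna", "Clara"),   # friends
    ("Ben", "Clara"),
    ("Clara", "David"),  # family
    ("David", "Elena"),
]

def build_network(edges):
    """Turn an edge list into an adjacency mapping (node -> neighbours)."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def most_connected(adj):
    """The guest with the most ties -- a crude candidate for a 'broker'."""
    return max(adj, key=lambda n: len(adj[n]))

network = build_network(edges)
```

Here Clara sits between the coworker/friend cluster and the family side, which is exactly the sort of pattern a visualisation in Palladio makes visible at a glance.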

Summary, review and questions on the 30th of October 2024 session of Introduction to Digital History

For the session on the 30th of October, we were invited to a conference held by Ludovic Delépine, Head of Unit at the Archives of the European Parliament, on the topic of their incorporation of AI into a more efficient search engine for the public wishing to access their documents. The AI in question, called Archibot, is based on Anthropic's Claude and aims to limit harmful output via constitutional AI. The overall process for adding documents involves many different components, for example a tool for extracting text data from previously scanned documents. When searching for documents, the AI can give results based on their metadata, their content and the frequency of given thematic keywords. The user's prompts and queries can be written in a multitude of world languages; however, some languages, like Romanian for example, cause more issues than others, and the AI may switch back to English to give an answer. The search returns the 10 results deemed most relevant, each preceded by a brief description. To avoid biases or potential errors, the AI only shows results that are indeed from the Archives of the European Parliament and that respect the human rights conventions of the United Nations. Furthermore, this corpus of sources comprises documents ranging from 1952 to 1994: the EU frameworks mandate that documents 30 years old and older must be rendered accessible to the public. After the talk, some questions were asked and answered, for example whether this process would be made available to other archival institutions across Europe. This is, at the current time, impossible, as the search engine and AI are based on software that is not open source; perhaps, if one day it is rebuilt on free software, it may be shared with other institutions. In addition, only software the EU has approved is used.
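One of the three retrieval signals mentioned, the frequency of thematic keywords, can be sketched as a toy ranking function in Python. This is a deliberately crude frequency score, far simpler than Archibot's actual retrieval, and the documents below are invented:

```python
def score(document, query_terms):
    """Score a document by how often the query terms occur in it."""
    words = document.lower().split()
    return sum(words.count(t.lower()) for t in query_terms)

def top_results(documents, query_terms, k=10):
    """Return the k highest-scoring documents, best first."""
    ranked = sorted(documents, key=lambda d: score(d, query_terms), reverse=True)
    return ranked[:k]

docs = [
    "Minutes of the budget committee",
    "Report on agricultural policy",
    "Debate on agricultural policy and policy on subsidies",
]
results = top_results(docs, ["subsidies", "policy"], k=2)
```

A real system would combine such content scores with the metadata matches described above, and would need the brief per-result descriptions to be generated separately.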
Upon accessing and contemplating the dashboard of said European Parliament archives, one may make several observations regarding the way it is built and presented. Firstly, the dynamic display of the number of available documents, updating with each filter, is informationally appealing, as are the graphical displays of the document types and languages and of their number per year. However, one may criticise the naming in the "fonds" filter, namely set codes that do not indicate at all what they could refer to. This is most certainly not adapted to the general public the Archives of the European Parliament aim to reach. Names like those the National Archives of Luxembourg give their fonds should be opted for instead, as these give some indication of what is to be expected inside them.

With the advent of artificial intelligence in archival work, and of such automation in general, is there not a risk or fear that more and more aspects of your work will be delegated to the (intelligent) machine, that fewer archivists will consequently be needed because the machine can now do the task, and that, as a result, some of you will lose your employment in the future or that fewer and fewer positions for archivists will exist?

How can you make the AI fully trustworthy? More often than not, AIs have been fed, sometimes maliciously, with information that ended up harming their learning process, or have easily been manipulated by their human interlocutor.

How did you digitise all these millions of documents and make them available to the AI? It must have been an immensely long process spanning years.

How do you plan on sparking the interest of the general public, and not just the more academic public, in discovering the European Parliament Archives and using this AI?

Summary of the 6th of November 2024 session of Introduction to Digital History

During this lesson of Digital History, we were first met with the key question of how to know whether something, for example a piece of historical information or some data, is truthful or not. The case of British "historian" David Irving was of particular importance. Irving wrote books about WW2 and, most notably, made the claim that Adolf Hitler was unaware of the Holocaust and would have stopped it had he known. Over time, with further works, this evolved into full-on Holocaust denial, leading to a libel trial Irving brought against historian Deborah Lipstadt, during which other historians successfully demonstrated that the Holocaust did indeed happen and analysed how Irving's dishonest interpretation of some documents led to his stance. We then applied this idea to databases and how historians can make their network data a viable source for research via a process of Selection (which data is chosen?), Modelling (how is said data structured?), Normalisation (making the data consistent), Linking (connecting the data with data from another researcher) and finally Classification (grouping data into meaningful categories) in text form. All in all, a historian's amassed data should also, in the best-case scenario, follow the F.A.I.R. principles: the data shall be findable by other historians afterwards, for example in a repository; accessible, copyright-wise; interoperable, in other terms connectable to another set of data from a different historian with linked research subjects; and finally reusable, conforming to certain technical norms.
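The Normalisation step of the pipeline above, making the data consistent so that Linking becomes possible later, can be sketched in Python. The variant spellings and the lookup table are invented examples of the kind of inconsistency a historian's records typically contain:

```python
# Normalisation: map variant spellings of the same person to one canonical
# form, so records from different sources can later be linked.
CANONICAL = {
    "j. dupont": "Jean Dupont",
    "jean dupont": "Jean Dupont",
    "dupont, jean": "Jean Dupont",
}

def normalise(name):
    """Return the canonical form of a name, or the name itself if unknown."""
    key = name.strip().lower()
    return CANONICAL.get(key, name.strip())

records = ["J. Dupont", "DUPONT, JEAN", "Marie Muller"]
cleaned = [normalise(r) for r in records]
```

Without this step, "J. Dupont" and "DUPONT, JEAN" would count as two different people, which is exactly the sort of inconsistency that makes datasets from two researchers impossible to interoperate, in the F.A.I.R. sense.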