An image of a knot with many ropes
Gabi Kirilloff
This lesson provides strategies for incorporating game creation into the classroom. The first half of the lesson discusses the challenges and benefits of teaching game creation while the second half includes a technical tutorial for Twine, an open source game creation tool.
Microscope images of bacteria
Thomas Jurczyk
This tutorial demonstrates how to apply clustering algorithms in Python through two concrete use cases. The first uses clustering to identify meaningful groups of Greco-Roman authors based on their publications and their reception. The second applies clustering algorithms to textual data in order to discover thematic groups. After finishing this tutorial, you will be able to apply clustering with Scikit-learn to your own data, adding an invaluable method to your toolbox for exploratory data analysis.
A crowd of people chipping away at a stone
Halle Burns
Pandas is a popular and powerful package used in Python communities for data handling and analysis. This lesson describes crowdsourcing as a form of data creation as well as how pandas can be used to prepare a crowdsourced dataset for analysis. This lesson covers managing duplicate and missing data and explains the difficulties of dealing with dates.
Stack of newspapers surrounded by quills and telegraph wires
Matteo Romanello and Simon Hengchen
In this lesson you will learn about text reuse detection -- the automatic identification of reused passages in texts -- and why you might want to use it in your research. Through a detailed installation guide and two case studies, this lesson will teach you the ropes of Passim, an open source and scalable tool for text reuse detection.
A man and a woman dancing in a group
Amanda Visconti, Brandon Walsh, and Scholars' Lab Community
In this lesson you will be introduced to the challenges and opportunities that Jekyll, a popular static site generator, offers for publishing collaborative, ongoing research online.
Image of a partial eclipse.
John R. Ladd
This lesson introduces three common measures for determining how similar texts are to one another: city block distance, Euclidean distance, and cosine distance. You will learn the general principles behind similarity, the different advantages of these measures, and how to calculate each of them using the SciPy Python library.
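As a rough illustration of the three measures named above, the sketch below computes each one in plain Python on two made-up word-count vectors. The function names and the sample vectors are assumptions for illustration only, and the lesson itself works with SciPy, which provides equivalents in `scipy.spatial.distance` (`cityblock`, `euclidean`, and `cosine`).

```python
import math

def city_block(a, b):
    # Sum of absolute differences along each dimension.
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    # One minus the cosine of the angle between the two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Hypothetical word counts for two short texts over a three-word vocabulary.
text_a = [3, 0, 4]
text_b = [0, 0, 8]

d_city = city_block(text_a, text_b)
d_euclid = euclidean(text_a, text_b)
d_cosine = cosine_distance(text_a, text_b)
```

City block and Euclidean distance both grow with the difference in text length, while cosine distance depends only on the direction of the vectors, which is one reason it is often preferred when texts differ greatly in size.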
working-with-batches-of-pdf-files
Moritz Mähr
Learn how to perform OCR and text extraction with free command line tools like Tesseract and Poppler and how to get an overview of large numbers of PDF documents using topic modeling.
The planet Jupiter
Quinn Dombrowski, Tassie Gniady, and David Kloster
Jupyter notebooks provide an environment where you can freely combine human-readable narrative with computer-readable code. This lesson describes how to install the Jupyter Notebook software, how to run and create Jupyter notebook files, and contexts where Jupyter notebooks can be particularly helpful.
Le Petit poisson et le pêcheur (The Fisherman and the Little Fish) / G. Doré ; A. Bertrand, Bibliothèque nationale de France, ark:/12148/btv1b10321920p
Brad Rittenhouse, Ximin Mi, and Courtney Allen
Learn how to acquire Twitter data and process them to make them usable for further analysis.
A mechanical device with gears and wheels
Go Sugimoto
This lesson introduces a way to populate a website with data obtained from another website via an Application Programming Interface (API). Using some simple programming, it provides strategies for customizing the presentation of that data, providing flexible and generalizable skills.
Farmer standing before a fruit tree
Adam Crymble
This lesson introduces gravity models as a means for determining the probable distribution of entities across space in historical datasets. It does so through a case study of historical migration patterns.
Scientific measuring device
Stephen Krewson
Machine learning and API extensions by HathiTrust and Internet Archive are making it easier to extract page regions of visual interest from digitized volumes. This lesson shows how to efficiently extract those regions and, in doing so, prompt new, visual research questions.
An antique camera
Dave Rodriguez
This lesson introduces the basic functions of FFmpeg, a free command-line tool used for manipulating and analyzing audiovisual materials.
A sundial
Alex Brey
Learn how to use R to analyze networks that change over time.
An aerial view of city blocks
Eric Weinberg
In this lesson, you will use R to analyze and map geospatial data.
An optical instrument resembling a telescope
Jacob W. Greene
This lesson serves as an introduction to creating mobile augmented reality applications. Augmented reality (AR) can be defined as the overlaying of digital content (images, video, text, sound, etc.) onto physical objects or locations, and it is typically experienced by looking through the camera lens of an electronic device such as a smartphone, tablet, or optical head-mounted display.
Men with torches in an antique tomb
Charlie Harper
In this lesson you will learn how to visually explore and present data in Python by using the Bokeh and Pandas libraries.
A hand holding a newspaper
Jeff Blackadar
This lesson will help you store large amounts of historical data in a structured manner, search and filter that data, and visualize some of the data as a graph.
A woman reading next to a painting
François Dominic Laramée
In this lesson you will learn to conduct 'stylometric analysis' on texts and determine authorship of disputed texts. The lesson covers three methods: Mendenhall's Characteristic Curves of Composition, Kilgarriff's Chi-Squared Method, and John Burrows' Delta Method.
Diagram with a series of arcs describing a quarter circle
Patrick Smyth
Learn how to set up a basic Application Programming Interface (API) to make your data more accessible to users. This lesson also discusses principles of API design and the benefits of APIs for digital projects.
Constellation chart
Jon MacKay
In this lesson we will learn how to use a graph database to store and analyze complex networked information. This tutorial will focus on the Neo4j graph database, and the Cypher query language that comes with it.
A laughing man and a grouchy man
Zoë Wilkinson Saldaña
In this lesson you will learn to conduct 'sentiment analysis' on texts and to interpret the results. This is a form of exploratory data analysis based on natural language processing. You will learn to install all appropriate software and to build a reusable program that can be applied to your own texts.
Map of the city of Edinburgh
Beatrice Alex
This tutorial teaches users how to use the Edinburgh Geoparser to process a piece of English-language text, extract and resolve the locations contained within it, and plot them as a web map.
Diagram of a cube with labeled edges
Ryan Deschamps
This tutorial explains how to carry out and interpret a correspondence analysis, which can be used to identify relationships within categorical data.
A device with several interlocking gears
Shawn Graham
This lesson explains how to create simple Twitter bots using Tracery and the Cheap Bots Done Quick service. Tracery exists in multiple languages and can be integrated into websites, games, and bots.
Map of a mountainous terrain
Kim Pham
This tutorial teaches users how to create a web map based on tabular data.
Train tracks intersecting
John R. Ladd, Jessica Otis, Christopher N. Warren, and Scott Weingart
This lesson introduces network metrics and how to draw conclusions from them when working with humanities data. You will learn how to use the NetworkX Python package to produce and work with these network statistics.
Machine for water filtration
Evan Peter Williamson
OpenRefine is a powerful tool for exploring, cleaning, and transforming data. In this lesson you will learn how to use Refine to fetch URLs and parse web content.
Bar of soap
Nabeel Siddiqui
This tutorial explores how scholars can organize 'tidy' data, understand R packages to manipulate data, and conduct basic data analysis.
An old man with a woman on each arm
Jonathan Blaney
Introduces core concepts of Linked Open Data, including URIs, ontologies, RDF formats, and a gentle intro to the graph query language SPARQL.
A woman throwing letters near a mailbox
Stephanie J. Richmond and Tommy Tavenner
Demonstrates how to use the JavaScript library "Leaflet" to produce an interactive map that can be hosted online or viewed locally, and demonstrates how to customize many of its features.
Children visiting a bookmobile
Taylor Arnold and Lauren Tilton
Learn how to use R to analyze high-level patterns in texts, apply stylometric methods over time and across authors, and use summary methods to describe items in a corpus.
A young man kissing a young woman on the cheek
Justin Colson
Learn how to use QGIS to convert lists of place names into geographic coordinates, allowing you to map them.
A book inside a torn case
Peter Organisciak and Boris Capitanu
Explains how to use Python to summarize and visualize data on millions of texts from the HathiTrust Research Center's Extracted Features dataset.
An ornate illustrated character R
Taryn Dewar
This lesson teaches a way to quickly analyze large volumes of tabular data, making research faster and more effective.
Two gramophones facing each other
Brandon Walsh
In this lesson you will learn how to use Audacity to load, record, edit, mix, and export audio files.
A figure working at a machine with gear diagrams
Jonathan Reeve
This lesson will teach you how to install your own copy of Omeka.
An ornate seashell
Ted Dawson
This tutorial will introduce you to the basics of Windows PowerShell, the standard command-line interface for Windows computers.
A peacock with a woman's head
M. H. Beals
This tutorial will provide you with the ability to convert or transform historical data from an XML database (whether a single file or several linked documents) into a variety of different presentations—condensed tables, exhaustive lists or paragraphed narratives—and file formats.
A violin
Shawn Graham
There are any number of guides that will help you visualize the past, but this lesson will help you hear the past.
A grid-like device for drawing lines
Matthew Lincoln
Working with data from an art museum API and from the Twitter API, this lesson teaches how to use the command-line utility _jq_ to filter and parse complex JSON files into flat CSV files.
An illustration of Dr. Jekyll transforming into Mr. Hyde
Amanda Visconti
This lesson will help you create an entirely free, easy-to-maintain, preservation-friendly, secure website over which you have full control, such as a scholarly blog, project website, or online portfolio.
Ornate room filled with paintings hung salon-style
Miriam Posner and Megan R. Brett
Now that you've added items to your Omeka site and grouped them into collections, you're ready for the next step: taking your users on a guided tour through the items you've collected.
Dinosaur skeleton in a museum
Miriam Posner
Omeka.net makes it easy to create websites that show off collections of items.
Woman churning butter or milk
Adam Crymble
This lesson will teach you how to use Python to extract a set of keywords very quickly and systematically from a set of texts.
Ornate decorated characters from a typographical manual
Sarah Simpkin
In this lesson, you will be introduced to Markdown, a plain text-based syntax for formatting documents. You will find out why it is used, how to format Markdown files, and how to preview Markdown-formatted documents on the web.
An image of a tree with the Latin phrase Labor Omnia Vincit Improbus
Andrew Akhlaghi
This lesson covers how to convert images of text into text files and translate those text files. The lesson will also cover how to organize and edit images to make the conversion and translation of whole folders of text files easier and more accurate. The lesson concludes with a discussion of the shortcomings of automated translation and how to overcome them.
An old mechanical typewriter
Matthew J. Lavin
This lesson focuses on a foundational natural language processing and information retrieval method called Term Frequency - Inverse Document Frequency (tf-idf). This lesson explores the foundations of tf-idf, and will also introduce you to some of the questions and concepts of computationally oriented text analysis.
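To make the idea of tf-idf concrete, here is a minimal sketch using only the Python standard library. It assumes one common variant (raw term count multiplied by the natural log of the number of documents divided by the number of documents containing the term); other weightings exist, and the lesson may use a different formula or library. The function name `tf_idf` and the tiny sample corpus are hypothetical.

```python
import math
from collections import Counter

def tf_idf(documents):
    """Return one {term: score} dict per document (assumed variant: tf * ln(N / df))."""
    n_docs = len(documents)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)  # term frequency within this document
        scores.append({
            term: count * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Hypothetical corpus: each document is already split into lowercase words.
docs = [
    ["ship", "sea", "ship"],
    ["sea", "storm"],
    ["sea", "harbour"],
]
scores = tf_idf(docs)
```

In this sample, "sea" appears in every document and so scores zero everywhere, while "ship" is frequent in the first document and absent from the others, so it receives the highest weight there.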
Jacob W. Greene
This lesson serves as an introduction to creating mobile augmented reality applications. Augmented reality (AR) can be defined as the overlaying of digital content (images, video, text, sound, etc.) onto physical objects or locations, and it is typically experienced by looking through the camera lens of an electronic device such as a smartphone, tablet, or optical head-mounted display.
Three large ornate bookcases
Heather Froehlich
Corpus analysis is a form of text analysis which allows you to make comparisons between textual objects at a large scale (so-called 'distant reading').
Diagram of the earth and moon's revolution around the sun
Marten Düring
Network visualizations can help humanities scholars reveal hidden and complex patterns and structures in textual sources. This tutorial explains how to extract network data (people, institutions, places, etc.) from historical sources through the use of non-technical methods developed in Qualitative Data Analysis (QDA) and Social Network Analysis (SNA), and how to visualize this data with the platform-independent and particularly easy-to-use Palladio.
A man peers through a geometric tool
Vilja Hulden
This lesson shows how to use machine learning to extract interesting documents out of a digital archive.
A small case with a set of books
Jon Crump
This tutorial illustrates strategies for taking raw OCR output from a scanned text, parsing it to isolate and correct essential elements of metadata, and generating an ordered data set (a Python dictionary) from it.
Soldiers in antique armor with spears
Ian Milligan and James Baker
This lesson will teach you how to enter commands using a command-line interface, rather than through a graphical interface. Command-line interfaces have advantages for computer users who need more precision in their work, such as digital historians. They allow for more detail when running some programs, as you can add modifiers to specify exactly how you want your program to run. Furthermore, they can be easily automated through scripts, which are essentially recipes of text-based commands.
A diagram of a miner sorting ore into an apparatus
James Baker and Ian Milligan
This lesson will look at how research data, when organised in a clear and predictable manner, can be counted and mined using the Unix shell.
A large barrel
James Baker
This lesson will suggest ways in which historians can document and structure their research data so as to ensure it remains useful in the future.
A man working at a drafting table
Dennis Tenen and Grant Wythoff
In this tutorial, you will first learn the basics of Markdown—an easy to read and write markup syntax for plain text—as well as Pandoc, a command line tool that converts plain text into a number of beautifully formatted file types: PDF, .docx, HTML, LaTeX, slide decks, and more.
Group of men working in a mine
Caleb McDaniel
The collections of the Internet Archive include many digitized historical sources. Many contain rich bibliographic data in a format called MARC. In this lesson, you'll learn how to use Python to automate the downloading of large numbers of MARC files from the Internet Archive and the parsing of MARC records for specific information such as authors, places of publication, and dates. The lesson can be applied more generally to other Internet Archive files and to MARC records found elsewhere.
Map of a mountaintop city
Jim Clifford, Josh MacFadyen, and Daniel Macfarlane
In this lesson, you will learn how to georeference historical maps so that they may be added to a GIS as a raster layer.
An old man consulting a large globe with a compass
Jim Clifford, Josh MacFadyen, and Daniel Macfarlane
Google My Maps and Google Earth provide an easy way to start creating digital maps. With a Google Account you can create and edit personal maps by clicking on My Places.
Elevation view of a mountain range
Jim Clifford, Josh MacFadyen, and Daniel Macfarlane
In this lesson you will install QGIS software, download geospatial files like shapefiles and GeoTIFFs, and create a map out of a number of vector and raster layers.
Map of city streets
Jim Clifford, Josh MacFadyen, and Daniel Macfarlane
In this lesson you will learn how to create vector layers based on scanned historical maps.
A set of Cyrillic characters
Seth Bernstein
This lesson shows how to use Python to automatically transliterate a list of words from a language with a non-Latin alphabet to a standardized format using American Standard Code for Information Interchange (ASCII) characters.
Diagram of a well-drilling apparatus
Kellen Kurschinski
Now that you have learned how Wget can be used to mirror or download specific files from websites via the command line, it's time to expand your web-scraping skills through a few more lessons that focus on other uses for Wget's recursive retrieval function.
Two men laundering clothes outside
Seth van Hooland, Ruben Verborgh, and Max De Wilde
This tutorial focuses on how scholars can diagnose and act upon the accuracy of data.
Person studying a book at a desk
Doug Knox
In this lesson, we will use advanced find-and-replace capabilities in a word processing application in order to make use of structure in a brief historical document that is essentially a table in the form of prose.
A typesetter and inker at work on a printing press
Laura Turner O'Hara
Optical Character Recognition (OCR)—the conversion of scanned images to machine-encoded text—has proven a godsend for historical research. This lesson will help you clean up OCR'd text to make it more usable.
A branch with pears
Fred Gibbs
There are many ways to install external Python libraries; this tutorial explains one of the most common methods using pip.
Figures working in a mine, pushing carts
Adam Crymble
Downloading a single record from a website is easy, but downloading many records at a time – an increasingly frequent need for a historian – is much more efficient using a programming language such as Python. In this lesson, we will write a program that will download a series of records from the Old Bailey Online using custom search criteria, and save them to a directory on our computer.
A man striking an anvil with a large hammer
Shawn Graham, Scott Weingart, and Ian Milligan
In this lesson you will first learn what topic modeling is and why you might want to employ it in your research. You will then learn how to install and work with the MALLET natural language processing toolkit to do so.
Three caricature heads
William J. Turkel and Adam Crymble
Computer programs can become long, unwieldy and confusing without special mechanisms for managing complexity. This lesson will show you how to reuse parts of your code by writing functions and break your programs into modules, in order to keep everything concise and easier to debug.
Disgruntled man sitting on a log surrounded by birds
William J. Turkel and Adam Crymble
Counting the frequency of specific words in a list can provide illustrative data. This lesson will teach you Python's easy way to count such frequencies.
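A minimal sketch of the idea, assuming the text has already been split into a list of lowercase words. It uses `collections.Counter` from the standard library, which may not be the approach the lesson itself takes; the sample word list is made up for illustration.

```python
from collections import Counter

# Hypothetical word list, as might be produced by splitting a normalized text.
words = ["it", "was", "the", "best", "of", "times",
         "it", "was", "the", "worst", "of", "times"]

# Counter maps each distinct word to the number of times it occurs.
frequencies = Counter(words)

# Print the words from most to least frequent.
for word, count in frequencies.most_common():
    print(word, count)
```

Because `Counter` behaves like a dictionary, a single word's frequency can be looked up directly, for example `frequencies["times"]`.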
Child drawing on a tablet
William J. Turkel and Adam Crymble
Here you will learn how to create HTML files with Python scripts, and how to use Python to automatically open an HTML file in Firefox.
A giraffe being mimicked by a human
William J. Turkel and Adam Crymble
In this two-part lesson, we will build on what you’ve learned about Downloading Web Pages with Python, learning how to remove the HTML markup from the webpage of Benjamin Bowsey’s 1780 criminal trial transcript. We will achieve this by using a variety of string operators, string methods, and close reading skills. We introduce looping and branching so that programs can repeat tasks and test for certain conditions, making it possible to separate the content from the HTML tags. Finally, we convert content from a long string to a list of words that can later be sorted, indexed, and counted.
A soldier being mocked by a man
William J. Turkel and Adam Crymble
In this lesson, you will learn the Python commands needed to implement the second part of the algorithm begun in the lesson 'From HTML to a List of Words (part 1)'.
A curled-up snake
William J. Turkel and Adam Crymble
This first lesson in our section on dealing with Online Sources is designed to get you and your computer set up to start programming. We will focus on installing the relevant software – all free and reputable – and finally we will help you to get your toes wet with some simple programming that provides immediate results.
A figure dropping two bottles of alcohol
William J. Turkel and Adam Crymble
This lesson takes the frequency pairs collected in "Counting Frequencies" and outputs them in HTML.
A band with three musicians
William J. Turkel and Adam Crymble
This lesson will help you set up an integrated development environment for Python on a computer running the Linux operating system.
A band with three musicians
William J. Turkel and Adam Crymble
This lesson will help you set up an integrated development environment for Python on a computer running a Mac operating system.
A man playing a guitar
William J. Turkel and Adam Crymble
This lesson is a brief introduction to string manipulation techniques in Python.
Tall woman dragging a short young man
William J. Turkel and Adam Crymble
In this lesson, we will make the list we created in the 'From HTML to a List of Words' lesson easier to analyze by normalizing this data.
Daniel van Strien
In this lesson you will be introduced to the basics of version control, learn why it is useful, and implement basic version control for a plain text document using Git and GitHub.
Pisces symbol of two linked fish
Matthew Lincoln
This lesson explains why many cultural institutions are adopting graph databases, and how researchers can access these data through the query language called SPARQL.
Spencer Roberts
This lesson will show you how to get information from Zotero HTML items, save the content from those items, and count the frequencies of words.
Amanda Morton
In this lesson, you will create a new item in a Zotero library and add some basic metadata such as title and date.
Amanda Morton
In this lesson, you’ll learn how to use Python with the Zotero API to interact with your Zotero library.
A soup tureen
Jeri Wieringa
Beautiful Soup is a Python library for getting data out of HTML, XML, and other markup languages.
A woman wearing an elaborate dress accompanied by two putti
William J. Turkel and Adam Crymble
This lesson takes the frequency pairs created in the 'Counting Frequencies' lesson and outputs them to an HTML file.
A monkey dancing with a lion and a bear
William J. Turkel and Adam Crymble
This lesson builds on 'Keywords in Context (Using N-grams)', where n-grams were extracted from a text. Here, you will learn how to output all of the n-grams of a given keyword in a document downloaded from the Internet, and display them clearly in your browser window.
A woman listening to a man through an ear trumpet
William J. Turkel and Adam Crymble
This lesson introduces you to HTML and the web pages it structures.
A band of three musicians
William J. Turkel and Adam Crymble
This lesson will help you set up an integrated development environment for Python on a computer running the Windows operating system.
Bespectacled man reading an alphabet book
William J. Turkel and Adam Crymble
In this lesson you will learn how to manipulate text files using Python.
A tall man next to a short woman
William J. Turkel and Adam Crymble
This lesson introduces Uniform Resource Locators (URLs) and explains how to use Python to download and save the contents of a web page to your local hard drive.
Diagram of an elevator system in a mineshaft
Ian Milligan
Wget is a useful program, run through your computer's command line, for retrieving online material.