Search Resources

6 Results

Selected filters:
  • Center for History and New Media
The Programming Historian 2: Applied Archival Downloading with Wget
Unrestricted Use
CC BY

Now that you have learned how Wget can be used to mirror or download specific files from websites like ActiveHistory.ca via the command line, it’s time to expand your web-scraping skills through a few more lessons that focus on other uses for Wget’s recursive retrieval function. The following tutorial provides three examples of how Wget can be used to download large collections of documents from archival websites with assistance from the Python programming language. It will teach you how to parse and generate a list of URLs using a simple Python script, and will also introduce you to a few of Wget’s other useful features. Similar results can be achieved with curl, an open-source tool capable of performing automated downloads from the command line. For this lesson, however, we will focus on Wget and building your Python skills.
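
As a rough illustration of the "generate a list of URLs" step described above, the sketch below builds sequentially numbered URLs and writes them to a text file, one per line. The base URL, the zero-padded numbering scheme, the `.jpg` extension, and the function names are all assumptions made for this example; they are not taken from the lesson itself.

```python
import os
import tempfile


def build_urls(base, start, stop, width=3):
    """Return URLs for pages start..stop (inclusive), zero-padded to `width` digits.

    The numbering pattern and the .jpg extension are hypothetical; a real
    archive may name its files differently.
    """
    return [f"{base}{n:0{width}d}.jpg" for n in range(start, stop + 1)]


def write_url_list(urls, path):
    """Write one URL per line so the file can be handed to a downloader."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(urls) + "\n")


if __name__ == "__main__":
    # example.org is a placeholder host, not a real archive.
    urls = build_urls("http://example.org/archive/page-", 1, 3)
    target = os.path.join(tempfile.mkdtemp(), "urls.txt")
    write_url_list(urls, target)
    print(open(target, encoding="utf-8").read())
```

A file produced this way could then be passed to Wget with its input-file option, for example `wget -i urls.txt`.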

Subject:
Computer Science
Computer, Networking and Telecommunications Systems
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Date Added:
04/11/2023
The Programming Historian 2: From HTML to List of Words (part 1)
Unrestricted Use
CC BY

In this two-part lesson, we will build on what you’ve learned about Working with Webpages, learning how to remove the HTML markup from the webpage of Benjamin Bowsey’s 1780 criminal trial transcript. We will achieve this by using a variety of string operators, string methods and close reading skills. We introduce looping and branching so that programs can repeat tasks and test for certain conditions, making it possible to separate the content from the HTML tags. Finally, we convert content from a long string to a list of words that can later be sorted, indexed, and counted.
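
The looping-and-branching approach described above might look something like the following minimal sketch. The function names and the sample HTML string are invented for illustration, and the code assumes simple markup in which `<` and `>` appear only as tag delimiters.

```python
def strip_tags(page_contents):
    """Drop everything between '<' and '>' by walking the string one character at a time."""
    inside_tag = False
    text = ""
    for char in page_contents:
        if char == "<":        # a tag opens: stop keeping characters
            inside_tag = True
        elif char == ">":      # a tag closes: start keeping characters again
            inside_tag = False
        elif not inside_tag:   # ordinary content outside any tag
            text += char
    return text


def to_word_list(page_contents):
    """Convert the tag-free string into a list of whitespace-separated words."""
    return strip_tags(page_contents).split()


if __name__ == "__main__":
    # Made-up sample markup, not the trial transcript used in the lesson.
    sample = "<p>The <b>quick</b> brown fox</p>"
    print(to_word_list(sample))
```

A character-by-character loop is slow on very large pages, but it makes the looping and branching logic easy to follow.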

Subject:
Computer Science
Computer, Networking and Telecommunications Systems
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Date Added:
04/11/2023
The Programming Historian 2: Output Keywords in Context in HTML File
Unrestricted Use
CC BY

This lesson builds on Keywords in Context (Using N-grams), where n-grams were extracted from a text. Here, you will learn how to output all of the n-grams of a given keyword in a document downloaded from the Internet, and display them clearly in your browser window.
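
One way to picture the idea is the sketch below, which collects every n-gram whose middle word is the keyword and wraps the result in a bare HTML page. The function names, the sample sentence, and the choice to bold the keyword are assumptions for this example rather than the lesson's own code.

```python
import html


def get_ngrams(words, n):
    """Return every run of n consecutive words as a list."""
    return [words[i:i + n] for i in range(len(words) - n + 1)]


def keyword_in_context(words, keyword, n=5):
    """Keep only the n-grams whose middle word is the keyword."""
    middle = n // 2
    return [gram for gram in get_ngrams(words, n) if gram[middle] == keyword]


def to_html(rows, keyword):
    """Render one n-gram per line, with the keyword in bold."""
    lines = []
    for row in rows:
        cells = [
            f"<b>{html.escape(word)}</b>" if word == keyword else html.escape(word)
            for word in row
        ]
        lines.append(" ".join(cells))
    body = "<br/>\n".join(lines)
    return f"<html><body><p>{body}</p></body></html>"


if __name__ == "__main__":
    # Sample text chosen for illustration only.
    words = "it was the best of times it was the worst of times".split()
    print(to_html(keyword_in_context(words, "the"), "the"))
```

Saving the returned string to a file ending in `.html` and opening it in a browser would display the matches.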

Subject:
Computer Science
Computer, Networking and Telecommunications Systems
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Date Added:
04/11/2023
The Programming Historian 2: Python Introduction and Installation
Unrestricted Use
CC BY

This first lesson in our section on dealing with Online Sources is designed to get you and your computer set up to start programming. We will focus on installing the relevant software – all free and reputable – and finally we will help you to get your toes wet with some simple programming that provides immediate results.

In this opening module you will install the Python programming language, the Beautiful Soup HTML/XML parser, and a text editor. Screencaps provided here come from Komodo Edit, but you can use any text editor capable of working with Python. Here’s a list of other options: Python Editors. Once everything is installed, you will write your first programs, “Hello World” in Python and HTML.

Subject:
Computer Science
Computer, Networking and Telecommunications Systems
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Date Added:
04/11/2023
The Programming Historian 2: Transliterating non-ASCII characters with Python
Unrestricted Use
CC BY

This lesson shows how to use Python to automatically transliterate a list of words from a language with a non-Latin alphabet to a standardized format using American Standard Code for Information Interchange (ASCII) characters. It builds on readers’ understanding of Python from the lessons “Viewing HTML Files,” “Working with Web Pages,” “From HTML to List of Words (part 1)” and “Intro to Beautiful Soup.” At the end of the lesson, we will use the transliteration dictionary to convert the names from a database of the Russian organization Memorial from Cyrillic into Latin characters. Although the example uses Cyrillic characters, the technique can be reproduced with other alphabets using Unicode.
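
A transliteration dictionary of the kind mentioned above can be sketched as follows. The mapping here is a simplified scheme assumed for illustration; it is not necessarily the table the lesson builds, and the function name and sample words are likewise hypothetical.

```python
# Simplified Cyrillic-to-Latin table (an assumption for this sketch; real
# transliteration standards differ in how they treat several letters).
CYRILLIC_TO_LATIN = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "e",
    "ж": "zh", "з": "z", "и": "i", "й": "i", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "kh", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "shch",
    "ъ": "", "ы": "y", "ь": "", "э": "e", "ю": "iu", "я": "ia",
}


def transliterate(word):
    """Replace each Cyrillic letter with its Latin equivalent, preserving capitalization."""
    pieces = []
    for char in word:
        # Characters missing from the table (digits, Latin letters) pass through unchanged.
        latin = CYRILLIC_TO_LATIN.get(char.lower(), char)
        if char.isupper():
            latin = latin.capitalize()
        pieces.append(latin)
    return "".join(pieces)


if __name__ == "__main__":
    for name in ["Москва", "Чехов"]:
        print(name, "->", transliterate(name))
```

Looking up the lowercase form and re-capitalizing afterwards keeps the table half the size it would otherwise need to be.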

Subject:
Computer Science
Computer, Networking and Telecommunications Systems
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Date Added:
04/11/2023
The Programming Historian 2: Working with Text Files
Unrestricted Use
CC BY

In this lesson you will learn how to manipulate text files using Python. This includes opening, closing, reading from, and writing to .txt files.
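
The operations listed above can be sketched with Python's built-in `open()` function. The helper names and the file name `helloworld.txt` are hypothetical; the `with` statement used here closes each file automatically, which stands in for an explicit call to `close()`.

```python
import os
import tempfile


def write_text(path, text):
    """Create (or overwrite) a file and write text to it."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text)


def append_text(path, text):
    """Add text to the end of an existing file."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(text)


def read_text(path):
    """Return the whole file as a single string."""
    with open(path, "r", encoding="utf-8") as handle:
        return handle.read()


if __name__ == "__main__":
    # A temporary directory keeps the example from touching real files.
    path = os.path.join(tempfile.mkdtemp(), "helloworld.txt")
    write_text(path, "hello world\n")
    append_text(path, "goodbye\n")
    print(read_text(path))
```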

The next few lessons will involve downloading a web page from the Internet and reorganizing the contents into useful chunks of information. You will be doing most of your work using Python code written and executed in Komodo Edit.

Subject:
Computer Science
Computer, Networking and Telecommunications Systems
Material Type:
Diagram/Illustration
Provider:
Center for History and New Media
Date Added:
04/11/2023