Summary

The Web Archive Collection Zipped (WACZ) format is a specification for web archive files, created and used by the Webrecorder suite of tools. A compressed .wacz file contains all of the data and documents needed to re-create web pages as they were at the moment they were captured, that is, when you ran the web archiving program.

The .wacz file type is a specific arrangement of files and data that makes it easy to display content captured from a web page. This matters because the content at a given URL is likely to change over time. By capturing the page at a specific moment, we can preserve important historical and legal context.

[Image: WACZ-File.gif]

<aside> 💡 WARC Files

The original file format for web archives is WARC. WARC files can hold large amounts of data, including images, video files, HTML, page styling, and more. On their own, WARC files are slow and often unreliable to load and display in a web browser, because they lack the indexes and other metadata that would let the browser fetch and display only the pieces a user wants to view. Instead, the browser has to read and process all of the data in the file before it can display anything. With the WACZ format, by contrast, you can view the photos and text on a page without first downloading, say, a large video file that was captured as part of the same page.

A WACZ file can bundle several WARC files together with verification information, such as signatures, and an index file that helps the displaying tool (such as a web browser) locate and load only the data it needs. Because WACZ is a compressed format, it takes up less space and bandwidth to store and transfer, while still letting you find and view just the parts of the file you want to preview.

</aside>
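To make the index idea concrete, here is a minimal sketch of how a viewer might use a sorted index to find a single record without scanning whole WARC files. The index lines below are hypothetical and simplified, loosely modeled on the CDXJ style (a SURT-form URL key, a timestamp, and JSON giving the WARC filename, byte offset, and length). The helper names `parse_index` and `lookup` are illustrative only and are not part of any Webrecorder API.

```python
import bisect
import json

# Hypothetical, simplified CDXJ-style index lines:
# "<SURT url key> <14-digit timestamp> <JSON with the record's location>".
INDEX_LINES = [
    'com,example)/about 20230526093825 {"url": "https://example.com/about", "filename": "data.warc.gz", "offset": 1843, "length": 2210}',
    'com,example)/ 20230526093820 {"url": "https://example.com/", "filename": "data.warc.gz", "offset": 0, "length": 1843}',
    'com,example)/video.mp4 20230526093830 {"url": "https://example.com/video.mp4", "filename": "data.warc.gz", "offset": 4053, "length": 52428800}',
]


def parse_index(lines):
    """Split each line into (key, timestamp, location dict) and sort by key."""
    entries = []
    for line in lines:
        key, timestamp, payload = line.split(" ", 2)
        entries.append((key, timestamp, json.loads(payload)))
    entries.sort(key=lambda e: (e[0], e[1]))  # binary search needs sorted keys
    return entries


def lookup(entries, key):
    """Binary-search the sorted entries for the first capture of `key`."""
    keys = [e[0] for e in entries]
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return entries[i][2]
    return None  # URL not in the index


entries = parse_index(INDEX_LINES)
location = lookup(entries, "com,example)/about")
print(location["filename"], location["offset"], location["length"])
```

With the offset and length in hand, a viewer could request just those bytes (for example, with an HTTP Range request) rather than downloading the whole WARC file, which is the kind of selective loading described above.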

Creating WACZ Files

WACZ files can be created in two ways: with an automated crawling tool that you configure with URLs and other options (such as how many links deep to crawl), or with a Chrome extension that captures web pages, along with information about those pages, so they are easy to index and display. The captured pages are stored together in a collection. Learn more about creating pages: Webrecorder Tools: Manual Crawl vs Browsertrix
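The "how many links deep" option can be pictured as a breadth-first crawl that stops following links after a set number of hops. The sketch below uses a small, hypothetical in-memory link graph (`LINKS`) in place of real pages, and the `crawl` function is illustrative only. Real crawlers such as Browsertrix add scoping rules, rate limits, and actual page fetching, and may count depth differently.

```python
from collections import deque

# Hypothetical link graph standing in for real pages: each URL maps to the
# links found on that page. A real crawler would fetch and parse the HTML.
LINKS = {
    "https://example.com/": ["https://example.com/a", "https://example.com/b"],
    "https://example.com/a": ["https://example.com/a/1"],
    "https://example.com/b": ["https://example.com/"],  # links back to the seed
    "https://example.com/a/1": ["https://example.com/a/1/x"],
    "https://example.com/a/1/x": [],
}


def crawl(seed, max_depth):
    """Breadth-first crawl from `seed`, following links at most `max_depth` hops."""
    seen = {seed}
    queue = deque([(seed, 0)])
    captured = []
    while queue:
        url, depth = queue.popleft()
        captured.append(url)  # a real tool would record this page here
        if depth == max_depth:
            continue  # do not follow links past the depth limit
        for link in LINKS.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return captured


print(crawl("https://example.com/", 1))
```

With this graph, depth 0 captures only the seed, depth 1 adds the two pages it links to, and depth 2 adds `/a/1`. The `seen` set keeps a page from being queued twice when pages link back to each other.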

Viewing WACZ Files

You can view .wacz files, and create shareable views of them, with the online ReplayWeb.page tool, and you can embed those views within another website. You can also download the ReplayWeb.page desktop app; once it is installed, double-clicking a .wacz file on your computer opens it in the app.
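Embedding typically uses ReplayWeb.page's `<replay-web-page>` web component, which takes a `source` attribute (the URL of the .wacz file) and a `url` attribute (the archived page to show). As a small sketch, the Python helper below generates that markup. The CDN script URL is an assumption to check against the official embedding guide, which also describes extra setup not shown here, such as hosting the ReplayWeb.page service worker file alongside your site. The function name `embed_snippet` and the example paths are hypothetical.

```python
from html import escape

# Assumption: the ReplayWeb.page UI script is loaded from the jsDelivr CDN.
# Confirm the current URL and required setup in the ReplayWeb.page docs.
UI_SCRIPT = "https://cdn.jsdelivr.net/npm/replaywebpage/ui.js"


def embed_snippet(wacz_url, page_url):
    """Return HTML that embeds one archived page from a .wacz file."""
    # escape() also escapes quotes, so attribute values cannot break out.
    return (
        f'<script src="{escape(UI_SCRIPT)}"></script>\n'
        f'<replay-web-page source="{escape(wacz_url)}" '
        f'url="{escape(page_url)}"></replay-web-page>'
    )


print(embed_snippet("archive/example.wacz", "https://example.com/"))
```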

Most kinds of web content, such as ordinary web pages, social media posts, and video, can be included in a .wacz file, preserved, and then displayed by opening the file in ReplayWeb.page.

WACZ File Structure

The .wacz file format captures all the pieces of content on a web page, so you can recreate, and even publish, that website as it was at a particular moment in time. If you rename a .wacz file with a .zip extension and unzip it, you will see three directories and two .json files inside.

[Screenshot: contents of an unzipped .wacz file (Screenshot 2023-05-26 at 9.38.20 AM.png)]
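In the WACZ 1.1.1 specification, the three directories are `archive/` (the WARC files), `indexes/` (index files), and `pages/` (page lists such as `pages.jsonl`). The two JSON files are `datapackage.json`, a manifest listing the contents with their hashes, and `datapackage-digest.json`, a digest of that manifest that can carry a signature. The sketch below assumes that layout and lists a WACZ's top-level structure with Python's standard `zipfile` module, which reads a .wacz directly without renaming it. The demo archive is a tiny, hypothetical stand-in built in memory, and `summarize_wacz` is an illustrative name, not part of any Webrecorder tool.

```python
import io
import json
import zipfile


def summarize_wacz(source):
    """Return (top-level directories, top-level files, parsed datapackage.json).

    `source` may be a file path or a file-like object.
    """
    with zipfile.ZipFile(source) as zf:
        names = zf.namelist()
        dirs = sorted({n.split("/", 1)[0] for n in names if "/" in n})
        files = sorted(n for n in names if "/" not in n)
        package = None
        if "datapackage.json" in files:
            package = json.loads(zf.read("datapackage.json"))
    return dirs, files, package


# Build a tiny, hypothetical WACZ-shaped archive in memory for the demo.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("archive/data.warc.gz", b"")
    zf.writestr("indexes/index.cdx", "")
    zf.writestr("pages/pages.jsonl", '{"format": "json-pages-1.0"}\n')
    zf.writestr("datapackage.json", json.dumps({"profile": "data-package", "resources": []}))
    zf.writestr("datapackage-digest.json", "{}")

dirs, files, package = summarize_wacz(buf)
print(dirs)   # ['archive', 'indexes', 'pages']
print(files)  # ['datapackage-digest.json', 'datapackage.json']
```

On a real capture you would pass the path instead, for example `summarize_wacz("my-collection.wacz")` (a hypothetical filename), and the `resources` list in the returned manifest describes each file in the package.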