Anonymizing collections of documents

In the previous blog post, we showed how to anonymize text in document form. That code works well for a single document, but running the script separately for each document in a collection is slow, because the required components are loaded again on every run.

In this blog post, we show how to anonymize a collection of documents. The process is similar to the one in the previous post, but it loads all required components only once and anonymizes all documents in a single run.
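To make the "load once, process many" idea concrete, here is a minimal sketch of the pattern using only the Python standard library. It does not use the anonipy API: `load_anonymizer` and `anonymize_collection` are hypothetical names, and the regex-based email masking is only a cheap stand-in for the expensive components that a real pipeline would load. The sketch also assumes the documents are plain `.txt` files.

```python
import re
from pathlib import Path
from typing import Callable, List


def load_anonymizer() -> Callable[[str], str]:
    """Hypothetical stand-in for the expensive one-time setup.

    A real pipeline would load its models and strategies here; this
    placeholder only masks email addresses with a regular expression.
    """
    email = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
    return lambda text: email.sub("[EMAIL]", text)


def anonymize_collection(
    input_dir: Path, output_dir: Path, anonymize: Callable[[str], str]
) -> List[Path]:
    """Anonymize every .txt file in input_dir, reusing one anonymizer."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(input_dir.glob("*.txt")):
        text = source.read_text(encoding="utf-8")
        target = output_dir / source.name
        target.write_text(anonymize(text), encoding="utf-8")
        written.append(target)
    return written


if __name__ == "__main__":
    # The setup cost is paid once, outside the loop over documents.
    anonymizer = load_anonymizer()
    for path in anonymize_collection(Path("input"), Path("output"), anonymizer):
        print(f"wrote {path}")
```

The key design choice is that the anonymizer is created once and passed into the loop, so the per-document cost is only reading, transforming, and writing the text.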

Anonymizing documents

The anonipy package was designed for anonymizing text. However, a lot of text data comes in document form, such as PDFs, Word documents, and other file types. Copying the text out of each document by hand before anonymizing it can be cumbersome. The anonipy package provides utility functions that extract the text from the documents.

In this blog post, we explain how anonipy can be used to anonymize texts in document form.