2024

December 11, 2024
in Tutorial
3 min read

Document anonymization using pipeline

In this blog post, we show how to anonymize documents using the new Pipeline module. The module streamlines the process: the user defines how the anonymization should be performed, where the documents to be anonymized are located, and where the anonymized documents should be stored.

The pipeline will automatically extract the text from the documents, anonymize the text, and store the anonymized text in the output folder.
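The three steps above (extract, anonymize, store) can be sketched with the standard library alone. This is a minimal illustration, not anonipy's Pipeline API: `run_folder_pipeline` and `redact_digits` are hypothetical names, and the sketch assumes the documents are plain `.txt` files.

```python
from pathlib import Path
from typing import Callable, List


def run_folder_pipeline(
    input_dir: Path,
    output_dir: Path,
    anonymize: Callable[[str], str],
) -> List[Path]:
    """Anonymize every .txt file in input_dir and write the results to output_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for source in sorted(input_dir.glob("*.txt")):
        text = source.read_text(encoding="utf-8")        # 1. extract the text
        anonymized = anonymize(text)                     # 2. anonymize the text
        target = output_dir / source.name
        target.write_text(anonymized, encoding="utf-8")  # 3. store the result
        written.append(target)
    return written


def redact_digits(text: str) -> str:
    """Hypothetical placeholder anonymizer: replaces every digit with '*'."""
    return "".join("*" if ch.isdigit() else ch for ch in text)
```

Passing the anonymization step in as a function keeps the folder handling independent of how the text is actually anonymized.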

July 16, 2024
in Tutorial
4 min read

Anonymizing collections of documents

In the previous blog post, we showed how one can anonymize text in document form. While the code is useful for processing a single document, anonymizing a collection of documents can take time if we run the script for each document separately.

In this blog post, we show how one can anonymize collections of documents. The process is similar to the previous blog post, but loads all required components only once, and anonymizes all documents in one go.

July 16, 2024
in Tutorial
4 min read

Anonymizing documents

The anonipy package was designed for anonymizing text. However, a lot of text data can be found in document form, such as PDFs, Word documents, and others. Copying the text out of the documents to be anonymized can be cumbersome. The anonipy package provides utility functions that extract the text from the documents.

In this blog post, we explain how anonipy can be used to anonymize texts in document form.

July 15, 2024
in Overview
7 min read

Extractors overview

In this post, we will show an overview of the implemented extractors. The extractors are used to extract relevant named entities from text. These entities can be people's names, organizations, addresses, social security numbers, etc. The entities are then used to anonymize the text.
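To make the idea of an extracted entity concrete, here is a minimal pattern-based sketch. It is not one of anonipy's extractors: the `Entity` class, the `PATTERNS` table, and `extract_entities` are hypothetical, and the two regular expressions are deliberately simplistic.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Entity:
    text: str   # the matched text
    label: str  # the entity type
    start: int  # start offset in the original text
    end: int    # end offset (exclusive)


# Hypothetical, simplified patterns for illustration only.
PATTERNS = {
    "date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def extract_entities(text: str) -> List[Entity]:
    """Return all pattern matches as entities, ordered by position in the text."""
    entities = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append(Entity(match.group(), label, match.start(), match.end()))
    return sorted(entities, key=lambda entity: entity.start)
```

Keeping the character offsets alongside each entity is what lets a later step replace the entity in the original text.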

All extractors and their API references are available in the extractors module. What follows is the presentation of the different extractors anonipy provides.

July 15, 2024
in Overview
6 min read

Generators overview

In this post, we will show an overview of the implemented generators. The generators are used to create new texts that serve as substitutes for the extracted named entities. The substitutes can then be used to replace the entities and anonymize the text.
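As a rough illustration of what a substitute can look like, the sketch below replaces every digit of an entity with a random digit while keeping its format. It is an assumption-laden example, not one of anonipy's generators; `generate_number_substitute` is a hypothetical name.

```python
import random


def generate_number_substitute(original: str, rng: random.Random) -> str:
    """Replace each digit with a random digit; keep all other characters."""
    return "".join(
        str(rng.randint(0, 9)) if ch.isdigit() else ch
        for ch in original
    )


# Usage: a seeded generator makes the substitute reproducible across runs.
substitute = generate_number_substitute("123-45-6789", random.Random(0))
```

Because separators and length are preserved, the substitute still reads as the same kind of value as the original.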

All generators and their API references are available in the generators module. What follows is the presentation of the different generators anonipy provides.

July 15, 2024
in Overview
7 min read

Strategies overview

In this post, we will show an overview of the implemented strategies. The strategies determine how the original text will be anonymized given the extracted named entities. They output the anonymized text and the list of replacements that were made to the original text.
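The contract described above (text and entities in, anonymized text and replacements out) can be sketched as a simple redaction function. This is a minimal illustration, not anonipy's implementation: `redact` is a hypothetical name, and the sketch assumes entities are given as non-overlapping `(start, end, label)` tuples.

```python
from typing import Dict, List, Tuple


def redact(
    text: str,
    entities: List[Tuple[int, int, str]],
) -> Tuple[str, List[Dict[str, str]]]:
    """Replace each entity span with its label and report the replacements."""
    replacements = []
    # Work from the end of the text so earlier offsets stay valid.
    for start, end, label in sorted(entities, reverse=True):
        substitute = f"[{label.upper()}]"
        replacements.append({"original": text[start:end], "substitute": substitute})
        text = text[:start] + substitute + text[end:]
    replacements.reverse()  # report replacements in reading order
    return text, replacements
```

Processing the spans from right to left avoids having to recompute offsets after each substitution changes the length of the text.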

All strategies and their API references are available in the strategies module. What follows is the presentation of the different strategies anonipy provides.