Generators Module
anonipy.anonymize.generators
Module containing the generators
.
The generators
module provides a set of generators used to generate data
substitutes.
Classes:
Name | Description |
---|---|
LLMLabelGenerator |
The class representing the label generator utilizing LLMs. |
MaskLabelGenerator |
The class representing the label generator utilizing token masking. |
NumberGenerator |
The class representing the number generator. |
DateGenerator |
The class representing the date generator. |
anonipy.anonymize.generators.LLMLabelGenerator
Bases: GeneratorInterface
The class representing the LLM label generator.
Examples:
>>> from anonipy.anonymize.generators import LLMLabelGenerator
>>> generator = LLMLabelGenerator()
>>> generator.generate(entity)
Attributes:
Name | Type | Description |
---|---|---|
model |
Transformers
|
The model used to generate the label substitutes. |
Methods:
Name | Description |
---|---|
generate |
Generate the label based on the entity. |
Source code in anonipy/anonymize/generators/llm_label_generator.py
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233 234 235 236 237 238 239 240 241 242 |
|
__init__(*args, model_name='HuggingFaceTB/SmolLM2-1.7B-Instruct', use_gpu=False, use_quant=False, **kwargs)
Initializes the LLM label generator.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_name
|
str
|
The name of the model to use. |
'HuggingFaceTB/SmolLM2-1.7B-Instruct'
|
use_gpu
|
bool
|
Whether to use GPU or not. |
False
|
use_quant
|
bool
|
Whether to use quantization or not. |
False
|
Examples:
>>> from anonipy.anonymize.generators import LLMLabelGenerator
>>> generator = LLMLabelGenerator()
LLMLabelGenerator()
Source code in anonipy/anonymize/generators/llm_label_generator.py
generate(entity, *args, add_entity_attrs='', temperature=1.0, top_p=0.95, **kwargs)
Generate the substitute for the entity based on it's attributes.
Examples:
>>> from anonipy.anonymize.generators import LLMLabelGenerator
>>> generator = LLMLabelGenerator()
>>> generator.generate(entity)
label
Parameters:
Name | Type | Description | Default |
---|---|---|---|
entity
|
Entity
|
The entity to generate the label from. |
required |
add_entity_attrs
|
str
|
Additional entity attribute description to add to the generation. |
''
|
temperature
|
float
|
The temperature to use for the generation. |
1.0
|
top_p
|
float
|
The top p to use for the generation. |
0.95
|
Returns:
Type | Description |
---|---|
str
|
The generated entity label substitute. |
Source code in anonipy/anonymize/generators/llm_label_generator.py
_prepare_model_and_tokenizer(model_name, use_gpu, use_quant)
Prepares the model and tokenizer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_name
|
str
|
The name of the model to use. |
required |
Returns:
Type | Description |
---|---|
AutoModelForCausalLM
|
The huggingface model. |
AutoTokenizer
|
The huggingface tokenizer. |
Source code in anonipy/anonymize/generators/llm_label_generator.py
_load_model(model_name, device, dtype, use_quant, use_gpu)
Load the model with appropriate configuration.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_name
|
str
|
The name of the model to use. |
required |
device
|
device
|
The device to use for the model. |
required |
dtype
|
dtype
|
The data type to use for the model. |
required |
use_quant
|
bool
|
Whether to use quantization or not. |
required |
use_gpu
|
bool
|
Whether to use GPU or not. |
required |
Returns:
Type | Description |
---|---|
AutoModelForCausalLM
|
The huggingface model. |
Source code in anonipy/anonymize/generators/llm_label_generator.py
_load_tokenizer(model_name)
Load the tokenizer with appropriate configuration.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_name
|
str
|
The name of the model to use. |
required |
Returns:
Type | Description |
---|---|
AutoTokenizer
|
The huggingface tokenizer. |
Source code in anonipy/anonymize/generators/llm_label_generator.py
_generate_response(message, temperature, top_p)
Generate the response from the LLM.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
message
|
List[dict]
|
The message to generate the response from. |
required |
temperature
|
float
|
The temperature to use for the generation. |
required |
top_p
|
float
|
The top p to use for the generation. |
required |
Returns:
Type | Description |
---|---|
str
|
The generated response. |
Source code in anonipy/anonymize/generators/llm_label_generator.py
anonipy.anonymize.generators.MaskLabelGenerator
Bases: GeneratorInterface
The class representing the mask label generator.
Examples:
>>> from anonipy.anonymize.generators import MaskLabelGenerator
>>> generator = MaskLabelGenerator(model_name, context_window=100, use_gpu=False)
>>> generator.generate(entity)
Attributes:
Name | Type | Description |
---|---|---|
pipeline |
Pipeline
|
The transformers pipeline used to generate the label substitutes. |
context_window |
int
|
The context window size to use to generate the label substitutes. |
mask_token |
str
|
The mask token to use to replace the masked words. |
Methods:
Name | Description |
---|---|
generate |
Generate the substitute for the entity based on it's location in the text. |
Source code in anonipy/anonymize/generators/mask_label_generator.py
23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 |
|
__init__(*args, model_name='FacebookAI/xlm-roberta-large', use_gpu=False, context_window=100, **kwargs)
Initializes the mask label generator.
Examples:
>>> from anonipy.anonymize.generators import MaskLabelGenerator
>>> generator = MaskLabelGenerator(context_window=120, use_gpu=True)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_name
|
str
|
The name of the masking model to use. |
'FacebookAI/xlm-roberta-large'
|
use_gpu
|
bool
|
Whether to use GPU/CUDA, if available. |
False
|
context_window
|
int
|
The context window size. |
100
|
Source code in anonipy/anonymize/generators/mask_label_generator.py
generate(entity, text, *args, **kwargs)
Generate the substitute for the entity using the masking model.
Examples:
>>> from anonipy.anonymize.generators import MaskLabelGenerator
>>> generator = MaskLabelGenerator(context_window=120, use_gpu=True)
>>> generator.generate(entity, text)
label
Parameters:
Name | Type | Description | Default |
---|---|---|---|
entity
|
Entity
|
The entity used to generate the substitute. |
required |
text
|
str
|
The original text in which the entity is located; used to get the entity's context. |
required |
Returns:
Type | Description |
---|---|
str
|
The generated substitute text. |
Source code in anonipy/anonymize/generators/mask_label_generator.py
_prepare_model_and_tokenizer(model_name, use_gpu)
Prepares the model and tokenizer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
model_name
|
str
|
The name of the model to use. |
required |
use_gpu
|
bool
|
Whether to use GPU/CUDA, if available. |
required |
Returns:
Type | Description |
---|---|
AutoModelForMaskedLM
|
The huggingface model. |
AutoTokenizer
|
The huggingface tokenizer. |
Tuple[AutoModelForMaskedLM, AutoTokenizer]
|
The device to use. |
Source code in anonipy/anonymize/generators/mask_label_generator.py
_create_masks(entity)
Creates the masks for the provided entity.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
entity
|
Entity
|
The entity to create the masks for. |
required |
Returns:
Type | Description |
---|---|
List[dict]
|
The list of masks attributes, including the true text, mask text, start index, and end index within the original text. |
Source code in anonipy/anonymize/generators/mask_label_generator.py
_get_context_text(text, start_index, end_index)
Get the context text.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text
|
str
|
The text to get the context from. |
required |
start_index
|
int
|
The start index of the context window. |
required |
end_index
|
int
|
The end index of the context window. |
required |
Returns:
Type | Description |
---|---|
str
|
The context window text. |
Source code in anonipy/anonymize/generators/mask_label_generator.py
_prepare_generate_inputs(masks, text)
Prepares the generate inputs.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
masks
|
List[dict]
|
The list of masks attributes. |
required |
text
|
str
|
The text to prepare the generate inputs for. |
required |
Returns:
Type | Description |
---|---|
List[str]
|
The list of generate inputs. |
Source code in anonipy/anonymize/generators/mask_label_generator.py
_create_substitute(entity, masks, suggestions)
Create a substitute for the entity.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
entity
|
Entity
|
The entity to create the substitute for. |
required |
masks
|
List[dict]
|
The list of masks attributes. |
required |
suggestions
|
List[dict]
|
The list of substitute suggestions. |
required |
Returns:
Type | Description |
---|---|
str
|
The created and selected substitute text. |
Source code in anonipy/anonymize/generators/mask_label_generator.py
anonipy.anonymize.generators.NumberGenerator
Bases: GeneratorInterface
The class representing the number generator.
Examples:
>>> from anonipy.anonymize.generators import NumberGenerator
>>> generator = NumberGenerator()
>>> generator.generate(entity)
Methods:
Name | Description |
---|---|
generate |
Generates a substitute for the numeric entity. |
Source code in anonipy/anonymize/generators/number_generator.py
__init__(*args, **kwargs)
Initializes the number generator.
Examples:
Source code in anonipy/anonymize/generators/number_generator.py
generate(entity, *args, **kwargs)
Generates the substitute for the numeric entity.
Examples:
>>> from anonipy.anonymize.generators import NumberGenerator
>>> generator = NumberGenerator()
>>> generator.generate(entity)
"1234567890"
Parameters:
Name | Type | Description | Default |
---|---|---|---|
entity
|
Entity
|
The numeric entity to generate the numeric substitute. |
required |
Returns:
Type | Description |
---|---|
str
|
The generated numeric substitute. |
Raises:
Type | Description |
---|---|
ValueError
|
If the entity type is not |
Source code in anonipy/anonymize/generators/number_generator.py
anonipy.anonymize.generators.DateGenerator
Bases: GeneratorInterface
The class representing the date generator.
Examples:
>>> from anonipy.anonymize.generators import DateGenerator
>>> generator = DateGenerator(lang="de")
>>> generator.generate(entity)
Attributes:
Name | Type | Description |
---|---|---|
lang |
(str, LANGUAGES)
|
The language of the text. |
date_format |
str
|
The date format in which the date should be generated. |
day_sigma |
int
|
The range of the random date in days. |
Methods:
Name | Description |
---|---|
generate |
Generate the date substitute based on the input parameters. |
Source code in anonipy/anonymize/generators/date_generator.py
107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143 144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 |
|
__init__(*args, lang='en', date_format='auto', day_sigma=30, **kwargs)
Initializes the date generator.
Examples:
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lang
|
Union[str, LANGUAGES]
|
The language of the text. |
'en'
|
date_format
|
str
|
The date format in which the date should be generated. More on date formats see here. |
'auto'
|
day_sigma
|
int
|
The range of the random date in days. |
30
|
Source code in anonipy/anonymize/generators/date_generator.py
generate(entity, *args, sub_variant=DATE_TRANSFORM_VARIANTS.RANDOM, **kwargs)
Generate the entity substitute based on the input parameters.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
entity
|
Entity
|
The entity to generate the date substitute from. |
required |
sub_variant
|
DATE_TRANSFORM_VARIANTS
|
The substitute function variant to use. |
RANDOM
|
Returns:
Type | Description |
---|---|
str
|
The generated date substitute. |
Raises:
Type | Description |
---|---|
ValueError
|
If the entity type is not |
Source code in anonipy/anonymize/generators/date_generator.py
anonipy.anonymize.generators.GeneratorInterface
The class representing the generator interface.
All generators should inherit from this class.
Methods:
Name | Description |
---|---|
generate |
Generate a substitute for the entity. |