Create prompts with a given token length for testing LLMs and other transformer text models.
Pre-created prompts for popular model architectures are provided as `.jsonl` files in the `prompts` directory.
To generate one or a few prompts, or to test the functionality, you can use the Test Prompt Generator Space on Hugging Face.
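If you only need the ready-made prompts, you can load one of the provided `.jsonl` files directly. A minimal sketch; the file name and the per-line field names here are assumptions, so check the files in the `prompts` directory for the exact schema:

```python
import json

# Load pre-created prompts from one of the provided .jsonl files.
# NOTE: the file name and field names are illustrative; inspect the
# files in the prompts directory for the actual schema.
with open("prompts/llama.jsonl", encoding="utf-8") as f:
    prompts = [json.loads(line) for line in f]

print(prompts[0])  # each line is one JSON object describing a prompt
```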
```sh
pip install git+/~https://github.com/helena-intel/test-prompt-generator.git transformers
```
Some tokenizers may require additional dependencies, for example `sentencepiece` or `protobuf`.
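For example, to add both of these dependencies:

```sh
pip install sentencepiece protobuf
```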
Specify a tokenizer and the number of tokens the prompt should have. A prompt will be returned that, when tokenized with the given tokenizer, contains the requested number of tokens.
For `tokenizer`, use a model_id from the Hugging Face hub, a path to a local directory with tokenizer files, or one of the preset tokenizers: `['bert', 'blenderbot', 'bloom', 'bloomz', 'chatglm3', 'falcon', 'gemma', 'gpt-neox', 'llama', 'magicoder', 'mistral', 'mpt', 'opt', 'phi-2', 'pythia', 'qwen', 'redpajama', 'roberta', 'starcoder', 't5', 'vicuna', 'zephyr']`. The preset tokenizers should work for most models with that architecture, but if you want to be sure, use an exact model_id; the exact tokenizer used for each preset is defined in the source code.
Prompts are generated by truncating a given source text at the requested number of tokens. By default, the text of Alice in Wonderland is used; you can also provide your own source. A prefix can optionally be prepended, to create prompts like "Please summarize the following text: [text]". Prompts are returned by the Python function or command-line app and can optionally be saved to a `.jsonl` file.
```python
from test_prompt_generator import generate_prompt

# use preset value for opt tokenizer
prompt = generate_prompt(tokenizer_id="opt", num_tokens=32)

# use model_id
prompt = generate_prompt(tokenizer_id="facebook/opt-2.7b", num_tokens=32)
```
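To sanity-check the result, you can tokenize the returned prompt yourself. This is a quick check, assuming `transformers` is installed (it is part of the install command above):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-2.7b")
num_tokens = len(tokenizer(prompt)["input_ids"])
print(num_tokens)  # expected: 32; depending on how special tokens are
                   # counted, add_special_tokens=False may be needed
```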
Add a `source_text_file` and a `prefix`. Instead of `source_text_file`, you can also pass `source_text` with a string containing the source text.
```python
from test_prompt_generator import generate_prompt

prompt = generate_prompt(
    tokenizer_id="mistral",
    num_tokens=32,
    source_text_file="source.txt",
    prefix="Please translate to Dutch:",
    output_file="prompt_32.jsonl",
)
```
Use multiple token sizes. When using multiple token sizes, `output_file` is required; the output file will contain one line for each token size, and `generate_prompt` returns a dictionary of prompts (see the note below).
```python
prompts = generate_prompt(
    tokenizer_id="mistral",
    num_tokens=[32, 64, 128],
    output_file="prompts.jsonl",
)
```
NOTE: When specifying one token size, the prompt will be returned as a string, making it easy to copy and use in a test scenario where you need a single prompt. When specifying multiple token sizes, a dictionary with the prompts will be returned. The output file is always in `.jsonl` format, regardless of the number of generated prompts.
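Building on the note above, a short sketch of consuming the returned dictionary; that it is keyed by token size is an assumption here:

```python
from test_prompt_generator import generate_prompt

prompts = generate_prompt(tokenizer_id="mistral", num_tokens=[32, 64, 128], output_file="prompts.jsonl")

# Assumption: the dictionary maps each requested token size to its prompt.
for size, prompt in prompts.items():
    print(f"{size} tokens: {prompt[:60]}...")
```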
The same functionality is available as a command-line app:

```sh
test-prompt-generator -t mistral -n 32
```
Use `test-prompt-generator --help` to see all options:
```
usage: test-prompt-generator [-h] -t TOKENIZER -n NUM_TOKENS [-p PREFIX] [-o OUTPUT_FILE] [--overwrite] [-v] [-f FILE]

options:
  -h, --help            show this help message and exit
  -t TOKENIZER, --tokenizer TOKENIZER
                        preset tokenizer id, model_id from Hugging Face hub, or path to local directory with tokenizer files. Options for presets are: ['bert', 'bloom', 'gemma', 'chatglm3',
                        'falcon', 'gpt-neox', 'llama', 'magicoder', 'mistral', 'opt', 'phi-2', 'pythia', 'roberta', 'qwen', 'starcoder', 't5']
  -n NUM_TOKENS, --num_tokens NUM_TOKENS
                        Number of tokens the generated prompt should have. To specify multiple token sizes, use e.g. `-n 16 32`
  -p PREFIX, --prefix PREFIX
                        Optional: prefix that the prompt should start with. Example: 'Translate to Dutch:'
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        Optional: Path to store the prompt as .jsonl file
  --overwrite           Overwrite output_file if it already exists.
  -v, --verbose
  -f FILE, --file FILE  Optional: path to text file to generate prompts from. Default text_files/alice.txt
```
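For example, to generate prompts of several sizes with a prefix and save them to a file, using the flags documented above:

```sh
test-prompt-generator -t mistral -n 16 32 64 -p "Please summarize:" -o prompts.jsonl --overwrite
```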
This software is provided "as is" and for testing purposes only. The author makes no warranties, express or implied, regarding the software's operation, accuracy, or reliability.