This is a simple Python application powered by AI and designed to help you improve your English pronunciation by mimicking native speakers. The app runs locally, so your data never leaves your machine.
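The core feature is a pronunciation comparison between your speech and the reference pronunciation from either an AI assistant or a source audio file. The similarity is calculated using the Word Error Rate (WER): the number of word-level substitutions, deletions, and insertions needed to turn the transcription of your speech into the reference, divided by the number of words in the reference. A lower WER means your pronunciation was closer to the reference.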
WER range | Quality | Description |
---|---|---|
0 - 20% | Excellent | Very close to native pronunciation. |
20 - 40% | Good | Some minor mispronunciations, but generally understandable. |
40 - 60% | Fair | Noticeable errors; pronunciation may sound foreign but is mostly understandable. |
60% and above | Poor | Significant pronunciation issues, making it difficult to understand. |
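As a rough illustration of how a WER score maps onto the bands above, here is a minimal sketch in plain Python. It is not the app's own implementation: the function names `word_error_rate` and `quality_label` are hypothetical, the normalization (lowercasing and stripping punctuation) is an assumption, and because the table's ranges share their boundaries, the sketch assumes a boundary value such as exactly 20% falls into the next band.

```python
import string


def _words(text: str) -> list[str]:
    # Assumed normalization: lowercase, split on whitespace, strip punctuation.
    stripped = (token.strip(string.punctuation) for token in text.lower().split())
    return [word for word in stripped if word]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = _words(reference), _words(hypothesis)
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the processed reference prefix and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing from speech)
                curr[j - 1] + 1,     # insertion (extra word in speech)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def quality_label(wer: float) -> str:
    # Assumption: boundary values (e.g. exactly 0.20) belong to the next band.
    if wer < 0.20:
        return "Excellent"
    if wer < 0.40:
        return "Good"
    if wer < 0.60:
        return "Fair"
    return "Poor"
```

For example, `word_error_rate("how are you today", "who are you")` counts one substitution and one deletion over four reference words, which `quality_label` places in the "Fair" band.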
- Transcription: Wav2Vec2 model.
- Text-to-Speech: Coqui.AI TTS.
- English Phonemes: CMU Pronouncing Dictionary.
- Data: Your performance data is stored in SQLite3 databases, with separate databases for each mode. Over time, this setup will allow you to query and analyze your progress effectively.
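To give an idea of what querying your progress could look like, here is a small sketch using Python's built-in `sqlite3` module. The schema is purely an assumption for illustration: the table name `attempts` and the columns `phrase`, `wer`, and `created_at` are hypothetical, so check the actual database files for the real table and column names before adapting it.

```python
import sqlite3


def create_demo_db() -> sqlite3.Connection:
    # Hypothetical schema, used only so this example is self-contained.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE attempts (phrase TEXT, wer REAL, created_at TEXT)")
    conn.executemany(
        "INSERT INTO attempts (phrase, wer, created_at) VALUES (?, ?, ?)",
        [
            ("How are you today?", 0.50, "2024-07-01 10:00:00"),
            ("How are you today?", 0.25, "2024-07-01 11:00:00"),
            ("I am learning to speak English.", 0.20, "2024-07-02 09:00:00"),
        ],
    )
    return conn


def average_wer_by_day(conn: sqlite3.Connection) -> list[tuple[str, float]]:
    # Group attempts by calendar day and average the WER for each day.
    query = (
        "SELECT date(created_at) AS day, AVG(wer) "
        "FROM attempts GROUP BY day ORDER BY day"
    )
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    for day, avg in average_wer_by_day(create_demo_db()):
        print(f"{day}: average WER {avg:.0%}")
```

A falling daily average would suggest your pronunciation is getting closer to the reference over time.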
The app was developed using Python 3.11.2 on Debian GNU/Linux 12 (Bookworm). If you're using a different operating system, you might need to make some adjustments to suit your environment.
Clone this repo, then set up the app locally by running these commands:
python -m venv venv
. venv/bin/activate
pip install -r requirements.txt
Install youtube-dl if you want to use YouTube videos. You can use these commands:
sudo curl -L https://github.com/ytdl-org/ytdl-nightly/releases/download/2024.07.07/youtube-dl -o /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl
To personalize the setup, you'll need to configure a few variables in the settings.py file.
Ensure the scripts are executable by running:
chmod +x <script>
You can provide input in one of two ways:
- Text File: A file with one phrase per line.
- Audio Folder: A folder containing *.wav audio files.
The ./download_and_split_ytb <textfile.txt> script downloads audio from YouTube videos and splits it into 3-second segments. You can change the segment length by editing the script. The text file passed to this script should contain YouTube video URLs, one per line.
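If you already have a long *.wav file and want to cut it into fixed-length segments yourself, the idea can be sketched with Python's standard `wave` module. This is not the code of the download_and_split_ytb script: the function name `split_wav`, the output file naming, and the `segment_seconds` parameter are assumptions made for this example.

```python
import wave
from pathlib import Path


def split_wav(source: str, out_dir: str, segment_seconds: float = 3.0) -> list[Path]:
    """Cut a WAV file into consecutive segments of at most segment_seconds each."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    with wave.open(source, "rb") as src:
        params = src.getparams()
        frames_per_segment = int(src.getframerate() * segment_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_segment)
            if not frames:
                break  # end of the source file
            target = out / f"segment_{index:04d}.wav"
            with wave.open(str(target), "wb") as dst:
                # Reuse the source format; the frame count is corrected on close.
                dst.setparams(params)
                dst.writeframes(frames)
            written.append(target)
            index += 1
    return written
```

The final segment is simply shorter when the total duration is not a multiple of the segment length, so a 7-second file yields two 3-second segments and one 1-second segment.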
For text mode, the text file should contain one phrase per line. For example:
How are you today?
I am learning to speak English.
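A file in this format is easy to check before you start a session. The following sketch reads one phrase per line and skips blank lines. The helper name `load_phrases` is hypothetical, and skipping blank lines is an assumption about what you would want, not a description of how the app parses the file.

```python
from pathlib import Path


def load_phrases(path: str) -> list[str]:
    """Return the non-empty lines of a text file, stripped of surrounding whitespace."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```

Calling `load_phrases` on your phrase file and printing the result is a quick way to spot stray blank lines or encoding problems.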
To run the app, use the following command in any shell:
python main.py --mode <audio | text>
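For readers curious how a `--mode` switch like this is typically handled, here is a minimal `argparse` sketch. It only illustrates the command-line shape shown above and is not taken from main.py: the function name `build_parser`, the help text, and making the flag required are all assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Pronunciation practice (illustrative sketch)")
    parser.add_argument(
        "--mode",
        choices=["audio", "text"],  # the two modes named in the usage line
        required=True,              # assumption: the real app may define a default instead
        help="practice against audio files or against phrases from a text file",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(f"Selected mode: {args.mode}")
```

With `choices` set, argparse rejects any other value and prints a usage message, which is a convenient way to catch typos in the mode name.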
If you encounter any issues, please check the following:
- Ensure the necessary dependencies are installed.
- Verify your configuration in settings.py.
- Make sure the scripts have the appropriate executable permissions.