Development

Setup

With Pyenv

git clone https://github.com/KoelLabs/ML.git
Install Python 3.8.10
- Install pyenv
- Run pyenv install 3.8.10
- Pyenv should automatically use this version in this directory. If not, run pyenv local 3.8.10
Create a virtual environment
- Run python -m venv ./venv to create it
- Run . venv/bin/activate when you want to activate it
  - Run deactivate when you want to deactivate it
- Pro-tip: select the virtual environment in your IDE, e.g. in VSCode, click the Python version in the bottom left corner and select the virtual environment
Duplicate the .env.example file and rename it to .env. Fill in the necessary environment variables.
Run the commands in './scripts/install.sh', e.g., with . ./scripts/install.sh.
- This installs the dependencies. Always activate your virtual environment (. ./venv/bin/activate) before running any scripts.
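
A quick way to confirm the steps above took effect is to ask the interpreter itself. The following is a minimal sketch, not part of the repository: the function names are hypothetical, and the virtual-environment test relies on how environments created with python -m venv set sys.prefix, so it may not report correctly for other environment managers.

```python
import sys
from typing import Tuple

# Version named in the setup steps above.
EXPECTED_VERSION = (3, 8, 10)


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


def version_matches(
    actual: Tuple[int, ...],
    expected: Tuple[int, int, int] = EXPECTED_VERSION,
) -> bool:
    """Compare the major, minor, and micro parts of a version tuple."""
    return tuple(actual[:3]) == tuple(expected)


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
    print("python version ok:", version_matches(sys.version_info[:3]))
```

Run it with the environment activated. If either line reports False, revisit the pyenv local or activation step.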

With Conda

git clone https://github.com/KoelLabs/ML.git
Install miniconda or anaconda
- Install miniconda
- Or install anaconda
Create a virtual environment
- Run conda create --prefix ./venv python=3.8.10 to create it
- Run conda activate ./venv when you want to activate it
  - Run conda deactivate when you want to deactivate it
- Pro-tip: select the virtual environment in your IDE, e.g. in VSCode, click the Python version in the bottom left corner and select the virtual environment
Duplicate the .env.example file and rename it to .env. Fill in the necessary environment variables.
Run the commands in './scripts/install.sh', e.g., with . ./scripts/install.sh.
- This installs the dependencies. Always activate your virtual environment (conda activate ./venv) before running any scripts.
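
Either setup path includes copying .env.example to .env and filling in the values, and it is easy to miss a variable. The sketch below compares the variable names in the two files. It assumes both files use plain KEY=VALUE lines with optional # comments, which this document does not confirm. The helper names are hypothetical and are not part of the repository.

```python
from pathlib import Path
from typing import List, Set


def env_keys(text: str) -> Set[str]:
    """Collect variable names from KEY=VALUE lines, skipping blanks and comments."""
    keys = set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name = line.split("=", 1)[0].strip()
        # Tolerate an optional shell-style "export " prefix.
        if name.startswith("export "):
            name = name[len("export "):].strip()
        keys.add(name)
    return keys


def missing_keys(example_text: str, env_text: str) -> List[str]:
    """Names present in the example file but absent from the real one."""
    return sorted(env_keys(example_text) - env_keys(env_text))


if __name__ == "__main__":
    example, env = Path(".env.example"), Path(".env")
    if example.exists() and env.exists():
        print("missing:", missing_keys(example.read_text(), env.read_text()))
    else:
        print("run this from the repository root after creating .env")
```

The check only compares names. It does not verify that a value was filled in or that it is valid.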

Useful Commands

pip freeze > requirements.txt - Save the current environment to a requirements file
pip install -r requirements.txt - Install the requirements from a file
python ./scripts/audio.py record ./data/test.wav - Record audio to a file
python ./scripts/audio.py play ./data/alexIsConfused.wav - Play audio from a file
python ./scripts/audio.py convert ./data/openai_tts.mp3 ./data/openai_tts.wav - Convert audio from one format to another
python ./scripts/audio.py text "hello there" ./data/hello_tts.wav - Synthesize audio from text for testing
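
When several files need the same conversion, the convert command above can be generated per file. This is a sketch under assumptions: it relies only on the convert <source> <destination> form shown in the list, the function name is hypothetical, and it builds the commands without running them.

```python
import sys
from pathlib import Path
from typing import Iterable, List


def convert_commands(
    mp3_paths: Iterable[str],
    script: str = "./scripts/audio.py",
) -> List[List[str]]:
    """Build one `audio.py convert <src> <dst>` argument list per input path."""
    commands = []
    for src in mp3_paths:
        # Same location and file name, with the extension swapped to .wav.
        dst = Path(src).with_suffix(".wav").as_posix()
        commands.append([sys.executable, script, "convert", src, dst])
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True) from the repository root with the virtual environment active. Using sys.executable keeps the call on the interpreter that is currently running.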

Formatting, Linting, Automated Tests and Secret Scanning

All checks run as GitHub Actions when you push code. You can also run them manually with . scripts/alltests.sh.

We use Black for formatting. We recommend integrating it with your IDE so it runs on save. To run it manually, use black . (the dot is the path to format). We do not enforce these styles for notebooks.
We scan the repo for leaked secrets with gitleaks. You can run it manually with gitleaks detect.
We use zizmor for static analysis and security audits of GitHub Actions workflows. To run it manually, use zizmor . (the dot is the path to audit).
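
To run the three tools in one pass and see which ones failed, a small runner along these lines can help. It is an illustrative sketch, not the contents of scripts/alltests.sh. The command list is taken from the text here, except that Black is called with its --check flag so it reports problems instead of rewriting files. The names CHECKS and run_checks are hypothetical.

```python
import subprocess
import sys
from typing import Dict, List

# Commands from this section. Whether alltests.sh runs exactly these,
# or in this order, is an assumption.
CHECKS: Dict[str, List[str]] = {
    "black": ["black", "--check", "."],
    "gitleaks": ["gitleaks", "detect"],
    "zizmor": ["zizmor", "."],
}


def run_checks(checks: Dict[str, List[str]]) -> Dict[str, int]:
    """Run every check, continuing after failures, and return exit codes by name."""
    results = {}
    for name, argv in checks.items():
        try:
            results[name] = subprocess.run(argv).returncode
        except FileNotFoundError:
            # Tool is not installed; 127 mirrors the shell's "command not found".
            print(f"{name}: command not found", file=sys.stderr)
            results[name] = 127
    return results
```

Calling run_checks(CHECKS) from the repository root returns a mapping from tool name to exit code, where 0 means the check passed. Running every tool before reporting gives a full picture in one pass rather than stopping at the first failure.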

Directory Structure

ML/
├── .github/                     # GitHub actions and issue templates
├── data/                        # Small samples of test data
├── guides/                      # Finetuning, evaluation, and other guides (these should be well-documented and run standalone)
├── notebooks/                   # Interactive python notebooks (these are more for exploration and not necessarily well-documented)
├── .data/                       # Large datasets and other hidden data
├── models/                      # Trained models organized in subfolders by third-party source
├── repos/                       # Git submodules for third-party repositories
├── scripts/                     # Shell+Python scripts
│   ├── asr/                     # Test scripts for automatic speech recognition
│   ├── eval_tests/              # Test scripts for evaluation metrics
│   ├── ipa_transcription/       # Test scripts for IPA transcription
│   ├── ipa_synthesis/           # Test scripts for IPA synthesis
│   ├── translitlat/             # Test scripts for transliteration and translation
│   ├── intonation_labeling/     # Test scripts for intonation labeling (ToBI, etc.)
│   ├── stress_detection/        # Test scripts for stress detection
│   ├── cadence_analysis/        # Test scripts for cadence analysis
│   ├── tonal_labeling/          # Test scripts for tonal labeling (e.g. Mandarin tones)
│   ├── forced_alignment/        # Test scripts for forced alignment
│   ├── voice_cloning/           # Test scripts for voice cloning
│   ├── ipa.py                   # Utils for IPA conversion
│   ├── audio.py                 # Utils for converting audio formats, recording, and playing audio
│   └── install.sh               # Setup commands
├── .env.example                 # Example environment variables
├── .gitignore                   # Git ignore rules
├── CONTRIBUTING.md              # Contributing guidelines
├── DEVELOPMENT.md               # Development setup instructions
├── LICENSE                      # License information
├── README.md                    # Readme
└── requirements.txt             # Python dependencies
