Voice Beat Counter
In dance training, keeping a consistent rhythm and counting the beats correctly are essential. However, many dancers struggle to count while focusing on their movements, and teachers often need tools that help students stay synchronized with the music.
This project is a Python-based audio tool that automatically adds a spoken counting voice over any music track. The program analyzes the rhythm of the song, detects the beats, and generates a clear voice that counts the measures (for example: 1 to 8 for ballet, 1 to 3 for waltz, or 1 to 4 for tango).
The main goal of this project is to help dancers practice more efficiently by providing an automatic voice guide that stays synchronized with the music. This system can be adapted for different dance styles and tempos, making it a flexible and practical learning tool.
Supplies
To build and run this project, you will need the following:
Hardware
- A computer (Windows, macOS, or Linux)
- Speakers or headphones
Software
- Python 3.8 or higher
Python Libraries
You need to install the following libraries:
- librosa – for beat detection and audio analysis
- numpy – for numerical processing
- pydub – for audio editing and mixing
- pyttsx3 – for text-to-speech voice generation
Optional
- Any .wav music file to test the project
- A code editor such as VS Code, PyCharm, or any text editor
Installing the Libraries
Before running the project, you need to install the required Python libraries.
Open a terminal or command prompt and install the four libraries with pip, for example by running: pip install librosa numpy pydub pyttsx3
Additional requirement (Windows users)
Pydub relies on FFmpeg to read and write audio formats other than WAV. If FFmpeg is not already on your machine, install it and add it to your system PATH.
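To confirm that the setup is complete, a short check like the following can help. It is only a sketch: the helper names missing_libraries and ffmpeg_available are made up for this example, and the check uses nothing beyond the Python standard library.

```python
import importlib.util
import shutil

# The four libraries listed in the Supplies section.
REQUIRED_LIBRARIES = ["librosa", "numpy", "pydub", "pyttsx3"]


def missing_libraries(names):
    """Return the names from `names` that Python cannot find for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def ffmpeg_available():
    """Return True when an `ffmpeg` executable is found on the system PATH."""
    return shutil.which("ffmpeg") is not None
```

Calling missing_libraries(REQUIRED_LIBRARIES) should return an empty list once the installation has worked, and ffmpeg_available() tells you whether FFmpeg is visible from the terminal you are using.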
Once the installation is complete, you are ready to move to the next step.
Generating the Voice Numbers
This project uses a function called generar_numero_wav() to create spoken numbers as audio files.
What this function does
This function:
- Converts a number into a human voice
- Saves that voice as a .wav audio file
- Optionally makes the number “1” louder to mark the start of a measure
The code
How it works
- It initializes the text-to-speech engine.
- It sets the voice speed through the rate property of the text-to-speech engine.
- If the number is 1 and the accent is enabled, it says "UNO" instead of "1".
- It creates a temporary .wav file.
- It saves the spoken number in that file.
- If the number is 1, it increases the volume.
- Finally, it returns the path to the generated audio file.
This function is the base for creating the counting voice in the project.
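The steps above could be sketched roughly as follows. This is an illustrative reconstruction, not the project's actual listing: the helper texto_para_numero, the parameters acento, velocidad and ganancia_db, and their default values are assumptions, and the sketch assumes the volume boost applies only when the accent is enabled.

```python
import os
import tempfile


def texto_para_numero(numero, acento=True):
    """Return the text to speak: "UNO" for an accented 1, otherwise the digits."""
    if numero == 1 and acento:
        return "UNO"
    return str(numero)


def generar_numero_wav(numero, acento=True, velocidad=150, ganancia_db=6):
    """Speak `numero` into a temporary .wav file and return the file path.

    `velocidad` (speech rate) and `ganancia_db` (boost for the number 1)
    are assumed values chosen for this sketch.
    """
    # Imported here so texto_para_numero works without the audio libraries.
    import pyttsx3
    from pydub import AudioSegment

    engine = pyttsx3.init()
    engine.setProperty("rate", velocidad)

    # Reserve a temporary path; the speech engine writes the file itself.
    fd, ruta = tempfile.mkstemp(suffix=".wav")
    os.close(fd)

    engine.save_to_file(texto_para_numero(numero, acento), ruta)
    engine.runAndWait()

    if numero == 1 and acento:
        # Adding a number to an AudioSegment applies that gain in dB.
        voz = AudioSegment.from_wav(ruta) + ganancia_db
        voz.export(ruta, format="wav")
    return ruta
```

The caller is responsible for deleting the returned file once the audio has been loaded.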
Generating the Voice on the Music Beats
This step explains the function generar_voz_compases(), which places the spoken numbers in the correct moments of the music.
What this function does
This function:
- Receives the detected beat times of the music
- Divides the song into measures (bars)
- Plays a spoken number only on odd-numbered measures
- Repeats the count from 1 up to a maximum number, depending on the dance style
The code
How it works
- It starts with a counter called compas_count that begins at 1.
- It loops through the music in steps of pasos_por_compas.
- It calculates the current measure number.
- If the measure is odd, it:
- Generates a spoken number as a WAV file.
- Loads the audio into memory.
- Deletes the temporary file.
- It calculates the exact time in milliseconds where the voice should play.
- It saves the voice track and its position in a list.
- It increases the counter and resets it when it reaches the maximum number.
This creates a list of voice clips, each paired with the position on the music timeline where it should play.
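The timing logic of this step can be sketched without any audio at all. The function below is a hypothetical helper (planificar_conteo is not a name from the project) that only decides which number is spoken and when. It assumes the counter advances each time a number is spoken, and that an optional offset_ms shift is added to every position.

```python
def planificar_conteo(beat_times, pasos_por_compas, max_count, offset_ms=0):
    """Return (numero, posicion_ms) pairs for the odd-numbered measures.

    beat_times       -- detected beat times in seconds
    pasos_por_compas -- number of beats in one measure
    max_count        -- highest number to say before restarting at 1
    offset_ms        -- assumed manual shift applied to every position
    """
    plan = []
    compas_count = 1
    for i in range(0, len(beat_times), pasos_por_compas):
        compas = i // pasos_por_compas + 1  # 1-based measure number
        if compas % 2 == 1:
            # Convert the first beat of the measure to milliseconds.
            posicion_ms = int(round(beat_times[i] * 1000)) + offset_ms
            plan.append((compas_count, posicion_ms))
            # Advance the spoken number and wrap around after max_count.
            compas_count = compas_count + 1 if compas_count < max_count else 1
    return plan
```

For example, with a beat every 0.5 seconds, two beats per measure, and a maximum count of 2, the plan says "1" at 0 ms, "2" at 2000 ms, and "1" again at 4000 ms. Each entry can then be turned into audio with generar_numero_wav().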
Changing Speed
This step explains the function change_speed(), which makes the music and the counting sound faster or slower depending on the number it receives.
What this function does
This function:
- Receives an audio track and a speed factor
- Returns the original audio unchanged when the speed is 1.0
- Speeds the audio up or slows it down by changing its frame rate
- Converts the result back to the original frame rate so the rest of the program can use it
The code
How it works
- If the requested speed is 1.0 (normal speed), it simply returns the original audio.
- It calculates a new frame rate from the original frame rate and the speed:
  - If speed > 1, the audio plays faster.
  - If speed < 1, the audio plays slower.
- It creates a new audio object with the modified frame rate:
  - _spawn() avoids re-encoding the audio.
  - It keeps the same raw audio data.
- It converts the frame rate back to the original value, which:
  - Preserves the speed change.
  - Ensures compatibility with the rest of the audio processing.
- Because the same samples are simply played at a different rate, the pitch rises or falls together with the speed; this method does not keep the original pitch.
This makes it a simple way to speed up or slow down the music and voice tracks for practice.
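A common way to write this frame-rate technique with pydub looks roughly like the sketch below. It assumes that audio is a pydub AudioSegment; the helper nueva_frecuencia is a made-up name added only to make the calculation visible. With this approach the pitch shifts together with the speed.

```python
def nueva_frecuencia(frame_rate, speed):
    """Return the frame rate that makes the same samples play `speed` times as fast."""
    return int(frame_rate * speed)


def change_speed(audio, speed=1.0):
    """Return `audio` (assumed to be a pydub AudioSegment) at a new speed."""
    if speed == 1.0:
        # Normal speed: nothing to do.
        return audio

    # Reinterpret the same raw samples at a different frame rate.
    alterado = audio._spawn(
        audio.raw_data,
        overrides={"frame_rate": nueva_frecuencia(audio.frame_rate, speed)},
    )
    # Resample back to the original rate so later steps see a standard
    # frame rate; the change in speed (and pitch) is kept.
    return alterado.set_frame_rate(audio.frame_rate)
```

For a 44100 Hz track, a speed of 1.5 gives an intermediate frame rate of 66150 Hz, and a speed of 0.5 gives 22050 Hz.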
Mixing the Voice With the Music
In this step, the function generador_pista_con_conteo() combines the original music track with the generated spoken counting.
What this part does
This part of the code:
- Loads the original audio file
- Changes the playback speed if needed
- Places the voice numbers at the correct times
- Ensures that the voices do not overlap
- Exports the final result as a new .wav file
The code
How it works
- The music file is loaded.
- The program adjusts the speed if needed.
- It loops through every generated voice and:
  - Calculates its position on the timeline
  - Moves it if it would overlap with a previous voice
- The voice is mixed into the music using overlay().
- The final track is exported as a .wav file.
This step creates a synchronized music track with a spoken counting voice.
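The overlap rule is the least obvious part of this step, so here is a small sketch of one way it could work. The helper evitar_solapes is hypothetical, and it assumes that an overlapping voice is moved later, to the moment the previous voice ends.

```python
def evitar_solapes(voces):
    """Return start positions adjusted so that no voice overlaps the previous one.

    voces -- list of (posicion_ms, duracion_ms) pairs, sorted by position
    """
    posiciones = []
    fin_anterior = 0  # end time of the previously placed voice, in ms
    for posicion_ms, duracion_ms in voces:
        # Start at the planned position, or right after the previous voice.
        inicio = max(posicion_ms, fin_anterior)
        posiciones.append(inicio)
        fin_anterior = inicio + duracion_ms
    return posiciones
```

With pydub, each voice can then be mixed in with musica = musica.overlay(voz, position=inicio), and the result saved with musica.export("salida.wav", format="wav"), where musica, voz, and the output file name are placeholder names.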
What Else You Need to Put in the Code
At the beginning of the code, you have to define the dance styles you are going to use, the speed you need, and an OFFSET value that keeps the voice synchronized with the music.
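As an illustration, these settings might look like the sketch below. The counts per style come from the introduction (1 to 8 for ballet, 1 to 3 for waltz, 1 to 4 for tango); the names ESTILOS, SPEED and max_conteo, the default values, and the use of milliseconds for OFFSET are assumptions.

```python
# Highest number to count for each dance style.
ESTILOS = {
    "ballet": 8,
    "waltz": 3,
    "tango": 4,
}

# Playback speed: 1.0 is normal, above 1 is faster, below 1 is slower.
SPEED = 1.0

# Shift applied to every spoken number (assumed here to be in milliseconds).
OFFSET = 0


def max_conteo(estilo):
    """Return the highest count for `estilo`, ignoring upper and lower case."""
    return ESTILOS[estilo.lower()]
```

Adding a new dance style is then a matter of adding one more entry to the dictionary.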
Place at least one .wav music file in the same folder as the script so the code can find and read it. The main block of the script then reads that file and generates the track with the counting.
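The first part of such a main block, finding the music and detecting its beats, might look like this sketch. The function names buscar_wavs and detectar_beats are invented for this example; the beat detection uses librosa's standard beat tracker, which may or may not match the settings used in the project.

```python
from pathlib import Path


def buscar_wavs(carpeta="."):
    """Return the .wav files found in `carpeta`, sorted by path."""
    return sorted(p for p in Path(carpeta).iterdir() if p.suffix.lower() == ".wav")


def detectar_beats(ruta):
    """Return the beat times of the audio file at `ruta`, in seconds."""
    # Imported here so buscar_wavs can be used without librosa installed.
    import librosa

    y, sr = librosa.load(str(ruta))
    _tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)
    # Convert frame indices to seconds and return a plain Python list.
    return librosa.frames_to_time(beat_frames, sr=sr).tolist()
```

The list returned by detectar_beats() is the kind of input that generar_voz_compases() expects as detected beat times.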
With that, you have your music with a spoken beat count to guide your practice.