In my opinion, one of the best use cases for AI is audio transcription. As a wordsmith by nature, I'm frequently disappointed by generative AI, but I find AI inference extremely useful. I consider it the missing component between the input you provide and the input you actually mean to provide. This is especially useful for speech recordings, where background noise, microphone dynamics, or poor compression can distort words. AI inference can work out the most probable meaning of what is otherwise difficult to hear.
However, audio transcription as a service presents a potential privacy risk because it requires sending your audio file to an external server. I will demonstrate how, with just a few Python and Git commands, you can easily run a local audio transcription application, powered by an open source model from Red Hat AI. Once installed, you can use it without an Internet connection because it's entirely local, and your audio never leaves your computer.
How to set up and run the application
Open a terminal, and follow along.
First, install uv. The uv application is a Python package manager, similar to the Python pip module, but with many additional features.
Download the install script as follows:
$ curl -LsSf https://astral.sh/uv/install.sh -O
Review the script and then run it:
$ bash ./install.sh
Create a virtual environment
Create a Python virtual environment for your work.
$ uv venv --seed whisper-example
Next, install the Whisper application:
$ uv pip install openai-whisper
Install the HuggingFace repository tool to make it easy to obtain new AI models:
$ uv tool install hf
Download the model
Red Hat has tested and validated the RedHatAI/whisper-large-v3-turbo-FP8-dynamic model for performance and accuracy. This is one of the models you can run on the Red Hat AI Inference Server, which provides a supported open source solution that allows you to deploy your AI models on a variety of hardware and AI accelerators to match your specific infrastructure needs.
You can download the model from its HuggingFace repository using the hf tool:
$ hf download RedHatAI/whisper-large-v3-turbo-FP8-dynamic
The model size is about 1 GB. When the download is complete, the tool prints the location of the model:
Download complete: : 0.00B [00:00, ?B/s]
/home/tux/.cache/huggingface/hub/models--RedHatAI--whisper-large-v3-turbo-FP8-dynamic/snapshots/e72a6dca29d039a5c9ea13e622e496ca61e85c34
Take note of the model location for the next step.
Transcribe the audio
Now that you have installed Whisper and a Red Hat AI model, you can transcribe an audio file. Keep in mind that you must activate your Python virtual environment before using this install of Whisper. If the environment isn't active in your current terminal session, activate it first:
$ source whisper-example/bin/activate
Assuming you have an audio recording called example.flac, you can transcribe it by providing Whisper with the base path to the Red Hat model and the path to the audio file:
$ whisper --model_dir ~/.cache/huggingface/hub/models--RedHatAI--whisper-large-v3-turbo-FP8-dynamic ~/example.flac
Example output:
Detecting language using up to the first 30 seconds. Use `--language` to specify the language
Detected language: English
[00:00.000 --> 00:10.200] This is a test of the Whisper and Red Hat AI combination running on my Red Hat Enterprise Linux laptop.
Next steps
Open source AI allows you to keep your data and computing local. Using familiar tools on Red Hat Enterprise Linux and Fedora Linux, you can implement your own in-house audio transcription service. If you're a Python programmer, you can even use the Red Hat models with your applications.
Visit Red Hat AI on HuggingFace for more information about the available models. Check out the Red Hat AI Inference Server page to learn how you can deploy AI-powered applications.