The school graduation ceremony started at 6 PM. Ninety minutes, fast Spanish, a lot of people talking — me sitting in the third row with my phone on my knee, quietly recording.
I understood the general mood: proud parents, a few tears, the teacher's speech about the class. But the specifics? The dates mentioned for summer programs, what the principal said about next year, the name of the award winner and why they won — gone.
This happens a lot when you live in a country where you don't fully speak the language. You're present for the event, but you leave without the details.
The file sitting on my phone
I had a 47-minute recording. The audio was clear enough: a bit distant, but the voices came through. The problem was that I had no quick way to find the parts I'd missed. My only option was to listen to the whole thing again, still guessing at the fast parts and still losing the names.
What I needed wasn't a translator sitting next to me. I'd needed that two hours ago. What I needed now was a document: what was said, by whom, in a language I actually read fluently.
Uploading it
I uploaded the file to Voiz. There was no account setup and no language settings to configure: I dropped in the file and entered my email address. Three minutes later, I had a link to the result.
The transcript came back in Spanish (the original language) with an English translation alongside, and something I hadn't expected to value as much as I did: speaker labels. Each time a new person started talking, it was marked — Speaker 1, Speaker 2, Speaker 3. Suddenly I could see exactly when the principal had spoken versus the class teacher versus a student reading a poem.
What the output looked like
Speaker 1 [00:04:12]
Original: "Este año ha sido especial para todos nosotros..."
Translation: "This year has been special for all of us..."
Speaker 2 [00:06:55]
Original: "Los programas de verano comenzarán el 15 de julio..."
Translation: "Summer programs will start on July 15th..."
Not perfect — names occasionally got garbled, and quality dipped during applause — but the substance was all there. I spent twenty minutes reading instead of ninety guessing.
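If you'd rather search the transcript than scroll through it, a short script can help. The sketch below assumes the result has been saved as plain text laid out exactly like the sample above (a speaker header with a timestamp, then an `Original:` line and a `Translation:` line). That layout as a file format, and the names `parse_transcript` and `find_mentions`, are my own assumptions for illustration, not a documented Voiz export or API.

```python
import re
from dataclasses import dataclass
from typing import List

# Header shape assumed from the sample above, e.g. "Speaker 2 [00:06:55]".
HEADER = re.compile(r"^(Speaker \d+) \[(\d{2}):(\d{2}):(\d{2})\]$")


@dataclass
class Segment:
    speaker: str
    start_seconds: int
    original: str = ""
    translation: str = ""


def parse_transcript(text: str) -> List[Segment]:
    """Split transcript text into one Segment per speaker header."""
    segments: List[Segment] = []
    for raw in text.splitlines():
        line = raw.strip()
        match = HEADER.match(line)
        if match:
            hours, minutes, seconds = (int(part) for part in match.group(2, 3, 4))
            segments.append(
                Segment(match.group(1), hours * 3600 + minutes * 60 + seconds)
            )
        elif segments and line.startswith("Original:"):
            segments[-1].original = line[len("Original:"):].strip().strip('"')
        elif segments and line.startswith("Translation:"):
            segments[-1].translation = line[len("Translation:"):].strip().strip('"')
    return segments


def find_mentions(segments: List[Segment], keyword: str) -> List[Segment]:
    """Return segments whose original or translated text contains the keyword."""
    needle = keyword.lower()
    return [
        seg
        for seg in segments
        if needle in seg.original.lower() or needle in seg.translation.lower()
    ]


# The two segments shown in the sample above.
SAMPLE = """
Speaker 1 [00:04:12]
Original: "Este año ha sido especial para todos nosotros..."
Translation: "This year has been special for all of us..."
Speaker 2 [00:06:55]
Original: "Los programas de verano comenzarán el 15 de julio..."
Translation: "Summer programs will start on July 15th..."
"""

for seg in find_mentions(parse_transcript(SAMPLE), "summer"):
    print(seg.speaker, seg.start_seconds, seg.translation)
```

Keeping the start time in seconds makes it easy to jump to the same spot in the audio when a name or date looks garbled and you want to hear it for yourself.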
How to do this yourself
- Record the event on your phone. Any voice recorder app works, and any standard format is fine (MP3, M4A, MP4, MOV, WAV, OGG)
- Upload the file to Voiz
- Enter your email — the transcript link goes there
- Open the result and read
The free tier covers up to 30 minutes. Longer recordings need a credit pack.
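Before uploading, you can check which case a recording falls into. The sketch below is only an illustration of the two rules stated in this article, the accepted file formats and the 30-minute free tier. The function name `check_recording`, the example file names, and the choice to treat exactly 30 minutes as still free are my assumptions, and nothing here calls Voiz itself.

```python
from pathlib import Path

# Formats named in this article; the real list may be longer.
SUPPORTED_EXTENSIONS = {".mp3", ".m4a", ".mp4", ".mov", ".wav", ".ogg"}

# Free-tier limit as stated above. Treating exactly 30 minutes as
# covered is an assumption about what "up to 30 minutes" means.
FREE_TIER_MINUTES = 30


def check_recording(filename: str, duration_minutes: float) -> str:
    """Classify a recording as 'unsupported format', 'free tier', or 'credit pack'."""
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        return "unsupported format"
    if duration_minutes <= FREE_TIER_MINUTES:
        return "free tier"
    return "credit pack"


# Hypothetical file names, for illustration only.
print(check_recording("graduation.m4a", 47))
print(check_recording("voice_memo.WAV", 12))
```

Most phone recorder apps show the length of a recording next to the file name, which is all the second argument needs.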
FAQ
- My recording was made from across a room. Will it still work?
  - Generally yes. Voiz uses Gemini AI, which handles moderate background noise and distance reasonably well. A typical room recording from a few meters away usually transcribes cleanly. Very loud background music or a heavily muffled microphone will reduce accuracy.
- Can I upload a recording from my phone?
  - Yes. Any standard audio or video format works: MP3, M4A, MP4, MOV, WAV, OGG. Upload directly from your phone's camera roll or voice memos app.
- What if people in the recording switch between two languages?
  - Voiz handles code-switching, where speakers mix languages within the same conversation. Each segment is transcribed in the original language and translated, so nothing is lost regardless of which language it was said in.