Wispr Flow: The Future of Voice-Activated AI Transcription

Victor Tan
10 min read · Oct 3, 2024


Looking back over the past couple of decades, I can remember a handful of seminal inventions that shaped our existence so profoundly that, whether we realized it or not, our lives changed.

Of these inventions, the most immediate that I can point to is Google, the search engine that made it so we could see the entire world. Beyond that, I’d say Facebook, the social media app that connected the world in a strange technological network.

The next one of these and probably freshest in people’s memories is ChatGPT, the tool that showed us the power and usefulness of generative AI, highlighting for us both the revolution of this new technology and also heightening our fears that one day robots would take over all of us.

Well, I firmly believe that the next one is here, and its name is Wispr Flow.

Download it here!

(Unfortunately, it’s available only on Apple Silicon Macs at the moment. Sorry if you’re out there using Windows, guys.)

But what exactly is Wispr, and why are you asking me to download this?

Well, I’m glad you asked.

What is Wispr?

Wispr is AI transcription software, but it is not just any transcription software.

It’s transcription software that activates at the touch of a button.

You can use it in any text field: press a button, speak into your microphone, and it transcribes what you say. It even intelligently paragraphs your speech and minimizes redundancy by fixing mistakes for you on the fly, based on your writing style, yielding transcriptions like this.

Seen: Flow in action.

What does it cost?

The software itself is free to use for up to 2,000 words in the course of a single week if you choose to use the Flow Basic plan.

On the other hand, if you use Flow Pro, which most of you probably will, that’s going to cost $12 a month, and it gets you unlimited words and access to a couple of cool features, such as:

1. Command Mode: This mode allows you to use ChatGPT in any text field to edit and format text. It also enables you to utilize the AI's capabilities to generate output and edit text with ease.

2. Perplexity integration: This feature is an additional component of the AI's capabilities, which can be utilized in conjunction with Command Mode. It can be used to further enhance the output and editing capabilities of the AI.

I will cover these in later posts. Let’s get into the meat of things and talk about the killer feature here: seamless voice transcription!

How does it work?

Activating Flow takes literally just the touch of a button: the entire transcription process begins, runs, and concludes within a few seconds.

In this small example, I’m just using the Option key as a customized hotkey.

Whenever I want to activate Flow, I press the Option key twice and begin speaking to my computer, and then this guy pops out.

Immediately, within just a couple of seconds, an entirely formatted paragraph comes out.

Pretty cool, isn’t it?
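For the curious, here is a rough idea of what "press the key twice" could mean to a program. This is only a minimal sketch of double-tap detection under my own assumptions, not how Wispr Flow actually does it: the `DoubleTapDetector` class, its `press` method, and the 0.4 second threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DoubleTapDetector:
    """Reports a double tap when two presses of one key land close together."""

    # Hypothetical threshold for illustration; not a Wispr Flow setting.
    max_gap_seconds: float = 0.4
    _last_press: Optional[float] = None

    def press(self, timestamp: float) -> bool:
        """Record a key press; return True if it completes a double tap."""
        previous = self._last_press
        if previous is not None and 0 <= timestamp - previous <= self.max_gap_seconds:
            # Second press arrived in time: fire, then reset so that a third
            # quick press starts a new pair instead of firing again.
            self._last_press = None
            return True
        # First press of a possible pair (or the earlier one was too old).
        self._last_press = timestamp
        return False


detector = DoubleTapDetector()
events = [0.00, 0.25, 2.00, 3.00, 3.10]  # press times in seconds
fired = [detector.press(t) for t in events]
print(fired)  # [False, True, False, False, True]
```

The presses at 0.00 and 0.25 form one double tap, the lone press at 2.00 does nothing, and 3.00 and 3.10 form another. A real app would feed this from a system-wide keyboard listener, which is left out here.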

But doesn’t ChatGPT allow you to do the same thing?

Some of you might very well say, doesn’t ChatGPT allow you to do the exact same thing anyway?

Moreover, isn’t it true that you also don’t have to pay?

You’re absolutely right. You can definitely do Whisper transcription inside ChatGPT just by tapping the little button to the right of your text box in the ChatGPT app.

The button is highlighted in yellow above.

It’s very quick, very efficient, and it allows you to capitalize on the massive speed advantages that voice typing allows you to obtain.

However, it is still slower than Wispr Flow.

Let’s do a bit of a comparison then.

Wispr Flow vs ChatGPT for Transcriptions

Let’s consider this in terms of what actually needs to happen in order for you to use each of these different transcription methods. And let’s begin with ChatGPT.

ChatGPT voice transcription process

First, consider that you need to actually open ChatGPT in order to begin the transcription process.

This means that you need to switch tabs away from what you were actually doing, open ChatGPT, and only then begin. You move away from your context, may forget what you were going to say and have to start afresh, and may even have to rearrange windows so that you can see what you’re looking at before you begin transcribing.

Here’s a look at ChatGPT being used to transcribe and create a message for sending.

As you can see, there were at least 8 steps in this process to get to the pasting stage, before even sending, even though there are only 5 images. To summarize…

Here is the ChatGPT process:

  1. I had to open up ChatGPT
  2. Then I had to push the record button
  3. I had to speak
  4. I had to copy the output
  5. I moved to the next app, WhatsApp.
  6. I tapped it.
  7. I tapped the text box where I was supposed to send what I wanted to send.
  8. I pasted it.

Compare this to Wispr Flow, where the process is much simpler:

1. I tapped the text box.

2. I tapped a key on my keyboard.

3. I spoke.

4. I saw that the output was complete and also formatted.

What does this mean?

As you can see, ChatGPT takes a minimum of 8 steps to get the same message out, while Wispr Flow takes only 4. Flow even formats the output, so lists, paragraphs, and other kinds of formatting are taken care of, which ChatGPT will not do.

Let’s also remember that for Wispr Flow, all transcription takes place within the same context window, so you can refer to everything that was said before as you decide on what you’re going to say.

For ChatGPT, you have to open up multiple windows or devices and then try to decide what you’re going to say from memory. You may very well lose the plot unless you keep switching back to whatever you are referencing before returning to ChatGPT. With Wispr Flow, you can just read what you’ve already written to decide what to say next.

Put simply, ChatGPT is great at cutting down on typing time, but it is still much less efficient than Wispr Flow.

To summarize:

Wispr Flow carries forward the culture of efficient and highly accurate voice transcription established by OpenAI’s Whisper model, and it improves upon it in every way.

It is fast, it works with the touch of a button, and it is the most seamless transcription experience I have ever had in my entire life. While ChatGPT may save a good chunk of time, Wispr Flow can save you hours upon hours more of editing and publishing work, while at the same time cutting down on the transaction costs that you incur whenever you switch between windows.

It cuts away every single bit of the fat and the adjustment involved in turning your voice into text, and that’s not even considering the way that it is constantly and accurately spacing your words, minimizing edits, and even learning the way that you write, so that you can create transcripts, articles, and publications more efficiently along the way.

Installing Wispr Flow is simple: just go over to this link, click download on the upper right corner, and you’re on your way to ensuring that you can access one of the most incredible pieces of software in the entire world.

To conclude:

I’ll start off with an admission.

Every single part of this piece was written with Wispr Flow: I just sat down and spoke everything into this document, with nothing more than a bit of conversation with the device to translate my thoughts straight onto the page.

It is deeply incredible and something that I had never imagined being able to do, and I truly believe that Wispr Flow is the service that will create the next vanguard for new interactivity with the computer.

I don’t think that Wispr Flow is going to entirely kill the keyboard for a variety of reasons.

For example, in a modern office environment (though other kinds of office environments may evolve in the future where this is not a concern), people can’t very well be talking to themselves constantly, because that’s going to end up disrupting others. The keyboard also has its own strengths: it adapts to different typing styles, offers advanced macro capabilities, and works with various keyboard layouts. Things like gaming and other applications may still call on the keyboard as well, and human society may end up adopting brand new technologies that are more usable with keyboards than with voice interaction alone.

Still though…

For me, Wispr Flow is quite possibly the single most efficiency-raising software development that I have ever encountered.

I consider it a godsend for me and for anyone else out there who has carpal tunnel syndrome and faces difficulty as a result of hours spent moving fingers one after another. It opens up a route to navigating computers, and the human-computer interactions that we take part in, in a more healthful, ergonomic, and ultimately beneficial way that helps to reduce the strain that we place upon the human body.

A side benefit?

Wispr Flow and other forms of automatic speech recognition (ASR) have allowed me to constantly practice articulating my thoughts and ideas in the spoken word. I consider that valuable for those of us who need to make ourselves heard by speaking, interacting, and articulating ourselves to audiences: not only readers, but also those who will eventually experience our presence in conversations, or in auditoriums during the odd moments when we have to give speeches. I am sure that it will be the same for many people out there as well.

I’d like to reiterate what I said at the start of this piece about seminal inventions.

These were inventions that were incredible and that fundamentally shaped the way that we existed, interacted, or connected with one another.

After having written this piece, I am all the more convinced. The ease with which my ideas came out and the speed at which it all took place were striking. What would otherwise be an arduous process, predicated upon my manually looking for errors along the way, has been transformed into one that allows me to use my mind rather than attend to the mechanical accuracy of every single thing.

In short, the entire paradigm of interaction and creation has, for me, been transformed.

As I move forward to posting this, I will observe that it was the easiest blog post that I have ever written in my entire life, requiring no more than 10 mouse clicks to copy, paste, and upload it.

I truly cannot recall a more transformative technology throughout the course of my life and give my profound congratulations to the Wispr Flow team, including Tanay and also Sahaj, for having brought this incredible piece of work together. You have personally transformed my life, and I am confident that millions of people await this journey of transformation as well.

My heartiest congratulations to all of you for this world-shaking achievement, and I look forward to seeing your names further up in lights in the future.

Victor.


Victor Tan

Enjoys talking to smart people x Perpetually passionate about education in an age of AI and modern technology; Malaysia native, occasionally found elsewhere.