Building a Gemini AI Voice Assistant with ESP32 and OLED Display


Overview

Ever dreamed of having your own personal AI assistant, like Jarvis from Iron Man, sitting on your desk? What if I told you that you could build one yourself using a handful of affordable components? It’s not science fiction; it’s a fantastic weekend project that combines the power of Google’s Gemini AI with the versatility of the ESP32 microcontroller.

In this guide, we’ll walk you through every single step of creating a voice-activated assistant that listens to your questions, sends them to the cloud for processing, and displays the intelligent response right on a tiny OLED screen. Forget just blinking LEDs; this is your gateway to creating truly interactive and smart IoT devices. Let’s dive in!


Required Components

First things first, let’s go shopping! Gathering your components is the most exciting part of any new project. Here’s a detailed breakdown of everything you’ll need to bring your very own AI assistant to life.

An ESP32 Development Board

This is the undisputed brain of our entire operation. The ESP32 isn’t just any microcontroller; it’s a powerhouse that’s also incredibly kind to your wallet. Its most important feature for this project is the built-in Wi-Fi, which is absolutely essential for connecting to the internet and talking to the Google Gemini API.

  • Why this one? It has the processing punch to handle audio and the wireless connection to get answers from the cloud.
  • What to look for: Any popular variant will do the trick perfectly. Look for names like ESP32 DevKitC, ESP32 WROOM-32, or any board from a reputable brand. They’re all fundamentally the same for our purposes.

An I2S Microphone (like the INMP441)

INMP441 I2S Omnidirectional Microphone

For our assistant to be smart, it first needs to be a good listener. That’s where an I2S microphone module comes in. You might be wondering, “Why not just use a simple analog microphone?” Great question! An I2S mic gives us a high-quality digital audio signal right off the bat.

  • Why this one? Unlike analog mics that can pick up electrical noise and need extra components, an I2S mic provides a clean, digital signal that’s much easier for the ESP32 to process and send to the AI. The INMP441 is a fantastic, reliable, and popular choice for this kind of project.
  • Good to know: This little module is what will capture the sound of your voice with clarity.
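
To make the "digital signal" point concrete, here is a minimal sketch of how a 32-bit I2S word can be reduced to the 16-bit PCM that a WAV file expects. It assumes the sample is left-justified (MSB-aligned) in each 32-bit word, which is how the INMP441 is generally described as delivering its 24-bit data; the function name i2s32_to_pcm16 is a hypothetical helper, not part of any library.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: convert 32-bit I2S words to 16-bit PCM samples by
// keeping the top 16 bits of each word.
// Assumption: the audio data is MSB-aligned inside the 32-bit word.
// Returns the number of samples written to `out`.
size_t i2s32_to_pcm16(const int32_t* in, size_t count, int16_t* out) {
  for (size_t i = 0; i < count; ++i) {
    // An arithmetic right shift keeps the sign, so negative samples
    // stay negative after the conversion.
    out[i] = static_cast<int16_t>(in[i] >> 16);
  }
  return count;
}
```

Dropping the low bits loses some resolution, but 16-bit mono at 16 kHz is usually plenty for speech and halves the amount of data that has to be uploaded.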

A 128×64 I2C OLED Display

How will our assistant talk back to you? While we won’t be adding a speaker in this initial build, it will “speak” by displaying text on a small OLED screen. This is where you’ll see Gemini’s clever answers to your questions.

  • Why this one? A 128×64 I2C OLED is the perfect size. The I2C part is a huge bonus for us, as it’s a communication protocol that only requires two data wires (SDA and SCL). This means our wiring stays clean, simple, and headache-free.
  • Good to know: These screens are bright, crisp, and come in both white and blue text variants. Either one will work great!
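
It helps to know how much text such a screen can actually hold. The sketch below estimates the character grid, assuming the default Adafruit GFX font, whose character cell is 6×8 pixels at text size 1; the TextGrid struct and the oled_text_grid function are hypothetical names used only for this illustration.

```cpp
// Number of whole characters that fit across and down the screen.
struct TextGrid {
  int columns;
  int rows;
};

// Hypothetical helper: estimate the text grid for a given display size.
// Assumption: default Adafruit GFX font, 6x8 pixel cell at text size 1,
// scaled linearly by text_size.
TextGrid oled_text_grid(int width_px, int height_px, int text_size) {
  const int cell_w = 6 * text_size;
  const int cell_h = 8 * text_size;
  return TextGrid{width_px / cell_w, height_px / cell_h};
}
```

For a 128×64 display at text size 1 this gives 21 columns by 8 rows, so longer answers from Gemini will be cut off unless you add scrolling or paging.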

The Trigger: A Simple Push Button

How do you tell the assistant you’re about to ask a question? With a simple tactile push button! This will act as our “push-to-talk” trigger. You’ll hold it down to record your question and release it to get the answer.

  • Why this one? It’s the most basic and reliable form of input. No complex programming needed—just a simple press.
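  • Good to know: The sketch treats the button as active-low, so the input pin needs a pull-up. GPIO 34, the pin used in this build, is input-only and has no internal pull-up resistor, so add a 10 kΩ resistor between GPIO 34 and 3.3V (or move the button to a pin that supports the ESP32’s internal pull-up).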

Breadboard and Jumper Wires

No project can stand without a framework. A breadboard and a pack of jumper wires are the essential skeleton for our circuit. They allow us to connect all our components together to test and prototype everything without needing to pick up a soldering iron.

Micro-USB Cable: To power and program the ESP32.


Circuit Diagram

Now for the fun part: connecting all the required components. A clear wiring diagram is your best friend here, so check the detailed circuit diagram below and follow these connections carefully.

Gemini AI Voice Assistant with ESP32 and OLED Display circuit diagram

Component        Component Pin   ESP32 Pin   Description
OLED Display     VCC             3.3V        Power for the display
OLED Display     GND             GND         Common Ground
OLED Display     SCL             GPIO 22     I2C Clock Line
OLED Display     SDA             GPIO 21     I2C Data Line
I2S Microphone   VCC             3.3V        Power for the mic
I2S Microphone   GND             GND         Common Ground
I2S Microphone   SCK             GPIO 33     I2S Serial Clock (BCLK)
I2S Microphone   WS              GPIO 25     I2S Word Select (LRCL)
I2S Microphone   SD              GPIO 32     I2S Serial Data
Push Button      One Leg         GPIO 34     Button Input Pin
Push Button      Other Leg       GND         Connect to ground to trigger on press

Double-check your connections before plugging in the USB cable! A single misplaced wire can cause frustrating issues.


Code and Program Upload to the ESP32

After connecting all the components, it’s time to upload the code. Copy the code below, fill in your Wi-Fi credentials and API key, install the required libraries, and upload the sketch by following the steps below.

Install Libraries: You need to install a few libraries through the Arduino IDE’s Library Manager (Tools > Manage Libraries...).

  • ArduinoJson (by Benoit Blanchon)
  • Adafruit GFX Library
  • Adafruit SSD1306
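  • base64 (by Densaugeo) – This is listed for encoding the audio data. Note that the sketch calls base64::encode(), which appears to match the base64.h helper bundled with the ESP32 board package, so the sketch may compile even without this library.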
  • The WiFi, HTTPClient, and I2S libraries are included with the ESP32 board package (installed through Tools > Board > Boards Manager), so they need no separate installation.

Getting Your Google Gemini API Key: This is the most crucial step. The API key is like a secret password that authenticates your project with Google’s servers.

  1. Navigate to the Google AI Studio.
  2. Sign in with your Google account.
  3. Click on “Create API Key”.
  4. You might be asked to select a Google Cloud project. Just choose the default one.
  5. Your API key will be generated! Copy it and keep it safe. Treat it like a password; don’t share it publicly or commit it to a public GitHub repository.
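
One way to follow the advice in step 5 is to keep credentials in a separate header that never gets committed. The following is only a sketch of that idea: the file name secrets.h, the constant names, and the api_key_is_configured helper are all hypothetical and are not used by the sketch in this guide as written.

```cpp
// secrets.h (hypothetical file name): keep this file out of version control,
// for example by listing it in .gitignore.
#ifndef SECRETS_H
#define SECRETS_H

#include <cstring>

// Placeholder values; replace them with your own before uploading.
static const char* const WIFI_SSID      = "YOUR_WIFI_SSID";
static const char* const WIFI_PASSWORD  = "YOUR_WIFI_PASSWORD";
static const char* const GEMINI_API_KEY = "YOUR_GEMINI_API_KEY";

// Returns true once the key is non-empty and no longer the placeholder.
// Useful for printing a clear warning at boot instead of a confusing API error.
inline bool api_key_is_configured(const char* key) {
  return key != nullptr && key[0] != '\0' &&
         std::strcmp(key, "YOUR_GEMINI_API_KEY") != 0;
}

#endif  // SECRETS_H
```

With a header like this, the main sketch would include it with #include "secrets.h" and could refuse to start recording while api_key_is_configured(GEMINI_API_KEY) is false.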

Upload the code

  1. Connect your ESP32 to your computer via the USB cable.
  2. In the Arduino IDE, go to Tools > Board and ensure your ESP32 model is selected.
  3. Go to Tools > Port and select the COM port your ESP32 is connected to.
  4. Click the Upload button (the right-pointing arrow).
  5. Once the code is uploaded, open the Serial Monitor (Tools > Serial Monitor) and set the baud rate to 115200. You should see the Wi-Fi connection status and a “Ready” message.
// ===========================================================
// == Gemini AI Voice Assistant with ESP32 and OLED Display ==
// == Code produced by Circuitschools.com Visit for more    ==
// ===========================================================
#include <WiFi.h>
#include <HTTPClient.h>
#include <ArduinoJson.h>
#include <driver/i2s.h>
#include <Adafruit_GFX.h>
#include <Adafruit_SSD1306.h>
#include <base64.h> // Install "base64" library by Densaugeo

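// ==============================
// == WIFI & API CONFIGURATION ==
// ==============================
// Your Wi-Fi SSID (network name)
const char* ssid = "YOUR_WIFI_SSID";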

// Your Wi-Fi Password
const char* password = "YOUR_WIFI_PASSWORD";

// Your Google Gemini API Key
const char* apiKey = "YOUR_GEMINI_API_KEY";

// ==============================
// == PIN CONFIGURATION ==
// ==============================
const int BUTTON_PIN = 34;

// I2S pins for INMP441 microphone
const int I2S_WS_PIN = 25;
const int I2S_SCK_PIN = 33;
const int I2S_SD_PIN = 32;

// ==============================
// == OLED DISPLAY CONFIGURATION ==
// ==============================
#define SCREEN_WIDTH 128
#define SCREEN_HEIGHT 64
#define OLED_RESET -1 // Reset pin (-1 if sharing Arduino reset pin)
Adafruit_SSD1306 display(SCREEN_WIDTH, SCREEN_HEIGHT, &Wire, OLED_RESET);

// ==============================
// == I2S MICROPHONE CONFIGURATION ==
// ==============================
i2s_config_t i2s_config = {
  .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
  .sample_rate = 16000, // 16kHz sample rate
  .bits_per_sample = I2S_BITS_PER_SAMPLE_32BIT, // INMP441 provides 32-bit samples
  .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT, // Use only the left channel
  .communication_format = I2S_COMM_FORMAT_STAND_I2S,
  .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
  .dma_buf_count = 8,
  .dma_buf_len = 64,
  .use_apll = false,
  .tx_desc_auto_clear = false,
  .fixed_mclk = 0
};

i2s_pin_config_t pin_config = {
  .bck_io_num = I2S_SCK_PIN,
  .ws_io_num = I2S_WS_PIN,
  .data_out_num = I2S_PIN_NO_CHANGE, // Not used
  .data_in_num = I2S_SD_PIN
};

// ==============================
// == SETUP FUNCTION ==
// ==============================
void setup() {
  Serial.begin(115200);
  while (!Serial); // Wait for serial monitor to open

  // Initialize OLED Display
  if (!display.begin(SSD1306_SWITCHCAPVCC, 0x3C)) {
    Serial.println(F("SSD1306 allocation failed"));
    for (;;); // Don't proceed, loop forever
  }
  display.clearDisplay();
  display.setTextSize(1);
  display.setTextColor(SSD1306_WHITE);
  display.setCursor(0, 0);
  display.println("Gemini Assistant");
  display.println("Connecting WiFi...");
  display.display();

  // Initialize I2S
  i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL);
  i2s_set_pin(I2S_NUM_0, &pin_config);
  i2s_set_clk(I2S_NUM_0, i2s_config.sample_rate, i2s_config.bits_per_sample, I2S_CHANNEL_MONO);

  // Connect to Wi-Fi
  Serial.print("Connecting to WiFi...");
  WiFi.begin(ssid, password);
  while (WiFi.status() != WL_CONNECTED) {
    delay(500);
    Serial.print(".");
  }
  Serial.println("\nWiFi connected!");
  Serial.print("IP Address: ");
  Serial.println(WiFi.localIP());

  display.clearDisplay();
  display.setCursor(0, 0);
  display.println("WiFi Connected!");
  display.println("Press button to talk.");
  display.display();

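  // Set button pin mode. GPIO 34 is input-only and has no internal pull-up,
  // so INPUT_PULLUP has no effect on it: fit an external 10k resistor between
  // GPIO 34 and 3.3V (or move the button to a pin with internal pull-ups).
  pinMode(BUTTON_PIN, INPUT);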
}

// ==============================
// == MAIN LOOP FUNCTION ==
// ==============================
void loop() {
  // Check if the button is pressed (LOW because of INPUT_PULLUP)
  if (digitalRead(BUTTON_PIN) == LOW) {
    Serial.println("Button pressed, recording...");
    display.clearDisplay();
    display.setCursor(0, 0);
    display.println("Listening...");
    display.display();

    // Record audio and get it as a Base64 encoded string
    String audioBase64 = recordAudio();

    if (audioBase64.length() > 0) {
      Serial.println("Recording finished. Sending to Gemini...");
      display.clearDisplay();
      display.setCursor(0, 0);
      display.println("Thinking...");
      display.display();

      // Send the audio to Gemini and get the response
      String response = sendToGemini(audioBase64);
      
      // Display the response on the OLED
      displayOnOLED(response);
    } else {
      displayOnOLED("Recording failed.");
    }
    
    // Simple debounce delay
    delay(1000); 
  }
}

// ==============================
// == HELPER FUNCTIONS ==
// ==============================

/**
 * @brief Records audio from the I2S microphone until the button is released.
 * @return A Base64 encoded string of the recorded audio (with a simple WAV header).
 */
String recordAudio() {
  const int record_time = 5; // Max record time in seconds
  const int sample_rate = 16000;
  const int header_size = 44;
  const size_t max_samples = (size_t)sample_rate * record_time;

  // A single buffer holds the WAV header followed by the 16-bit PCM data.
  // Each chunk is converted as it arrives, so the 32-bit raw audio never has
  // to be stored in full. Five seconds still needs about 160 KB here plus
  // roughly 213 KB for the Base64 string, which likely exceeds the heap of an
  // ESP32 without PSRAM; lower record_time if the allocation fails.
  uint8_t* wav_buffer = (uint8_t*)malloc(header_size + max_samples * sizeof(int16_t));
  if (!wav_buffer) {
    Serial.println("Failed to allocate memory for audio buffer");
    return "";
  }
  int16_t* pcm_audio = (int16_t*)(wav_buffer + header_size);

  const size_t chunk_samples = 256;
  int32_t chunk[chunk_samples]; // Scratch space for one I2S read (1 KB)
  size_t samples_recorded = 0;

  // Record audio while button is pressed
  while (digitalRead(BUTTON_PIN) == LOW && samples_recorded < max_samples) {
    size_t wanted = max_samples - samples_recorded;
    if (wanted > chunk_samples) wanted = chunk_samples;

    size_t bytes_read = 0;
    esp_err_t result = i2s_read(I2S_NUM_0, chunk, wanted * sizeof(int32_t), &bytes_read, portMAX_DELAY);
    if (result != ESP_OK) {
      Serial.printf("I2S read error: %d\n", result);
      break;
    }

    // Convert 32-bit samples to 16-bit by keeping the top 16 bits
    size_t samples_read = bytes_read / sizeof(int32_t);
    for (size_t i = 0; i < samples_read; i++) {
      pcm_audio[samples_recorded + i] = (int16_t)(chunk[i] >> 16);
    }
    samples_recorded += samples_read;
  }

  // Nothing captured (button released immediately or I2S error)
  if (samples_recorded == 0) {
    free(wav_buffer);
    return "";
  }

  // Create a simple WAV header at the start of the buffer
  uint8_t* wav_header = wav_buffer;
  int pcm_data_size = samples_recorded * sizeof(int16_t);
  int file_size = header_size + pcm_data_size - 8;

  memcpy(wav_header, "RIFF", 4);
  memcpy(wav_header + 4, &file_size, 4);
  memcpy(wav_header + 8, "WAVE", 4);
  memcpy(wav_header + 12, "fmt ", 4);
  int fmt_chunk_size = 16;
  memcpy(wav_header + 16, &fmt_chunk_size, 4);
  int16_t audio_format = 1; // PCM
  memcpy(wav_header + 20, &audio_format, 2);
  int16_t num_channels = 1; // Mono
  memcpy(wav_header + 22, &num_channels, 2);
  memcpy(wav_header + 24, &sample_rate, 4);
  int byte_rate = sample_rate * 2; // 16-bit mono
  memcpy(wav_header + 28, &byte_rate, 4);
  int16_t block_align = 2;
  memcpy(wav_header + 32, &block_align, 2);
  int16_t bits_per_sample = 16;
  memcpy(wav_header + 34, &bits_per_sample, 2);
  memcpy(wav_header + 36, "data", 4);
  memcpy(wav_header + 40, &pcm_data_size, 4);

  // Encode header and PCM data to Base64
  int total_wav_size = header_size + pcm_data_size;
  String encoded_audio = base64::encode(wav_buffer, total_wav_size);

  // Clean up memory
  free(wav_buffer);

  Serial.printf("Recorded %d bytes, encoded to %u characters\n", total_wav_size, (unsigned)encoded_audio.length());
  return encoded_audio;
}

/**
 * @brief Sends Base64 encoded audio to the Google Gemini API.
 * @param audioBase64 The audio data encoded as a Base64 string.
 * @return The text response from the AI, or an error message.
 */
String sendToGemini(const String& audioBase64) {
  HTTPClient http;
  // Model names change over time; if this one is rejected, check the list of
  // currently available models in Google AI Studio.
  String url = "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-flash-latest:generateContent?key=" + String(apiKey);

  http.begin(url);
  http.addHeader("Content-Type", "application/json");

  // Create JSON payload. ArduinoJson 6 stores a const char* by pointer rather
  // than copying it, so passing c_str() keeps the large audio string out of
  // the document's memory pool and a small document is enough.
  DynamicJsonDocument doc(1024);
  JsonObject content = doc["contents"].createNestedObject();
  JsonObject part = content["parts"].createNestedObject();
  part["inline_data"]["mime_type"] = "audio/wav";
  part["inline_data"]["data"] = audioBase64.c_str();

  String requestBody;
  serializeJson(doc, requestBody);

  Serial.println("Sending request to Gemini...");
  int httpResponseCode = http.POST(requestBody);

  String result;
  if (httpResponseCode > 0) {
    String response = http.getString();
    Serial.print("HTTP Response code: ");
    Serial.println(httpResponseCode);
    // Serial.println(response); // Uncomment for full JSON response

    // Parse the JSON response (increase the size if long answers fail to parse)
    DynamicJsonDocument responseDoc(2048);
    DeserializationError error = deserializeJson(responseDoc, response);
    if (error) {
      Serial.print("deserializeJson() failed: ");
      Serial.println(error.c_str());
      result = "Failed to parse AI response.";
    } else {
      // Extract the text; as<const char*>() yields nullptr when it is missing
      const char* aiText = responseDoc["candidates"][0]["content"]["parts"][0]["text"].as<const char*>();
      result = aiText ? String(aiText) : String("AI did not return a text response.");
    }
  } else {
    Serial.print("Error on sending POST: ");
    Serial.println(httpResponseCode);
    result = "API Error: " + String(httpResponseCode);
  }

  // Always release the connection, whichever branch was taken
  http.end();
  return result;
}

/**
 * @brief Displays text on the OLED screen with simple word wrapping.
 * @param text The string to display.
 */
void displayOnOLED(String text) {
  display.clearDisplay();
  display.setCursor(0, 0);
  display.setTextSize(1);

  int16_t x = 0, y = 0;
  uint16_t w, h;
  int16_t line_height = 10; // Approximate height of a line

  String remainingText = text;
  while (remainingText.length() > 0) {
    int spaceIndex = remainingText.indexOf(' ');
    String word;
    if (spaceIndex == -1) {
      word = remainingText;
      remainingText = "";
    } else {
      word = remainingText.substring(0, spaceIndex + 1);
      remainingText = remainingText.substring(spaceIndex + 1);
    }

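    // Use separate output variables so the cursor position (x, y) is not overwritten
    int16_t x1, y1;
    display.getTextBounds(word, x, y, &x1, &y1, &w, &h);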
    if (x + w > SCREEN_WIDTH) {
      x = 0;
      y += line_height;
      if (y > SCREEN_HEIGHT - line_height) {
        break; // Stop if we run out of screen space
      }
    }
    display.setCursor(x, y);
    display.print(word);
    x += w;
  }
  display.display();
}

Code Explanation

1. The setup() Function: The Pre-Flight Check

This function runs only once when you first power on the ESP32. It’s like a pre-flight check to get everything ready for action:

  • Starts the “Screen”: It wakes up the OLED display and shows a “Connecting…” message.
  • Starts the “Ears”: It configures and activates the I2S microphone, telling it to listen for sound.
  • Connects to the Internet: It uses your Wi-Fi credentials from the code to connect to your home network.
  • Gets Ready for the Button: It sets up the push button pin to listen for a press.

Once the setup() is done, your screen will show “WiFi Connected!” and “Press button to talk,” letting you know it’s ready for its main job.

2. The loop() Function: The Waiting Game

After the setup is complete, the code enters the loop() function, which runs over and over again, many times a second. Its job is very simple: it just waits for you to press the button. After each question has been handled, a 1-second delay gives you time to let go of the button, so a single press doesn’t trigger a second recording.

It constantly checks the state of the button pin. As long as the button isn’t pressed, it does nothing. The moment you press the button, the loop triggers the main sequence of events.
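
The fixed delay works, but a small debouncer is a common alternative for this kind of polling loop. The class below is a minimal sketch, not the method used in this guide: DebouncedButton and its 30 ms default are assumptions chosen for illustration, and the button is assumed to be active-low (it reads LOW while pressed).

```cpp
#include <cstdint>

// Hypothetical debouncer for an active-low button (pressed == LOW).
class DebouncedButton {
 public:
  explicit DebouncedButton(uint32_t stable_ms = 30) : stable_ms_(stable_ms) {}

  // Feed the raw pin level (true = HIGH) and the current time in milliseconds.
  // Returns true exactly once per press, after the pressed level has stayed
  // unchanged for at least stable_ms.
  bool update(bool level_high, uint32_t now_ms) {
    const bool raw_pressed = !level_high;
    if (raw_pressed != last_raw_) {
      // The level changed (possibly contact bounce): restart the timer.
      last_raw_ = raw_pressed;
      last_change_ms_ = now_ms;
    }
    bool fired = false;
    if ((now_ms - last_change_ms_) >= stable_ms_ && raw_pressed != stable_pressed_) {
      stable_pressed_ = raw_pressed;
      fired = stable_pressed_;  // Report only the press, not the release.
    }
    return fired;
  }

  // Debounced state: true while the button is considered held down.
  bool is_pressed() const { return stable_pressed_; }

 private:
  uint32_t stable_ms_;
  uint32_t last_change_ms_ = 0;
  bool last_raw_ = false;
  bool stable_pressed_ = false;
};
```

In an Arduino loop() this could be driven with button.update(digitalRead(BUTTON_PIN) == HIGH, millis()), starting a recording only when it returns true.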

3. The Core Actions: Listen, Think, and Speak

When you press the button, the loop() calls three helper functions in a specific order, which brings your assistant to life:

  1. recordAudio(): This function starts recording from the I2S microphone. It keeps capturing audio until you release the button or the 5-second limit (record_time) is reached. It then packages this audio into a format (a WAV file encoded as a Base64 string) that the Gemini API can understand.
  2. sendToGemini(): This function takes your recorded audio, wraps it in a digital package (a JSON object), and sends it over the internet to Google’s Gemini servers. It then waits patiently for a response. When Gemini sends back the text answer, this function unwraps it and hands it off.
  3. displayOnOLED(): This is the final step. This function takes the text answer from Gemini and prints it neatly on your OLED screen. It even handles long sentences by wrapping the text so it fits.
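
The WAV packaging in step 1 comes down to a fixed 44-byte header placed in front of the samples. Here is a minimal standalone sketch of that header for 16-bit mono PCM; the helper names (put_le16, put_le32, write_wav_header) are hypothetical, and unlike the sketch above it writes each field byte by byte so it does not depend on the CPU being little-endian.

```cpp
#include <cstdint>
#include <cstring>

// Store a 16-bit value in little-endian byte order.
static void put_le16(uint8_t* p, uint16_t v) {
  p[0] = static_cast<uint8_t>(v & 0xFF);
  p[1] = static_cast<uint8_t>((v >> 8) & 0xFF);
}

// Store a 32-bit value in little-endian byte order.
static void put_le32(uint8_t* p, uint32_t v) {
  for (int i = 0; i < 4; ++i) {
    p[i] = static_cast<uint8_t>((v >> (8 * i)) & 0xFF);
  }
}

// Hypothetical helper: fill a canonical 44-byte header for 16-bit mono PCM.
// pcm_bytes is the size of the audio data that follows the header.
void write_wav_header(uint8_t header[44], uint32_t pcm_bytes, uint32_t sample_rate) {
  const uint16_t channels = 1;
  const uint16_t bits_per_sample = 16;
  const uint16_t block_align = static_cast<uint16_t>(channels * bits_per_sample / 8);

  std::memcpy(header, "RIFF", 4);
  put_le32(header + 4, 36 + pcm_bytes);              // File size minus 8 bytes
  std::memcpy(header + 8, "WAVE", 4);
  std::memcpy(header + 12, "fmt ", 4);
  put_le32(header + 16, 16);                         // Size of the fmt chunk
  put_le16(header + 20, 1);                          // Audio format 1 = PCM
  put_le16(header + 22, channels);
  put_le32(header + 24, sample_rate);
  put_le32(header + 28, sample_rate * block_align);  // Bytes per second
  put_le16(header + 32, block_align);
  put_le16(header + 34, bits_per_sample);
  std::memcpy(header + 36, "data", 4);
  put_le32(header + 40, pcm_bytes);                  // Size of the data chunk
}
```

For one second of audio at 16 kHz, pcm_bytes is 32,000, so the whole file is 32,044 bytes before Base64 encoding.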

After the answer is displayed, the code goes back to the loop() function, where it starts waiting for you to press the button again.
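
To see what the "digital package" from sendToGemini() looks like on the wire, the sketch below builds the same JSON shape by hand with std::string. It mirrors the fields the sketch sets through ArduinoJson (contents, parts, inline_data, mime_type, data); the function name build_audio_request is hypothetical, and the approach assumes the MIME type contains no characters that need JSON escaping.

```cpp
#include <string>

// Hypothetical helper: build the request body for one inline audio clip.
// The Base64 alphabet (A-Z, a-z, 0-9, '+', '/', '=') contains nothing that
// needs escaping inside a JSON string, so plain concatenation is safe here.
std::string build_audio_request(const std::string& audio_base64,
                                const std::string& mime_type = "audio/wav") {
  std::string body;
  body.reserve(audio_base64.size() + 96);  // Avoid repeated reallocations
  body += "{\"contents\":[{\"parts\":[{\"inline_data\":{\"mime_type\":\"";
  body += mime_type;
  body += "\",\"data\":\"";
  body += audio_base64;
  body += "\"}}]}]}";
  return body;
}
```

Printing a body like this to the Serial Monitor (with a short dummy string instead of real audio) is a quick way to check the structure when the API rejects a request.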


Troubleshooting & Next Steps

No project is without its hiccups. Here are some common issues and how to solve them:

  • My ESP32 won’t connect to Wi-Fi: Double-check your ssid and password in the code. Make sure you’re within range of your router.
  • I get an API error (like 400 or 403): A 403 usually means an issue with your API key, so ensure you’ve copied it correctly and that the Gemini API is enabled in your Google Cloud Platform console. A 400 usually means the request itself was rejected, for example because the JSON body was malformed. If you’ve exceeded your free quota, expect a 429 instead.
  • The OLED is blank or shows garbage: Check your I2C wiring (SCL and SDA). The sketch assumes the display is at I2C address 0x3C, but some modules use 0x3D instead. You can run an I2C scanner sketch to find the correct address.
  • The response is garbled or empty: This could be a JSON parsing error or an issue with how the audio is being recorded. Check the Serial Monitor for any error messages from the API.
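
When recordings fail or requests come back empty, it is also worth estimating how much memory the audio needs, since an ESP32 without PSRAM typically has only a few hundred kilobytes of heap. The two helpers below are a rough sketch for that estimate; the function names are hypothetical, and the figures cover only the WAV data and its Base64 text, not the JSON body or the TLS buffers.

```cpp
#include <cstddef>

// Length of the Base64 text for raw_bytes of input, including '=' padding:
// every group of up to 3 bytes becomes 4 characters.
constexpr size_t base64_encoded_length(size_t raw_bytes) {
  return 4 * ((raw_bytes + 2) / 3);
}

// Size of a PCM WAV file: a 44-byte header plus the samples themselves.
constexpr size_t wav_size_bytes(unsigned seconds, unsigned sample_rate,
                                unsigned bytes_per_sample) {
  return 44 + static_cast<size_t>(seconds) * sample_rate * bytes_per_sample;
}
```

For the 5-second, 16 kHz, 16-bit recording used in this guide, wav_size_bytes(5, 16000, 2) is 160,044 bytes and its Base64 form is 213,392 characters, so shortening the maximum recording time is a sensible first experiment if allocations fail.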

Ready to Level Up?

This project is just the beginning. Here are some ideas to take your AI assistant to the next level:

  • Add a Speaker: Implement text-to-speech to have the assistant speak its responses aloud.
  • Wake Word Detection: Instead of a button, use a more advanced library to trigger the assistant with a phrase like “Hey, Gemini!”
  • 3D-Printed Enclosure: Design and print a custom case to give your assistant a professional and unique look.
  • Add More Sensors: Connect a temperature sensor or a light sensor and ask your assistant questions like “What’s the temperature in the room?”

Conclusion

Congratulations! You’ve successfully built a functioning AI voice assistant using ESP32 from scratch. You’ve learned about hardware interfacing, I2S audio communication, working with APIs, and handling JSON data. This project is a perfect stepping stone into the vast and exciting world of IoT and AI-powered devices. So what will you ask your new assistant next? The possibilities are endless.
