Face Detection Using ESP32-CAM and Python in the Thonny Python IDE

Sep 21, 2024 · 6 min read

Face detection has become a fundamental part of many AI applications, from security systems to personal devices. With the ESP32-CAM, a low-cost microcontroller board with an onboard camera, you can build your own face detection system. This guide shows how to perform face detection using the ESP32-CAM and Python in the Thonny Python IDE: the ESP32-CAM serves camera frames over Wi-Fi, and a Python script on your computer runs OpenCV on them. Whether you're a hobbyist or a tech enthusiast, this tutorial will help you build a working project that detects faces in near real time.

Prerequisites:

ESP32-CAM module
FTDI programmer
Arduino IDE (installed)
Thonny Python IDE (installed)
Micro-USB cable
Jumper wires
A local Wi-Fi network

Step 1: Set Up ESP32-CAM with Thonny IDE

1.1 Install Thonny Python IDE

- Download Thonny: Visit [thonny.org](https://thonny.org) and download the IDE for your operating system.

- Install Python (if not already installed): Thonny ships with its own Python interpreter, so a separate installation is optional. If you want one anyway, go to [Python.org](https://python.org).

1.2 Connect ESP32-CAM to Your System

Connect the ESP32-CAM to the FTDI programmer
Connect the U0T (transmit) pin of the ESP32-CAM to the RX pin of the FTDI programmer, and the U0R (receive) pin to the TX pin.
Connect the GND and 5V pins of the ESP32-CAM to the matching GND and 5V (VCC) pins of the FTDI programmer.
Make sure the IO0 pin is connected to GND while flashing the ESP32-CAM; this puts the board into upload mode.

Install the ESP32 board package in Arduino IDE

Open Arduino IDE and go to File > Preferences. In the "Additional Board Manager URLs" field, paste the following link:

https://dl.espressif.com/dl/package_esp32_index.json

Go to Tools > Board > Board Manager and search for ESP32. Install the ESP32 board package.

Select the ESP32-CAM board in Arduino IDE

Go to Tools > Board and choose AI Thinker ESP32-CAM.
Set the upload speed to 115200 and select the port of your FTDI programmer.

Upload the Webserver Example Code for Face Detection

Open File > Examples > ESP32 > Camera > CameraWebServer.
In the code, enter your Wi-Fi SSID and password so the ESP32-CAM can join your network, and make sure the AI-Thinker camera model (`CAMERA_MODEL_AI_THINKER`) is the one selected.
Upload the code by pressing Upload in the Arduino IDE. Once the upload finishes, remove the GND connection from IO0 and reset the module.

Step 2: Get the ESP32-Cam’s IP Address

Open Serial Monitor

Go to Tools > Serial Monitor in Arduino IDE. Set the baud rate to 115200.
Once the ESP32-Cam boots, you should see an IP address displayed in the Serial Monitor. Copy this IP address, as it will be used in the next step.
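
If you want to sanity-check the address you copied before moving on, a small helper like the one below can catch typos early. This is only a sketch: `camera_urls` is a hypothetical helper name, and the endpoint paths assume the stock CameraWebServer example, which is understood to serve single JPEG stills at `/capture` on port 80 and the MJPEG stream at `/stream` on port 81.

```python
import ipaddress


def camera_urls(ip):
    """Build the assumed CameraWebServer endpoints for an IPv4 address.

    Raises ValueError if the text is not a valid IPv4 address.
    """
    addr = ipaddress.IPv4Address(ip.strip())
    return {
        "capture": f"http://{addr}/capture",    # one JPEG still per request
        "stream": f"http://{addr}:81/stream",   # continuous MJPEG stream
    }


# Placeholder address: use the one shown in your Serial Monitor.
urls = camera_urls("192.168.1.104")
print(urls["capture"])
```

Opening the printed capture URL in a browser should show a single still image if the camera is reachable.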

Step 3: Integrate Python for Face Detection

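Install the OpenCV and NumPy libraries in Thonny
- Open Thonny Python IDE and go to Tools > Manage Packages.
- Search for opencv-python and numpy and install both. OpenCV handles the face detection, and NumPy holds the image data as arrays.
Install requests library
- In the same way, search for and install the requests library if you want to use it to interact with the ESP32-CAM's web server. The script in this guide fetches frames with `urllib.request` from the Python standard library, so requests is optional here.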

3. Write Python Script for Face Detection

Create a New Python Script
- In Thonny, create a new file and name it something like face_detection.py.
Write the Code
- Use the following code to capture the video stream from the ESP32-Cam and detect faces.

Run the Python Script

- Make sure the ESP32-CAM webserver is running. In the script, replace the IP address in `'http://192.168.1.104/capture'` with the actual IP address of your ESP32-CAM.

- Run the Python script in Thonny IDE. A window will pop up displaying the video feed from the ESP32-CAM with detected faces highlighted.

Step 4: Code Explanation

Let's break down this code for face and eye detection using the ESP32-CAM stream into simple sections for a beginner:

1. Importing Required Libraries

import cv2
import urllib.request
import numpy as np

- cv2: This is the OpenCV library, used for image and video processing.

- urllib.request: This is used to fetch data from URLs (in this case, we’ll fetch images from the ESP32-CAM).

- numpy (`np`): This is used for handling arrays and matrices. We need it to convert the images we get from the URL into a format OpenCV can process.

2. Loading Pre-Trained Models (Haar Cascades) for Face and Eye Detection

f_cas = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_eye.xml')

- Cascade Classifier: OpenCV uses pre-trained models (Haar cascades) to detect objects like faces and eyes.

- `haarcascade_frontalface_default.xml` is used for detecting faces.

- `haarcascade_eye.xml` is used for detecting eyes.

The `CascadeClassifier` function loads these XML files, which contain the trained models.

3. Defining the ESP32-CAM URL

url = 'http://192.168.1.104/capture'

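- This defines the URL from which the script fetches frames. In the stock CameraWebServer example, the `/capture` endpoint returns a single JPEG image per request, so the main loop requests it repeatedly. Replace the IP address in `'http://192.168.1.104/capture'` with the actual IP address of your ESP32-CAM, and make sure the ESP32-CAM is connected to the same network as your computer.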

4. Creating a Display Window

cv2.namedWindow("Live Transmission", cv2.WINDOW_AUTOSIZE)

- This creates a window named "Live Transmission" to display the camera feed. `cv2.WINDOW_AUTOSIZE` means the window will automatically adjust its size based on the image size.

5. Main Loop to Continuously Capture and Process Frames

while True:
    img_resp = urllib.request.urlopen(url)
    imgnp = np.array(bytearray(img_resp.read()), dtype=np.uint8)
    img = cv2.imdecode(imgnp, -1)

- `while True:`: This loop continuously fetches frames from the ESP32-CAM.

- `urllib.request.urlopen(url)`: This retrieves the image from the ESP32-CAM via the URL.

- `np.array(bytearray(img_resp.read()), dtype=np.uint8)`: Converts the image from bytes into a NumPy array so it can be handled by OpenCV.

- `cv2.imdecode(imgnp, -1)`: Decodes the NumPy array into an image that OpenCV can work with.
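
On a flaky Wi-Fi link, the request in the loop can hang, fail, or return something that is not an image, and an unhandled error would end the script. One way to make the fetch more forgiving is sketched below. `looks_like_jpeg` and `fetch_frame_bytes` are hypothetical helper names, and the 5-second timeout is an arbitrary assumption.

```python
import urllib.request

JPEG_SOI = b"\xff\xd8"  # every JPEG file begins with this "start of image" marker


def looks_like_jpeg(data):
    """Cheap check that the bytes start like a JPEG file."""
    return data.startswith(JPEG_SOI)


def fetch_frame_bytes(url, timeout=5.0):
    """Return the raw JPEG bytes from url, or None if the fetch fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            data = resp.read()
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print(f"Could not fetch frame: {exc}")
        return None
    if not looks_like_jpeg(data):
        print("Response was not a JPEG image")
        return None
    return data
```

Inside the loop, a `None` result can simply be skipped with `continue`, so one dropped frame does not stop the live view.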

6. Converting the Image to Grayscale

gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

- `cv2.cvtColor` converts the color image (BGR format) into grayscale, which is easier and faster for the detection algorithms (face and eye detection) to process.
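
To see what the conversion does to a single pixel, the sketch below applies the luma weights that OpenCV documents for this conversion (Y = 0.299 R + 0.587 G + 0.114 B). `bgr_to_gray` is a hypothetical helper for illustration only; OpenCV works on whole images with its own arithmetic, so its results may differ by a rounding step.

```python
def bgr_to_gray(b, g, r):
    """Convert one BGR pixel (0-255 per channel) to a single gray level."""
    return round(0.114 * b + 0.587 * g + 0.299 * r)


# Note the channel order: OpenCV stores pixels as blue, green, red.
print(bgr_to_gray(0, 0, 255))      # pure red
print(bgr_to_gray(0, 255, 0))      # pure green
print(bgr_to_gray(255, 255, 255))  # white
```

Green contributes the most to the gray value and blue the least, which mirrors how bright each color appears to the eye.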

7. Detecting Faces in the Image

face = f_cas.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

- `f_cas.detectMultiScale`: This function detects faces in the grayscale image.

- `gray`: The grayscale image where faces are to be detected.

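- `scaleFactor=1.1`: How much the image is shrunk between successive detection passes. A value of 1.1 shrinks it by roughly 10% each time; values closer to 1 check more sizes and miss fewer faces, but run slower.

- `minNeighbors=5`: The minimum number of overlapping candidate rectangles required before a detection is accepted as a face. Higher values reduce false positives but can miss some faces.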
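
The effect of `scaleFactor` on the amount of work is easier to see with numbers. The sketch below is a simplified model, not OpenCV's actual implementation: it assumes a 24-pixel detector window and counts how many times an image can be shrunk before it becomes smaller than that window. `pyramid_sizes` is a hypothetical helper name.

```python
def pyramid_sizes(image_side, window_side=24, scale_factor=1.1):
    """Return the image side lengths visited while shrinking by scale_factor.

    Simplified model: shrinking stops once the image would be smaller
    than the detector window (assumed to be 24 px here).
    """
    if scale_factor <= 1.0:
        raise ValueError("scale_factor must be greater than 1")
    sizes = []
    scale = 1.0
    while image_side / scale >= window_side:
        sizes.append(round(image_side / scale))
        scale *= scale_factor
    return sizes


for factor in (1.1, 1.3):
    sizes = pyramid_sizes(240, scale_factor=factor)
    print(f"scaleFactor={factor}: {len(sizes)} passes, smallest image {sizes[-1]} px")
```

In this model a 240-pixel side gets 25 passes at 1.1 but only 9 at 1.3, which is why a larger factor runs faster and a smaller one is more thorough.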

8. Drawing Rectangles Around Detected Faces

for x, y, w, h in face:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 0, 255), 3)

- `for x, y, w, h in face:`: This loop runs through all the detected faces, where:

- `x` and `y` are the coordinates of the upper-left corner of the face.

- `w` is the width and `h` is the height of the face.

- `cv2.rectangle`: Draws a red rectangle (BGR color `(0, 0, 255)`) around the detected face in the original image (`img`).

9. Detecting and Highlighting Eyes Within the Detected Face

roi_gray = gray[y:y+h, x:x+w]
roi_color = img[y:y+h, x:x+w]
eyes = eye_cascade.detectMultiScale(roi_gray)
for (ex, ey, ew, eh) in eyes:
    cv2.rectangle(roi_color, (ex, ey), (ex + ew, ey + eh), (0, 255, 0), 2)

- `roi_gray` and `roi_color`: These define the "Region of Interest" (ROI) where eyes are expected to be found, which is the region inside the detected face.

- `eye_cascade.detectMultiScale(roi_gray)`: Detects eyes within the face region in the grayscale image.

- `cv2.rectangle`: Draws a green rectangle (BGR color `(0, 255, 0)`) around each detected eye.
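
One detail that is easy to miss: the eye coordinates are relative to the face region, not to the full frame. Drawing on `roi_color` still works because a NumPy slice is a view into `img`, so the rectangle lands in the right place. If you ever need the eye position in full-image coordinates, the offset can be added back as in this sketch, where `eye_to_image_coords` is a hypothetical helper and the rectangles are made-up values.

```python
def eye_to_image_coords(face_rect, eye_rect):
    """Translate an eye rectangle from face-ROI coordinates to full-image coordinates."""
    x, y, _w, _h = face_rect
    ex, ey, ew, eh = eye_rect
    return (x + ex, y + ey, ew, eh)


face_rect = (100, 50, 80, 80)  # x, y, w, h of a face in the full image
eye_rect = (10, 20, 15, 10)    # ex, ey, ew, eh relative to the face ROI
print(eye_to_image_coords(face_rect, eye_rect))  # (110, 70, 15, 10)
```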

10. Displaying the Result

cv2.imshow("Live Transmission", img)

- `cv2.imshow`: This function displays the current frame with rectangles around detected faces and eyes in the "Live Transmission" window.

11. Exiting the Program

key = cv2.waitKey(5)
if key == ord('q'):
    break

- `cv2.waitKey(5)`: Waits for 5 milliseconds for a key press.

- `if key == ord('q'):`: If the 'q' key is pressed, the program breaks out of the loop and stops the live video feed.
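
`cv2.waitKey` returns -1 when no key was pressed during the wait. On some platforms the returned key code can also carry extra high bits, so many OpenCV scripts mask it with `0xFF` before comparing. The sketch below shows that idiom as a small function; `is_quit_key` is a hypothetical helper name, not part of OpenCV.

```python
def is_quit_key(key, quit_char='q'):
    """Return True if a value returned by cv2.waitKey corresponds to quit_char."""
    if key == -1:  # no key was pressed during the wait
        return False
    return (key & 0xFF) == ord(quit_char)


print(is_quit_key(ord('q')))  # True
print(is_quit_key(-1))        # False
```

In the loop this would read `if is_quit_key(cv2.waitKey(5)): break`.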

12. Cleanup

cv2.destroyAllWindows()

- `cv2.destroyAllWindows`: Closes the window displaying the video when the loop ends (after pressing 'q').

Summary:

- Import libraries: OpenCV for image processing, `urllib` for getting images from the ESP32-CAM, and NumPy for array handling.

- Haar Cascades: Pre-trained models to detect faces and eyes.

- ESP32-CAM URL: Defines the web address from which the camera feed is fetched.

- Face & Eye Detection: OpenCV converts each frame to grayscale for more efficient detection, uses the `CascadeClassifier` models to find faces and eyes, and draws rectangles around them with `cv2.rectangle`.

- Live Video Stream: Displays the video feed in real time, with face and eye detection applied, until the user presses 'q' to quit.

Conclusion:

Congratulations! You've successfully set up face detection using the ESP32-CAM and Python in the Thonny IDE. This project can be extended for various applications such as smart home security, automated attendance systems, or even facial recognition. If you enjoyed this tutorial, be sure to visit our Skill-Hub for the Arduino Master Class, where you can take your tech skills to the next level!