67 Clock Bot

67 Clock Bot is an autonomous computer-vision robot that reacts to people and obstacles in real time using only a single camera, an ESP32, and an ultrasonic sensor.
It combines YOLOv8, MiDaS depth-from-RGB, and a custom finite-state machine to make consistent decisions under changing conditions.

The entire system was designed, built, and tested in 3 days, covering computer vision, embedded systems, real-time control, and robotics behavior.


Installation

1. Install Python 3.10+.

2. Install the required Python packages:

```
pip install ultralytics
pip install opencv-python
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install numpy
```

3. YOLOv8 model: the yolov8n.pt weights are downloaded automatically on first run.

4. MiDaS depth model: the MiDaS_small model is downloaded automatically the first time depth.py runs.

5. ESP32 setup: upload esp32.ino to your ESP32 board.

You’ll need:

  • ESP32 board support installed
  • Motor driver wiring (L298N or similar)
  • Ultrasonic sensor wired to trigger/echo pins

Running the bot

```
python main.py
```

Before running, set the following inside main.py or utils.py (illustrated in the sketch below):

  • your ESP32 IP address
  • your UDP command port
  • your PC listening port
  • your camera index or camera stream URL
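
For illustration, such a configuration block might look like this; the constant names and values below are assumptions, not the repo's actual identifiers:

```python
# Hypothetical configuration constants; the real names in main.py/utils.py may differ.
ESP32_IP = "192.168.1.42"    # IP address of the ESP32 on your network
ESP32_CMD_PORT = 4210        # UDP port the ESP32 listens on for drive commands
PC_TELEMETRY_PORT = 4211     # UDP port the PC binds to for range telemetry
CAMERA_SOURCE = 0            # camera index, or a camera stream URL string
```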


Core Capabilities

Person Detection (YOLOv8)

The robot runs YOLOv8n at ~15 Hz and approximates the closest person by selecting the largest bounding box (a minimal sketch of this selection follows the list below).
This is used to determine:

  • whether a person is near
  • where the person is located horizontally
  • how the robot should turn while retreating
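
As a minimal sketch of that selection using the Ultralytics API (the function name and structure here are illustrative, not the repo's exact code):

```python
# Sketch: pick the largest person box as a proxy for the closest person.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # weights download automatically on first use

def largest_person(frame):
    """Return the (x1, y1, x2, y2) box of the largest detected person, or None."""
    results = model(frame, classes=[0], verbose=False)  # COCO class 0 = person
    boxes = results[0].boxes.xyxy.cpu().numpy()
    if len(boxes) == 0:
        return None
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return boxes[areas.argmax()]
```

Box area is a cheap proxy for proximity: nearer people occupy more pixels, so no explicit tracker is required.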

Depth Estimation from a Single RGB Frame (MiDaS)

MiDaS generates a dense depth map using only an RGB image, without stereo cameras.
Relative depth is used to determine whether a detected person is “close enough” to trigger retreat.
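
The standard torch.hub loading pattern for MiDaS_small looks like the sketch below; depth.py may organize this differently:

```python
# Sketch: relative depth from a single RGB frame with MiDaS_small.
import torch

midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
transform = torch.hub.load("intel-isl/MiDaS", "transforms").small_transform

def depth_map(rgb_frame):
    """Return a relative (unitless) depth map resized to the input frame."""
    batch = transform(rgb_frame)  # MiDaS resize + normalization (expects RGB)
    with torch.no_grad():
        pred = midas(batch)
        pred = torch.nn.functional.interpolate(
            pred.unsqueeze(1), size=rgb_frame.shape[:2],
            mode="bicubic", align_corners=False,
        ).squeeze()
    return pred.cpu().numpy()
```

Note that MiDaS outputs are relative, not metric, which is why the robot only uses them for a "close enough" decision rather than absolute distances.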

Ultrasonic Telemetry (ESP32 → PC)

The ESP32 sends JSON datagrams of the form { "range_cm": value } to the PC over UDP (a minimal listener sketch follows the list below).
This real-time distance reading allows:

  • halting retreat when a wall is too close
  • entering SLIDE when backing into a wall
  • entering BOUNCE when escaping corners
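
A minimal listener sketch, assuming JSON datagrams like {"range_cm": 42.0} and an arbitrary port number (telemetry.py's structure may differ):

```python
# Sketch: background UDP listener for ultrasonic range telemetry.
import json
import socket
import threading

latest_range_cm = None  # most recent reading, updated by the thread below

def _listen(port=4211):  # port value is an assumption
    global latest_range_cm
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", port))
    while True:
        data, _ = sock.recvfrom(256)
        latest_range_cm = json.loads(data)["range_cm"]

threading.Thread(target=_listen, daemon=True).start()
```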

Finite-State Machine (FSM)

Robot behavior is controlled by a predictable FSM:

  • IDLE – no person detected
  • RETREAT – person detected and close
  • SLIDE – wall detected during retreat
  • BOUNCE – small forward escape after sliding

This structure keeps the robot's reactions stable and explainable; a minimal transition sketch follows.
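
The sketch below assumes a hypothetical wall-distance threshold and simplified exit conditions; controller.py's actual logic and tuning will differ:

```python
# Sketch: FSM transitions for IDLE / RETREAT / SLIDE / BOUNCE.
from enum import Enum, auto

class State(Enum):
    IDLE = auto()
    RETREAT = auto()
    SLIDE = auto()
    BOUNCE = auto()

def next_state(state, person_close, wall_cm, bounce_done, wall_limit_cm=25.0):
    # wall_limit_cm is an assumed threshold, not the repo's tuned value
    if state is State.IDLE and person_close:
        return State.RETREAT
    if state is State.RETREAT:
        if wall_cm is not None and wall_cm < wall_limit_cm:
            return State.SLIDE   # backing into a wall: slide along it
        if not person_close:
            return State.IDLE
    if state is State.SLIDE and wall_cm is not None and wall_cm < wall_limit_cm:
        return State.BOUNCE      # still cornered: short forward escape
    if state is State.BOUNCE and bounce_done:
        return State.IDLE
    return state
```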


The PC controls behavior and vision; the ESP32 handles motor output and wall sensing.


Code Structure

The project is organized into clear, modular files:

```
67-clock-bot/
│
├── main.py         # Main event loop (vision → control → drive commands)
├── detector.py     # YOLO person detector
├── depth.py        # MiDaS depth-from-RGB estimation
├── controller.py   # Finite-state machine + motion logic
├── telemetry.py    # ESP32 UDP listener for wall range data
└── utils.py        # UDP senders + on-screen visualization
```

detector.py

Runs YOLOv8 and returns only the largest detected person, reducing noise and improving stability.

depth.py

Runs MiDaS Small and normalizes the depth map using percentile scaling, producing a stable [0,1] depth range.
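
A sketch of that normalization, assuming the 2nd/98th percentiles as clipping points (the exact percentiles are an assumption):

```python
# Sketch: percentile-based normalization of a raw MiDaS depth map to [0, 1].
import numpy as np

def normalize_depth(depth):
    lo, hi = np.percentile(depth, [2, 98])  # robust to outlier pixels
    return np.clip((depth - lo) / (hi - lo + 1e-6), 0.0, 1.0)
```

Percentile clipping keeps a few extreme pixels from stretching the range, so the "close enough" threshold behaves consistently frame to frame.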

controller.py

Implements the FSM and computes linear/angular velocity commands based on fused CV + telemetry data.
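
For instance, retreat steering can be derived from the person's horizontal offset in the frame; the gains and names below are assumptions, not the repo's values:

```python
# Sketch: back away while turning according to the person's horizontal offset.
def retreat_command(box, frame_width, back_speed=-0.4, k_turn=1.5):
    x_center = (box[0] + box[2]) / 2.0
    offset = x_center / frame_width - 0.5   # -0.5 (far left) .. +0.5 (far right)
    return back_speed, k_turn * offset      # (linear, angular) command pair
```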

telemetry.py

Runs a background thread to receive ultrasonic data over UDP from the ESP32.

main.py

The real-time loop that ties everything together (a compact sketch follows the list):

  1. Capture frame
  2. Detect person
  3. Estimate depth
  4. Read telemetry
  5. Update FSM
  6. Send motion commands
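
A compact sketch of that loop, reusing the hypothetical helpers from the earlier sketches (the repo's actual function names may differ):

```python
# Sketch: the per-frame vision -> control -> drive pipeline.
import cv2

cap = cv2.VideoCapture(CAMERA_SOURCE)
state = State.IDLE
while True:
    ok, frame = cap.read()                    # 1. capture frame
    if not ok:
        break
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    box = largest_person(frame)               # 2. detect person
    depth = normalize_depth(depth_map(rgb))   # 3. estimate depth
    wall_cm = latest_range_cm                 # 4. read telemetry
    person_close = box is not None and is_close(box, depth)  # is_close: hypothetical
    state = next_state(state, person_close, wall_cm,
                       bounce_done=False)     # 5. update FSM (bounce timing omitted)
    send_drive_command(state, box)            # 6. send motion commands (hypothetical)
```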

Key Technical Highlights

  • Real-time computer vision pipeline (YOLOv8 + MiDaS)
  • Depth estimation from RGB-only input
  • Multi-sensor fusion between vision and ultrasonic sensing
  • Finite-state machine for predictable behavior
  • Modular and extensible architecture
  • Reliable UDP communication at ~15 Hz
  • Fully functional robot built in 3 days

Future Extensions

  • Person-following mode
  • Local mapping or SLAM
  • Stereo depth or hardware depth camera
  • ROS2 ecosystem integration
  • On-device inference (Jetson, OAK-D)
  • Additional behaviors (path planning, patrolling)

System Architecture

(Architecture diagram: the PC vision/control pipeline communicates with the ESP32 motor and ultrasonic-sensor firmware over UDP.)
