
Voice Control Module (Microphone Icon) #568

@ngoiyaeric

Description


Voice Input Integration in ChatPanel

1. Current State: No Voice Infrastructure Exists

A grep search across all .tsx files for every relevant audio/voice keyword (SpeechRecognition, webkitSpeechRecognition, MediaRecorder, AudioContext, whisper, deepgram, transcri) returns zero matches. There is no microphone, voice, or audio feature of any kind in the codebase today.


2. The Exact Integration Point Inside ChatPanel

The voice button belongs between the <Textarea> closing tag and the submit <Button> inside the relative flex items-start wrapper div. This is the absolute-positioned button zone on the right side of the input row.

Currently one element is absolute-positioned on the right side of the input row:

  • The ArrowRight submit button sits at right-1 (mobile) / right-2 (desktop). 1

The microphone button should be inserted before that submit button at a position like right-9 or right-10, and the <Textarea>'s right padding (pr-12) would need to increase to pr-20 to accommodate two right-side icons. 2
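The positioning above can be summarized as a few Tailwind class values. The constant names and the helper below are hypothetical, for illustration only; only the class strings come from the proposal:

```typescript
// Illustrative only: the right-side layout proposed above.
const SUBMIT_POSITION = 'right-2'   // existing submit button (desktop)
const MIC_POSITION = 'right-10'     // proposed mic button, left of submit

// The Textarea's right padding must grow once a second icon occupies the row.
function textareaRightPadding(rightIconCount: number): string {
  return rightIconCount >= 2 ? 'pr-20' : 'pr-12'
}
```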

For mobile, attachment and submit actions are delegated to MobileIconsBar via the ChatPanelRef imperative handle. A startVoiceInput method should similarly be added to the ChatPanelRef interface and exposed via useImperativeHandle, and a Mic button added to MobileIconsBar. 3 4


3. The setInput / debouncedGetSuggestions Pipeline to Hook Into

The transcript must be passed through setInput and then debouncedGetSuggestions — exactly the same pattern used in the <Textarea>'s onChange handler: 5

debouncedGetSuggestions requires ≥ 2 words and a 500ms debounce window, then calls the getSuggestions server action with the value and current mapData, streaming back PartialRelated suggestions via onSuggestionsChange: 6
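The gating rule can be sketched as a pure predicate (the function name is illustrative; the word-count logic mirrors the debouncedGetSuggestions excerpt in citation 6):

```typescript
// Fewer than two words clears suggestions immediately; otherwise the
// debounced getSuggestions call fires after the 500 ms window.
function shouldFetchSuggestions(value: string): boolean {
  const wordCount = value.trim().split(/\s+/).filter(Boolean).length
  return wordCount >= 2
}
```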

getSuggestions itself takes (query: string, mapData: MapData) and uses streamObject against the configured LLM: 7


4. Available Dependencies to Use

A. lucide-react — Already Installed, Has Mic / MicOff Icons

lucide-react ^0.507.0 is already a dependency. The existing icon imports in chat-panel.tsx demonstrate the pattern: 8

Simply add Mic and MicOff to that import line — no new package needed for the UI.

B. Web Speech API — Browser-native, Zero New Dependencies

The Web Speech API (window.SpeechRecognition / window.webkitSpeechRecognition) requires no npm packages. A ref holding the recognition instance fits alongside the existing debounceTimeoutRef, inputRef, formRef, and fileInputRef refs already declared: 9

Add isRecording: boolean state alongside existing state vars: 10

The recognition.onresult callback feeds the transcript string directly into setInput(transcript) and debouncedGetSuggestions(transcript), then inputRef.current?.focus() (same pattern as the existing focus effect): 11
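A minimal sketch of that wiring, with the dependencies injected so the pattern is testable outside a browser (the handleTranscript name and VoiceDeps type are hypothetical; in chat-panel.tsx the real dependencies would be setInput, debouncedGetSuggestions, and inputRef.current?.focus()):

```typescript
type VoiceDeps = {
  setInput: (value: string) => void
  debouncedGetSuggestions: (value: string) => void
  focusInput: () => void
}

// Mirrors the <Textarea> onChange pattern: update the controlled input,
// kick off the suggestion pipeline, then restore focus to the textarea.
function handleTranscript(transcript: string, deps: VoiceDeps) {
  deps.setInput(transcript)
  deps.debouncedGetSuggestions(transcript)
  deps.focusInput()
}
```

In the component, this function body would live inside recognition.onresult, reading the transcript from the SpeechRecognitionEvent's results.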

Limitations: Safari/iOS support is inconsistent, accuracy varies, and the API requires a secure context (HTTPS or localhost).

C. OpenAI Whisper — Highest Accuracy, OPENAI_API_KEY Already Configured

For a server-side transcription path using OpenAI Whisper, the OPENAI_API_KEY is already referenced by getModel(): 12 13

The Blob → FormData → server submission pattern (which a MediaRecorder capture would feed into) is already established in compressImage / header-search-button.tsx: 14 15

A new route app/api/transcribe/route.ts (parallel to the existing routes in app/api/) would receive the audio Blob, forward it to OpenAI's POST /v1/audio/transcriptions (whisper-1), and return the transcript text. The existing API route convention is: 16
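The request that route would forward can be sketched as a pure builder (the function name and webm filename are illustrative; the URL, model name, and auth header are OpenAI's documented transcription endpoint; requires Node 18+ for the global Blob/FormData):

```typescript
// Shapes the multipart request for POST /v1/audio/transcriptions.
// The route handler would pass the result to fetch(req.url, req.init)
// and return the transcript text from the JSON response.
function buildWhisperRequest(audio: Blob, apiKey: string) {
  const form = new FormData()
  form.append('file', audio, 'recording.webm') // filename is illustrative
  form.append('model', 'whisper-1')
  return {
    url: 'https://api.openai.com/v1/audio/transcriptions',
    init: {
      method: 'POST',
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form,
    },
  }
}
```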

What to add: The openai npm package (bun add openai) — note that @ai-sdk/openai ^1.3.24 is the Vercel AI SDK adapter and does not expose the Whisper transcription endpoint: 17


5. Mobile Imperative Handle Wiring

On mobile, the ChatPanelRef exposes methods that chat.tsx calls imperatively. A startVoiceInput method should be added to ChatPanelRef and useImperativeHandle, then MobileIconsBar extended with an onVoiceClick prop alongside the existing onAttachmentClick / onSubmitClick: 18 19 20
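The extended handle would look like this (handleAttachmentClick and submitForm already exist in components/chat-panel.tsx; startVoiceInput is the proposed addition, and makeHandle is a stand-in for what useImperativeHandle would return):

```typescript
interface ChatPanelRef {
  handleAttachmentClick: () => void
  submitForm: () => void
  startVoiceInput: () => void // new: starts/toggles speech recognition
}

// Stubbed for illustration; in the component the three members would
// click fileInputRef, requestSubmit the form, and start recognition.
function makeHandle(startVoice: () => void): ChatPanelRef {
  return {
    handleAttachmentClick: () => {},
    submitForm: () => {},
    startVoiceInput: startVoice,
  }
}
```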


Architecture Summary

flowchart TD
    A["Mic Button (chat-panel.tsx, right of Textarea)"]
    B["Web Speech API\n(window.SpeechRecognition)\nZero deps, browser-only"]
    C["MediaRecorder Blob\n(browser API)"]
    D["app/api/transcribe/route.ts\n(new — needs openai npm pkg)"]
    E["OpenAI Whisper\nwhisper-1 model"]
    F["transcript string"]
    G["setInput(transcript)\n(chat-panel.tsx line 249)"]
    H["debouncedGetSuggestions(transcript)\n(chat-panel.tsx line 250)"]
    I["getSuggestions(value, mapData)\n(lib/actions/suggest.ts)"]
    J["onSuggestionsChange → SuggestionsDropdown"]

    A --> B
    A --> C
    B --> F
    C --> D --> E --> F
    F --> G
    F --> H
    H --> I --> J

Notes

  • @ai-sdk/openai (^1.3.24) is the Vercel AI SDK provider adapter — it does not wrap the Whisper audio transcription endpoint. You would need to bun add openai to get the full OpenAI Node.js SDK, or use a raw fetch call to https://api.openai.com/v1/audio/transcriptions with the existing OPENAI_API_KEY.
  • The Web Speech API approach is the fastest to integrate with zero new dependencies — just add Mic/MicOff from the already-installed lucide-react, add isRecording state, and a useCallback handler that calls setInput + debouncedGetSuggestions in its onresult callback.
  • The breakpoint difference between ChatPanel (<= 1024px) and Chat (< 768px) for isMobile means the microphone button's desktop/mobile rendering logic should follow ChatPanel's own isMobile state, not the parent's. 21 22

Citations

File: components/chat-panel.tsx (L9-9)

import { ArrowRight, Plus, Paperclip, X, Sprout } from 'lucide-react'

File: components/chat-panel.tsx (L25-54)

export interface ChatPanelRef {
  handleAttachmentClick: () => void
  submitForm: () => void
}

export const ChatPanel = forwardRef<ChatPanelRef, ChatPanelProps>(({ messages, input, setInput, onSuggestionsChange }, ref) => {
  const [, setMessages] = useUIState<typeof AI>()
  const { submit, clearChat } = useActions()
  const { mapProvider } = useSettingsStore()
  const [isMobile, setIsMobile] = useState(false)
  const [selectedFile, setSelectedFile] = useState<File | null>(null)
  const [suggestions, setSuggestionsState] = useState<PartialRelated | null>(null)
  const setSuggestions = useCallback((s: PartialRelated | null) => {
    setSuggestionsState(s)
    onSuggestionsChange?.(s)
  }, [onSuggestionsChange, setSuggestionsState])
  const { mapData } = useMapData()
  const debounceTimeoutRef = useRef<NodeJS.Timeout | null>(null)
  const inputRef = useRef<HTMLTextAreaElement>(null)
  const formRef = useRef<HTMLFormElement>(null)
  const fileInputRef = useRef<HTMLInputElement>(null)

  useImperativeHandle(ref, () => ({
    handleAttachmentClick() {
      fileInputRef.current?.click()
    },
    submitForm() {
      formRef.current?.requestSubmit()
    }
  }));

File: components/chat-panel.tsx (L57-64)

  useEffect(() => {
    const checkMobile = () => {
      setIsMobile(window.innerWidth <= 1024)
    }
    checkMobile()
    window.addEventListener('resize', checkMobile)
    return () => window.removeEventListener('resize', checkMobile)
  }, [])

File: components/chat-panel.tsx (L134-158)

  const debouncedGetSuggestions = useCallback(
    (value: string) => {
      if (debounceTimeoutRef.current) {
        clearTimeout(debounceTimeoutRef.current)
      }

      const wordCount = value.trim().split(/\s+/).filter(Boolean).length
      if (wordCount < 2) {
        setSuggestions(null)
        return
      }

      debounceTimeoutRef.current = setTimeout(async () => {
        const suggestionsStream = await getSuggestions(value, mapData)
        for await (const partialSuggestions of readStreamableValue(
          suggestionsStream
        )) {
          if (partialSuggestions) {
            setSuggestions(partialSuggestions as PartialRelated)
          }
        }
      }, 500) // 500ms debounce delay
    },
    [mapData, setSuggestions]
  )

File: components/chat-panel.tsx (L160-162)

  useEffect(() => {
    inputRef.current?.focus()
  }, [])

File: components/chat-panel.tsx (L232-247)

          <Textarea
            ref={inputRef}
            name="input"
            rows={1}
            maxRows={isMobile ? 3 : 5}
            tabIndex={0}
            placeholder="Explore"
            spellCheck={false}
            value={input}
            data-testid="chat-input"
            className={cn(
              'resize-none w-full min-h-12 rounded-fill border border-input pl-14 pr-12 pt-3 pb-1 text-sm ring-offset-background file:border-0 file:bg-transparent file:text-sm file:font-medium placeholder:text-muted-foreground focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-ring focus-visible:ring-offset-2 disabled:cursor-not-allowed disabled:opacity-50',
              isMobile
                ? 'mobile-chat-input input bg-background'
                : 'bg-muted'
            )}

File: components/chat-panel.tsx (L248-251)

            onChange={e => {
              setInput(e.target.value)
              debouncedGetSuggestions(e.target.value)
            }}

File: components/chat-panel.tsx (L276-289)

          <Button
            type="submit"
            size={'icon'}
            variant={'ghost'}
            className={cn(
              'absolute top-1/2 transform -translate-y-1/2',
              isMobile ? 'right-1' : 'right-2'
            )}
            disabled={input.length === 0 && !selectedFile}
            aria-label="Send message"
            data-testid="chat-submit"
          >
            <ArrowRight size={isMobile ? 18 : 20} />
          </Button>

File: components/mobile-icons-bar.tsx (L23-65)

interface MobileIconsBarProps {
  onAttachmentClick: () => void;
  onSubmitClick: () => void;
}

export const MobileIconsBar: React.FC<MobileIconsBarProps> = ({ onAttachmentClick, onSubmitClick }) => {
  const [, setMessages] = useUIState<typeof AI>()
  const { clearChat } = useActions()
  const { toggleCalendar } = useCalendarToggle()

  const handleNewChat = async () => {
    setMessages([])
    await clearChat()
  }

  return (
    <div className="mobile-icons-bar-content">
      <Button variant="ghost" size="icon" onClick={handleNewChat} data-testid="mobile-new-chat-button">
        <Plus className="h-[1.2rem] w-[1.2rem]" />
      </Button>
      <ProfileToggle />
      <MapToggle />
      <Button variant="ghost" size="icon" onClick={toggleCalendar} title="Open Calendar" data-testid="mobile-calendar-button">
        <CalendarDays className="h-[1.2rem] w-[1.2rem] transition-all rotate-0 scale-100" />
      </Button>
      <Button variant="ghost" size="icon" data-testid="mobile-search-button">
        <Search className="h-[1.2rem] w-[1.2rem] transition-all rotate-0 scale-100" />
      </Button>
      <a href="https://buy.stripe.com/14A3cv7K72TR3go14Nasg02" target="_blank" rel="noopener noreferrer">
        <Button variant="ghost" size="icon">
          <TentTree className="h-[1.2rem] w-[1.2rem] transition-all rotate-0 scale-100" />
        </Button>
      </a>
      <Button variant="ghost" size="icon" onClick={onAttachmentClick} data-testid="mobile-attachment-button">
        <Paperclip className="h-[1.2rem] w-[1.2rem] transition-all rotate-0 scale-100" />
      </Button>
      <Button variant="ghost" size="icon" data-testid="mobile-submit-button" onClick={onSubmitClick}>
        <ArrowRight className="h-[1.2rem] w-[1.2rem] transition-all rotate-0 scale-100" />
      </Button>
      <History location="header" />
      <ModeToggle />
    </div>
  )

File: lib/actions/suggest.ts (L9-45)

export async function getSuggestions(
  query: string,
  mapData: MapData
) {
  const objectStream = createStreamableValue<PartialRelated>()

  const systemPrompt = `As a helpful assistant, your task is to generate a set of three query suggestions based on the user's partial input. The user is currently interacting with a map, and the following data represents the current map view: ${JSON.stringify(mapData)}. Use this location context to provide relevant suggestions.

  For instance, if the user's partial query is "best coffee near" and the map context is centered on San Francisco, your output should follow this format:

  "{
    "items": [
      { "query": "best coffee near downtown San Francisco" },
      { "query": "top-rated independent coffee shops in SF" },
      { "query": "coffee shops with outdoor seating in San Francisco" }
    ]
  }"

  Generate three queries that anticipate the user's needs, offering logical next steps for their search. The suggestions should be concise and directly related to the partial query and map context.`

  ;(async () => {
    const result = await streamObject({
      model: (await getModel()) as LanguageModel,
      system: systemPrompt,
      messages: [{ role: 'user', content: query }],
      schema: relatedSchema
    })

    for await (const obj of result.partialObjectStream) {
      if (obj && typeof obj === 'object' && 'items' in obj) {
        objectStream.update(obj as PartialRelated)
      }
    }
    objectStream.done()
  })()

  return objectStream.value

File: lib/utils/index.ts (L24-30)

export async function getModel(requireVision: boolean = false) {
  const selectedModel = await getSelectedModel();

  const xaiApiKey = process.env.XAI_API_KEY;
  const gemini3ProApiKey = process.env.GEMINI_3_PRO_API_KEY;
  const awsAccessKeyId = process.env.AWS_ACCESS_KEY_ID;
  const awsSecretAccessKey = process.env.AWS_SECRET_ACCESS_KEY;

File: lib/utils/index.ts (L121-124)

  const openai = createOpenAI({
    apiKey: openaiApiKey,
  });
  return openai('gpt-4o');

File: components/header-search-button.tsx (L60-74)

      let mapboxBlob: Blob | null = null;
      let googleBlob: Blob | null = null;

      if (mapProvider === 'mapbox' && map) {
        // Capture Mapbox
        const canvas = map.getCanvas()
        const rawMapboxBlob = await new Promise<Blob | null>(resolve => {
          canvas.toBlob(resolve, 'image/png')
        })
        if (rawMapboxBlob) {
          mapboxBlob = await compressImage(rawMapboxBlob).catch(e => {
            console.error('Failed to compress Mapbox image:', e);
            return rawMapboxBlob;
          });
        }

File: components/header-search-button.tsx (L123-141)

      const formData = new FormData()
      if (mapboxBlob) formData.append('file_mapbox', mapboxBlob, 'mapbox_capture.png')
      if (googleBlob) formData.append('file_google', googleBlob, 'google_capture.png')

      // Keep 'file' for backward compatibility if needed, or just use the first available
      formData.append('file', (mapboxBlob || googleBlob)!, 'map_capture.png')

      formData.append('action', 'resolution_search')
      formData.append('timezone', mapData.currentTimezone || 'UTC')
      formData.append('drawnFeatures', JSON.stringify(mapData.drawnFeatures || []))

      const center = mapProvider === 'mapbox' && map ? map.getCenter() : mapData.cameraState?.center;
      if (center) {
        formData.append('latitude', center.lat.toString())
        formData.append('longitude', center.lng.toString())
      }

      const responseMessage = await actions.submit(formData)
      setMessages((currentMessages: any[]) => [...currentMessages, responseMessage as any])

File: app/api/chat/route.ts (L1-15)

import { NextResponse, NextRequest } from 'next/server';
import { saveChat, createMessage, NewChat, NewMessage } from '@/lib/actions/chat-db';
import { getCurrentUserIdOnServer } from '@/lib/auth/get-current-user';
// import { generateUUID } from '@/lib/utils'; // Assuming generateUUID is in lib/utils as per PR context - not needed for PKs

// This is a simplified POST handler. PR #533's version might be more complex,
// potentially handling streaming AI responses and then saving.
// For now, this focuses on the database interaction part.
export async function POST(request: NextRequest) {
  try {
    const userId = await getCurrentUserIdOnServer();
    if (!userId) {
      return NextResponse.json({ error: 'Unauthorized' }, { status: 401 });
    }

File: package.json (L19-22)

    "@ai-sdk/amazon-bedrock": "^1.1.6",
    "@ai-sdk/anthropic": "^1.2.12",
    "@ai-sdk/google": "^1.2.22",
    "@ai-sdk/openai": "^1.3.24",

File: components/chat.tsx (L42-50)

  const chatPanelRef = useRef<ChatPanelRef>(null);

  const handleAttachment = () => {
    chatPanelRef.current?.handleAttachmentClick();
  };

  const handleMobileSubmit = () => {
    chatPanelRef.current?.submitForm();
  };

File: components/chat.tsx (L56-70)

  useEffect(() => {
    // Check if device is mobile
    const checkMobile = () => {
      setIsMobile(window.innerWidth < 768)
    }
    
    // Initial check
    checkMobile()
    
    // Add event listener for window resize
    window.addEventListener('resize', checkMobile)
    
    // Cleanup
    return () => window.removeEventListener('resize', checkMobile)
  }, [])

File: components/chat.tsx (L134-145)

        <div className="mobile-icons-bar">
          <MobileIconsBar onAttachmentClick={handleAttachment} onSubmitClick={handleMobileSubmit} />
        </div>
        <div className="mobile-chat-input-area">
          <ChatPanel 
            ref={chatPanelRef} 
            messages={messages} 
            input={input} 
            setInput={setInput}
            onSuggestionsChange={setSuggestions}
          />
        </div>
