feat: long press voice button to pick audio file from device #5

Open
cookie223 wants to merge 1 commit into memex-lab:main from cookie223:feat/long-press-audio-picker

@cookie223

Summary

  • Adds long press gesture on the mic/voice button in the input sheet
  • Opens a file picker to select an existing audio file from device storage
  • Supported formats: m4a, mp3, wav, ogg
  • Adds i18n strings for English and Chinese (error messages and audio label)
  • Shows a snackbar if user tries to pick while recording is active

How it works

| Gesture | Action |
| --- | --- |
| Tap | Start/stop recording (existing behavior) |
| Long press | Open file picker to select an audio file |
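
For reviewers who have not opened the diff, the gesture split above could look roughly like the sketch below. It is an illustration only, not the code in this PR. It assumes a Flutter widget and the `FilePicker.platform.pickFiles` call from `file_picker`, whose exact shape can vary by package version. `VoiceButton`, `isRecording`, `onToggleRecording`, and `onAudioPicked` are hypothetical names, and the snackbar text is hardcoded here where the PR uses i18n strings.

```dart
import 'package:file_picker/file_picker.dart';
import 'package:flutter/material.dart';

/// Extensions listed in the PR summary.
const supportedAudioExtensions = ['m4a', 'mp3', 'wav', 'ogg'];

/// Hypothetical widget; names are illustrative and not taken from the PR diff.
class VoiceButton extends StatelessWidget {
  const VoiceButton({
    super.key,
    required this.isRecording,
    required this.onToggleRecording,
    required this.onAudioPicked,
  });

  final bool isRecording;
  final VoidCallback onToggleRecording;
  final ValueChanged<String> onAudioPicked;

  Future<void> _pickAudio(BuildContext context) async {
    // Grab the messenger before any await so the context is not used
    // across an async gap.
    final messenger = ScaffoldMessenger.of(context);

    // Refuse to open the picker while a recording is in progress.
    if (isRecording) {
      messenger.showSnackBar(
        // Assumed wording; the real message would come from the i18n strings.
        const SnackBar(content: Text('Stop recording before picking a file')),
      );
      return;
    }

    // Restrict the picker to the supported audio extensions.
    final result = await FilePicker.platform.pickFiles(
      type: FileType.custom,
      allowedExtensions: supportedAudioExtensions,
    );

    // result is null when the user cancels; path can be null on some platforms.
    final path = result?.files.single.path;
    if (path != null) onAudioPicked(path);
  }

  @override
  Widget build(BuildContext context) {
    return GestureDetector(
      onTap: onToggleRecording, // existing tap behavior: start/stop recording
      onLongPress: () => _pickAudio(context), // new: open the file picker
      child: const Icon(Icons.mic),
    );
  }
}
```

Keeping the recording check inside the long-press handler means the tap path stays untouched, which matches the "existing behavior" row in the table.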

Testing

  • Tested on real Android device (Samsung SM F966U1, Android 16)
  • Please test on iOS — not verified by the author; the file_picker package supports iOS but behavior on iOS has not been confirmed

Notes

This uses the existing file_picker dependency. No new packages were added.

🤖 Generated with Claude Code

Adds long press gesture on the mic/voice button in the input sheet to
open a file picker and select an existing audio file (m4a, mp3, wav, ogg)
instead of recording a new one.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@tigerlaibao
Collaborator

Thanks a lot for your contribution to the project!

I’m currently working on a feature to support a local speech-to-text (STT) model, specifically SenseVoiceSmall. The reason for this is that most LLMs don't natively support direct audio input. By using a standalone local model, we can handle transcription without requiring additional API keys.

This feature might have some code conflicts with your PR. To help me test if the local model can handle your use case effectively, could you let me know the typical size of the audio files you usually upload?

Really appreciate you helping out with the development!

@cookie223
Author

For my own use case, it is usually audio recorded with a voice memo or voice recorder app. The files won't be very large, a few MB at most.
Would it be feasible to make the model used for audio input selectable in the settings, like the media processing model? I would much prefer to use Gemini, which is natively multimodal, and the free API key is enough for daily use.
