Problem invoking function / tool when using Google_GenerativeAI.Live #100

@edossantos-sipcaller

Description

Hello,

I'm not sure whether my issue stems from how I'm using this library or from Gemini itself, so please forgive me if this is misplaced.

I'm building an AI agent that interacts with a customer by voice (audio stream) over a phone call. The binding with the phone call works fine and I can hold a conversation with the agent, so far so good.

When the call ends, I need an outcome from the agent: for example, whether the customer agrees to receive an email, wants to be connected to a human, and so on. To that end, the agent should invoke a function named set_call_outcome, passing send_email, transfer_call or drop_call as the parameter; I ask for this in the system instructions. The problem is that instead of invoking the function, I can "hear" the "invocation" in the generated audio. The agent says the following:
OK, You will receive an email from us. Thanks.11 set_call_outcome('send_email')

That's part of the transcription as well.

We can also see that the agent intends to call the function, because we receive this message:

Message received: BidiResponsePayload { SetupComplete: null, ServerContent: BidiGenerateContentServerContent { TurnComplete: null, Interrupted: null, GroundingMetadata: null, ModelTurn: Content { Parts: [Part { Text: "**Terminating the Validation Attempt**

I've determined that the customer wants to receive the email. Consequently, I've ended the session as instructed. The `set_call_outcome` function will be invoked to reflect this outcome of \"send_email\".


", InlineData: null, FunctionCall: null, FunctionResponse: null, FileData: null, ExecutableCode: null, CodeExecutionResult: null, VideoMetadata: null, Thought: True, ThoughtSignature: null }], Role: null }, GenerationComplete: null, InputTranscription: null, OutputTranscription: null, UrlContextMetadata: null, TurnCompleteReason: null, WaitingForInput: null }, ToolCall: null, ToolCallCancellation: null, GoAway: null, SessionResumptionUpdate: null, UsageMetadata: null }
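For reference, this is roughly the check I'd expect to perform on each received message. It is only a sketch: the property names (ToolCall, ServerContent, ModelTurn, Parts, FunctionCall, Thought) are taken from the dump above, but the FunctionCall member accessors and the HandleSetCallOutcome helper are my assumptions, not the library's verified API.

```csharp
// Sketch, not verified against the Google_GenerativeAI API: shows where a
// real function invocation should appear in a received payload, versus where
// it actually shows up in the log above.
void InspectPayload(BidiResponsePayload payload)
{
    // A genuine invocation should arrive in the top-level ToolCall field.
    // In the dump above this is null, which is the core of the problem.
    if (payload.ToolCall != null)
    {
        // Hypothetical helper; dispatch the call to the registered tool here.
        HandleSetCallOutcome(payload.ToolCall);
        return;
    }

    // Instead, the "call" only appears as text inside the model turn:
    // a Part with Thought = true mentioning set_call_outcome, and later the
    // spoken audio/transcription containing "set_call_outcome('send_email')".
    var parts = payload.ServerContent?.ModelTurn?.Parts;
    if (parts != null)
    {
        foreach (var part in parts)
        {
            // part.FunctionCall is also null in the dump above, confirming
            // the model never emitted a structured function call.
        }
    }
}
```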

I'm not sure whether this is caused by how we're using this library or by an issue in the underlying Gemini logic. In case it helps, here's a snippet showing how I configure the client and register the function:

        var setCallOutcomeFunc = (string outcome) =>
        {
            _logger.Verbose("Setting call outcome to {Outcome}", outcome);
            OnCallOutcomeAvailable?.Invoke(outcome);
            return "Call outcome set";
        };

        _setCallOutcomeQT = new QuickTool(
            setCallOutcomeFunc,
            "set_call_outcome",
            "Set the call outcome after having a conversation"
        );

        _config = new()
        {
            ResponseModalities = [Modality.AUDIO],
            SpeechConfig = new SpeechConfig
            {
                LanguageCode = language,
                VoiceConfig = new() { PrebuiltVoiceConfig = new() { VoiceName = voice } },
            },
        };
        _client = new(
            platformAdapter: new GoogleAIPlatformAdapter(googleApiKey),
            modelName: model,
            config: _config
        )
        {
            UseGoogleSearch = false,
            UseCodeExecutor = true,
            InputAudioTranscriptionEnabled = true,
            OutputAudioTranscriptionEnabled = true,
            FunctionTools = [_setCallOutcomeQT],
            ToolConfig = new()
            {
                RetrievalConfig = new() { LanguageCode = language },
                FunctionCallingConfig = new()
                {
                    AllowedFunctionNames = ["set_call_outcome"],
                    Mode = FunctionCallingMode.ANY,
                },
            },
        };

        await _client.ConnectAsync(false, ct);
        await _client.SendSetupAsync(
            new BidiGenerateContentSetup()
            {
                Model = _model.ToModelId(),
                GenerationConfig = _config,
                OutputAudioTranscription = new AudioTranscriptionConfig(),
                InputAudioTranscription = new AudioTranscriptionConfig(),
                SystemInstruction = new Content(_instructions, null),
            },
            ct
        );

Any ideas? Am I doing something wrong when configuring the client?
