Exodia is an AI Agent that can answer general knowledge information, recommend movies and sing songs.
There are 3 Nodes that work for the whole AI agent:
- Exodia Master Node
- Movies Worker Node
- Music Worker Node
Model: openai/whisper-large-v2
We access OpenAI's Whisper ASR which is hosted at the API endpoint provided by Huggingface: https://api-inference.huggingface.co/models/openai/whisper-large-v2.
This module is responsible for converting user voice input into text.
Model: coqui/XTTS-v2
We utilize the gradio_client library to query the app hosted on Huggingface Spaces via API: Client("tonyassi/voice-clone").
This module converts the generated text into speech by using a provided target audio file to produce a voice similar to the one in the file. It is important to note that this module currently supports only English text for reading aloud.
Model: meta-llama/Meta-Llama-3-8B-Instruct
A prompt-based approach is used to allow the large language model (LLM) to classify the user's query into one of the following categories: Music, Movie, or Others.
- If the query is related to Music or Movie, the query is passed to two worker nodes that specialize in handling these topics.
- For queries outside of Music and Movie, the Llama3 model directly processes and responds to the user.
We use Gradio to build the front-end interface for user interactions. We allow users to record their voice using a microphone, which is then converted into text and used for interacting with the chatbot. The chatbot supports conversation with a certain amount of text, taking the chat history into account. The text generated by the chatbot will be displayed in the chat window and automatically converted into speech. If there is a request for singing, the provided audio will be used to generate a music clip where AI covers the singing in the AI agent's voice.
This worker node is a movie recommender system that provides a chat like interface and brings in factual information from IMDB movies dataset . The description of the movies are embedded using all-MiniLM-L6-v2 model which acts as index for our Pinecone vector database. The user queries are embedded using the same embedding model and movies which are most similar to the user query are retrieved. These retrieved documents augment the Mistral-7B finetuned model to generate movie recommendations. This node uses a vector RAG.
We have used the following music samples for our AI singing feature:
| Song | Source | Octave | Description |
|---|---|---|---|
| The Lion King - Hakuna Matata | Shibby7 SoundCloud | 0 | Wikipedia description of song |
| Justin Bieber - Sorry | Sickstrophedj SoundCloud | 9 | Wikipedia description of song |
| Justin Bieber - Cold water | KoffeeBrothers SoundCloud | 9 | Wikipedia description of song |
| Justin Bieber - Love Yourself | Michel Waldhof SoundCloud | 9 | Wikipedia description of song |
| Eminem - Without Me | Gabe and Isotek booty SoundCloud | 5 | Wikipedia description of song |
| Eminem - Real Slim Shady | Wikipedia Audio | 5 | Wikipedia description of song |
| Avicii - Wake Me Up | Grafitte SoundCloud | 8 | Wikipedia description of song |
All of these values except The Source are stored in Azure Cosmos DB which also stores the vector embedding of descriptions. The source are stored in an object store. Vector similarity is calculated between the user query and all the songs in the cosmos db. The most relevant song is retrieved from the object store and passed to RVC Transformer.
The octave value is used to control the pitch of RVC. Since AI voice is female, songs with male voices need to be converted to higher pitched voice. For songs with female voices like Hakuna Matata, octave is kept as 0.
Music worker takes in user query and returns the audio file of the voice AI sung song.
- Exodia Master Node: https://huggingface.co/spaces/KleinPenny/Exodia
- Movies Worker Node: https://huggingface.co/spaces/ironserengety/movies-recommender
- Music Worker Node: https://huggingface.co/spaces/ironserengety/MusicRetriever
The authors and contributors of this project do not endorse or encourage any misuse or unethical use of this software. Any use of this software for purposes other than those intended is solely at the user's own risk. The authors and contributors shall not be held responsible for any damages or liabilities arising from the use of this demo inappropriately. SoundCloud sources were made freely available by the respective artists who covered the songs. For Wikipedia audio, we believe this is fair-use as our project is used for non-commercial purposes and licensed under a non-commercial license. Anyway, we would be happy to cooperative in case the music sample usage is found not be fair-use.
