shyshin/Exodia


Exodia

Exodia is an AI agent that can answer general-knowledge questions, recommend movies, and sing songs.

The agent is made up of three nodes:

  1. Exodia Master Node
  2. Movies Worker Node
  3. Music Worker Node

[Figure: architecture diagram]

Exodia Master Node

Automatic Speech Recognition (ASR) Module

Model: openai/whisper-large-v2
We call OpenAI's Whisper ASR model through the Hugging Face Inference API endpoint: https://api-inference.huggingface.co/models/openai/whisper-large-v2.
This module converts the user's voice input into text.
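
A minimal sketch of how such a call could look, using only the Python standard library. The helper names (`build_asr_request`, `parse_asr_response`, `transcribe`) are hypothetical, and the sketch assumes the endpoint accepts raw audio bytes with a bearer token and answers with JSON of the form `{"text": "..."}`; the project's actual client code may differ.

```python
import json
import urllib.request

# Endpoint quoted in the README.
ASR_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v2"


def build_asr_request(audio_bytes: bytes, token: str) -> urllib.request.Request:
    """Build a POST request carrying the raw audio bytes (hypothetical helper)."""
    return urllib.request.Request(
        ASR_URL,
        data=audio_bytes,
        headers={"Authorization": f"Bearer {token}"},
        method="POST",
    )


def parse_asr_response(body: bytes) -> str:
    """Extract the transcript, assuming a JSON object with a "text" field."""
    payload = json.loads(body)
    return payload.get("text", "").strip()


def transcribe(audio_bytes: bytes, token: str) -> str:
    """Send the audio to the endpoint and return the transcript."""
    request = build_asr_request(audio_bytes, token)
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_asr_response(response.read())
```

Splitting request building and response parsing from the network call keeps both pieces easy to check without contacting the API.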

Text to Speech (TTS) Module

Model: coqui/XTTS-v2
We use the gradio_client library to query the app hosted on Hugging Face Spaces through its API: Client("tonyassi/voice-clone").
This module converts the generated text into speech, using a provided target audio file to produce a voice similar to the one in that file. Note that it currently supports reading English text only.

Classification Module

Model: meta-llama/Meta-Llama-3-8B-Instruct
A prompt-based approach lets the large language model (LLM) classify the user's query into one of three categories: Music, Movie, or Others.

  • If the query is about Music or Movie, it is passed to the worker node that specializes in that topic.
  • For all other queries, the Llama 3 model processes the query and responds to the user directly.
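
The routing described above can be sketched as follows. This is an illustration under stated assumptions, not the project's implementation: the prompt wording, `parse_category`, and `route` are hypothetical, and `llm` stands for any callable that maps a prompt string to the model's text output.

```python
from typing import Callable, Dict

CATEGORIES = ("Music", "Movie", "Others")

# Hypothetical prompt wording; the project's real prompt is not shown in the README.
PROMPT_TEMPLATE = (
    "Classify the user query into exactly one category: Music, Movie, or Others.\n"
    "Reply with the category name only.\n"
    "Query: {query}\n"
    "Category:"
)


def parse_category(llm_output: str) -> str:
    """Map raw model text to a known category, falling back to Others."""
    text = llm_output.strip().lower()
    for category in CATEGORIES:
        if text.startswith(category.lower()):
            return category
    return "Others"


def route(
    query: str,
    llm: Callable[[str], str],
    workers: Dict[str, Callable[[str], str]],
) -> str:
    """Classify the query, then hand it to a worker or answer it directly."""
    category = parse_category(llm(PROMPT_TEMPLATE.format(query=query)))
    if category in workers:
        return workers[category](query)
    # Anything that is not Music or Movie is answered by the LLM itself.
    return llm(query)
```

Falling back to Others when the model's reply is not a recognized label means a malformed classification never blocks the user from getting an answer.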

Front-End

We use Gradio to build the front-end interface. Users can record their voice with a microphone; the recording is converted into text and sent to the chatbot. The chatbot takes the chat history into account, up to a limited amount of text. Its replies are displayed in the chat window and automatically converted into speech. When the user asks for a song, the provided audio is used to generate a music clip in which the AI covers the song in the agent's voice.
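
The README does not say how the amount of history is limited. One simple possibility, shown here purely as an assumption, is to keep only the most recent turns that fit a character budget. The function name `trim_history` and the `max_chars` default are hypothetical.

```python
from typing import List, Tuple

Turn = Tuple[str, str]  # (user message, bot reply)


def trim_history(history: List[Turn], max_chars: int = 2000) -> List[Turn]:
    """Keep the most recent turns whose combined length fits within max_chars."""
    kept: List[Turn] = []
    used = 0
    # Walk backwards so the newest turns are considered first.
    for user_msg, bot_msg in reversed(history):
        turn_len = len(user_msg) + len(bot_msg)
        if used + turn_len > max_chars:
            break
        kept.append((user_msg, bot_msg))
        used += turn_len
    kept.reverse()  # restore chronological order
    return kept
```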

Movies Worker Node

This worker node is a movie recommender system that provides a chat-like interface and draws factual information from an IMDB movies dataset. Movie descriptions are embedded with the all-MiniLM-L6-v2 model, and the embeddings are indexed in a Pinecone vector database. User queries are embedded with the same model, and the movies most similar to the query are retrieved. The retrieved documents are added to the prompt of a fine-tuned Mistral-7B model, which generates the recommendations. In short, this node implements vector-based retrieval-augmented generation (RAG).
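
To illustrate the retrieve-then-augment flow, here is a small self-contained sketch. It deliberately swaps the real components for stand-ins: a plain Python list replaces Pinecone, and hand-written 3-dimensional vectors replace all-MiniLM-L6-v2 embeddings. The titles, descriptions, prompt wording, and function names are all made up for illustration.

```python
import math
from typing import List, Sequence, Tuple

Entry = Tuple[str, str, Sequence[float]]  # (title, description, embedding)

# Toy index with placeholder entries; real embeddings would come from the model.
TOY_INDEX: List[Entry] = [
    ("Movie A", "space adventure", [1.0, 0.0, 0.0]),
    ("Movie B", "courtroom drama", [0.0, 1.0, 0.0]),
    ("Movie C", "space horror", [0.9, 0.1, 0.0]),
]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Sequence[float], index: List[Entry], top_k: int = 2) -> List[Entry]:
    """Return the top_k entries most similar to the query embedding."""
    ranked = sorted(
        index,
        key=lambda entry: cosine_similarity(query_vec, entry[2]),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query: str, retrieved: List[Entry]) -> str:
    """Augment the user request with the retrieved movie descriptions."""
    context = "\n".join(f"- {title}: {desc}" for title, desc, _ in retrieved)
    return (
        "Use only the movies listed below to answer.\n"
        f"Movies:\n{context}\n\n"
        f"User request: {query}\n"
        "Recommendation:"
    )
```

For example, `build_prompt("something set in space", retrieve([1.0, 0.0, 0.0], TOY_INDEX))` produces a prompt that lists Movie A and Movie C, which would then be sent to the generation model.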

Music Worker Node

We have used the following music samples for our AI singing feature:

| Song | Source | Octave | Description |
| --- | --- | --- | --- |
| The Lion King - Hakuna Matata | Shibby7 SoundCloud | 0 | Wikipedia description of song |
| Justin Bieber - Sorry | Sickstrophedj SoundCloud | 9 | Wikipedia description of song |
| Justin Bieber - Cold water | KoffeeBrothers SoundCloud | 9 | Wikipedia description of song |
| Justin Bieber - Love Yourself | Michel Waldhof SoundCloud | 9 | Wikipedia description of song |
| Eminem - Without Me | Gabe and Isotek booty SoundCloud | 5 | Wikipedia description of song |
| Eminem - Real Slim Shady | Wikipedia Audio | 5 | Wikipedia description of song |
| Avicii - Wake Me Up | Grafitte SoundCloud | 8 | Wikipedia description of song |

All of these values except the source audio are stored in Azure Cosmos DB, which also stores the vector embedding of each description. The source audio files are kept in an object store. Vector similarity is computed between the user query and every song in Cosmos DB. The most relevant song is then fetched from the object store and passed to the RVC transformer. The octave value controls the pitch of the RVC output: because the AI voice is female, songs with male voices need to be converted to a higher pitch, while for songs with female voices, such as Hakuna Matata, the octave is kept at 0.

The music worker takes a user query and returns an audio file of the song sung in the AI's voice.
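
The split between metadata and audio can be sketched as below. Everything here is a stand-in: a list of `SongRecord` objects plays the role of Cosmos DB, a dictionary plays the role of the object store, the 2-dimensional vectors and audio keys are made up, and `prepare_rvc_job` is a hypothetical helper that only gathers what a voice-conversion step would need. The two titles and octave values are taken from the table above.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class SongRecord:
    title: str
    octave: int                          # pitch setting later handed to RVC
    description_vector: Sequence[float]  # embedding of the song description
    audio_key: str                       # hypothetical key into the object store


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_song(query_vec: Sequence[float], records: List[SongRecord]) -> SongRecord:
    """Return the record whose description is most similar to the query."""
    return max(records, key=lambda r: cosine_similarity(query_vec, r.description_vector))


def prepare_rvc_job(
    query_vec: Sequence[float],
    records: List[SongRecord],
    object_store: Dict[str, bytes],
) -> dict:
    """Look up the best match and bundle its audio with its octave value."""
    song = pick_song(query_vec, records)
    return {
        "title": song.title,
        "octave": song.octave,
        "audio": object_store[song.audio_key],
    }


# Toy data: vectors and keys are illustrative only.
RECORDS = [
    SongRecord("The Lion King - Hakuna Matata", 0, [1.0, 0.0], "hakuna_matata.wav"),
    SongRecord("Eminem - Without Me", 5, [0.0, 1.0], "without_me.wav"),
]
OBJECT_STORE = {"hakuna_matata.wav": b"audio-1", "without_me.wav": b"audio-2"}
```

Keeping the octave next to the embedding means a single similarity lookup yields both the audio to fetch and the pitch setting to apply.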

Hugging Face Spaces links

  1. Exodia Master Node: https://huggingface.co/spaces/KleinPenny/Exodia
  2. Movies Worker Node: https://huggingface.co/spaces/ironserengety/movies-recommender
  3. Music Worker Node: https://huggingface.co/spaces/ironserengety/MusicRetriever

Disclaimer

The authors and contributors of this project do not endorse or encourage any misuse or unethical use of this software. Any use of this software for purposes other than those intended is solely at the user's own risk. The authors and contributors shall not be held responsible for any damages or liabilities arising from inappropriate use of this demo. SoundCloud sources were made freely available by the respective artists who covered the songs. For the Wikipedia audio, we believe our use is fair use, as the project is non-commercial and licensed under a non-commercial license. We would be happy to cooperate if the use of any music sample is found not to be fair use.
