Exodia

Exodia is an AI agent that can answer general-knowledge questions, recommend movies, and sing songs.

The agent consists of three nodes:

Exodia Master Node
Movies Worker Node
Music Worker Node

Architecture diagram:

Exodia Master Node

Automatic Speech Recognition (ASR) Module

Model: openai/whisper-large-v2
We access OpenAI's Whisper ASR model through the API endpoint provided by Hugging Face: https://api-inference.huggingface.co/models/openai/whisper-large-v2.
This module is responsible for converting user voice input into text.
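
As a rough sketch of how such a call could look, the snippet below builds a POST request for the endpoint above using only the Python standard library. The helper names (`build_asr_request`, `parse_asr_response`, `transcribe`), the bearer-token header, and the `{"text": ...}` response shape are assumptions based on common Hugging Face Inference API conventions; they are not taken from this repository's code.

```python
import json
import urllib.request

# Endpoint quoted in this README.
ASR_URL = "https://api-inference.huggingface.co/models/openai/whisper-large-v2"


def build_asr_request(audio_bytes: bytes, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying the raw audio bytes."""
    return urllib.request.Request(
        ASR_URL,
        data=audio_bytes,
        headers={"Authorization": f"Bearer {token}"},  # assumed auth scheme
        method="POST",
    )


def parse_asr_response(body: bytes) -> str:
    """Extract the transcript, assuming a JSON body shaped like {"text": "..."}."""
    payload = json.loads(body.decode("utf-8"))
    return payload.get("text", "").strip()


def transcribe(audio_bytes: bytes, token: str) -> str:
    """Send the audio to the endpoint and return the transcript (needs network access)."""
    with urllib.request.urlopen(build_asr_request(audio_bytes, token)) as resp:
        return parse_asr_response(resp.read())
```

Keeping request construction and response parsing separate from the network call makes both pieces easy to check without contacting the hosted model.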

Text to Speech (TTS) Module

Model: coqui/XTTS-v2
We use the gradio_client library to query the app hosted on Hugging Face Spaces via its API: Client("tonyassi/voice-clone").
This module converts the generated text into speech, using a provided target audio file to produce a voice similar to the one in that file. Note that this module currently reads only English text aloud.

Classification Module

Model: meta-llama/Meta-Llama-3-8B-Instruct
A prompt-based approach is used to allow the large language model (LLM) to classify the user's query into one of the following categories: Music, Movie, or Others.

If the query is related to Music or Movie, it is passed to the worker node that specializes in that topic.
For queries outside of Music and Movie, the Llama3 model directly processes and responds to the user.
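
The routing described above can be sketched as follows. The prompt wording, the function names, and the node labels (`music_worker`, `movies_worker`, `master_llm`) are hypothetical; the actual prompt used by Exodia is not shown in this README, and the call to the Llama 3 model itself is left out.

```python
CATEGORIES = ("Music", "Movie", "Others")

# Hypothetical prompt wording, for illustration only.
PROMPT_TEMPLATE = (
    "Classify the user's query into exactly one category: Music, Movie, or Others.\n"
    "Reply with the category name only.\n"
    "Query: {query}\n"
    "Category:"
)


def build_classification_prompt(query: str) -> str:
    """Fill the user's query into the classification prompt."""
    return PROMPT_TEMPLATE.format(query=query)


def parse_category(llm_reply: str) -> str:
    """Map a free-form model reply onto a known category, defaulting to Others."""
    reply = llm_reply.strip().lower()
    for category in CATEGORIES:
        if reply.startswith(category.lower()):
            return category
    return "Others"


def route(category: str) -> str:
    """Return a label for the node that should handle the query."""
    targets = {"Music": "music_worker", "Movie": "movies_worker"}
    return targets.get(category, "master_llm")
```

Defaulting to Others when the reply cannot be parsed means an unexpected model output falls back to the general-purpose LLM instead of a worker node.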

Front-End

We use Gradio to build the front-end interface for user interactions. Users can record their voice with a microphone; the recording is converted into text and used to interact with the chatbot. The chatbot supports multi-turn conversation and takes a limited amount of chat history into account. The text generated by the chatbot is displayed in the chat window and automatically converted into speech. If the user asks for a song, the provided audio is used to generate a music clip in which the song is covered in the AI agent's voice.

Movies Worker Node

This worker node is a movie recommender system that provides a chat-like interface and draws factual information from the IMDB movies dataset. Movie descriptions are embedded with the all-MiniLM-L6-v2 model, and the embeddings serve as the index in our Pinecone vector database. User queries are embedded with the same model, and the movies most similar to the query are retrieved. The retrieved documents augment the prompt to the fine-tuned Mistral-7B model, which generates the movie recommendations. In short, this node uses vector-based RAG (retrieval-augmented generation).
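
To make the retrieve-then-augment flow concrete, here is a minimal in-memory sketch. It assumes cosine similarity as the metric and uses toy three-dimensional vectors with made-up movie titles in place of real all-MiniLM-L6-v2 embeddings and the Pinecone index; the function names and the prompt wording are illustrative, not the node's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, index, top_k=2):
    """Return the top_k (title, description) pairs most similar to the query."""
    ranked = sorted(
        index,
        key=lambda item: cosine_similarity(query_vec, item["vector"]),
        reverse=True,
    )
    return [(item["title"], item["description"]) for item in ranked[:top_k]]


def build_rag_prompt(user_query, retrieved):
    """Prepend the retrieved movies to the user's request (hypothetical wording)."""
    context = "\n".join(f"- {title}: {desc}" for title, desc in retrieved)
    return (
        f"Use only these movies as context:\n{context}\n\n"
        f"User request: {user_query}\nRecommendation:"
    )


# Made-up entries with toy vectors standing in for real embeddings.
toy_index = [
    {"title": "Space Saga", "description": "A crew explores a distant galaxy.", "vector": [0.9, 0.1, 0.0]},
    {"title": "Love in Paris", "description": "Two strangers meet by the Seine.", "vector": [0.0, 0.2, 0.9]},
    {"title": "Robot Uprising", "description": "Machines turn against their makers.", "vector": [0.8, 0.3, 0.1]},
]

hits = retrieve([1.0, 0.2, 0.0], toy_index, top_k=2)
prompt = build_rag_prompt("Recommend a sci-fi adventure", hits)
```

In the real node the sorted scan would be replaced by a query against the vector database, and the resulting prompt would be sent to the language model.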

Music Worker Node

We have used the following music samples for our AI singing feature:

Song	Source	Octave	Description
The Lion King - Hakuna Matata	Shibby7 SoundCloud	0	Wikipedia description of song
Justin Bieber - Sorry	Sickstrophedj SoundCloud	9	Wikipedia description of song
Justin Bieber - Cold water	KoffeeBrothers SoundCloud	9	Wikipedia description of song
Justin Bieber - Love Yourself	Michel Waldhof SoundCloud	9	Wikipedia description of song
Eminem - Without Me	Gabe and Isotek booty SoundCloud	5	Wikipedia description of song
Eminem - Real Slim Shady	Wikipedia Audio	5	Wikipedia description of song
Avicii - Wake Me Up	Grafitte SoundCloud	8	Wikipedia description of song

All of these values except the source are stored in Azure Cosmos DB, which also stores the vector embedding of each description. The source audio files are stored in an object store. Vector similarity is computed between the user query and every song in Cosmos DB. The most relevant song is retrieved from the object store and passed to the RVC Transformer. The octave value controls the pitch of the RVC output. Because the AI voice is female, songs with male voices need to be converted to a higher pitch. For songs with female voices, such as Hakuna Matata, the octave is kept at 0.
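
The lookup step might be sketched like this. The `SongRecord` fields, the `audio_key` paths, and the toy embeddings are hypothetical stand-ins for the Cosmos DB documents and object-store entries; cosine similarity is assumed as the metric, and the RVC call is omitted. Titles and octave values come from the table above.

```python
import math
from dataclasses import dataclass


@dataclass
class SongRecord:
    title: str
    octave: int       # pitch setting from the table above
    audio_key: str    # hypothetical object-store key for the source audio
    embedding: list   # toy vector standing in for the description embedding


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pick_song(query_vec, songs):
    """Return the record whose description embedding is closest to the query."""
    return max(songs, key=lambda song: cosine(query_vec, song.embedding))


songs = [
    SongRecord("The Lion King - Hakuna Matata", 0, "audio/hakuna_matata.mp3", [0.9, 0.1, 0.1]),
    SongRecord("Eminem - Without Me", 5, "audio/without_me.mp3", [0.1, 0.9, 0.2]),
    SongRecord("Avicii - Wake Me Up", 8, "audio/wake_me_up.mp3", [0.1, 0.2, 0.9]),
]

best = pick_song([0.2, 0.8, 0.1], songs)
# best.audio_key would be fetched from the object store, and best.octave
# would be handed to the voice-conversion step as its pitch setting.
```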

The Music worker takes a user query and returns an audio file of the song sung in the AI voice.

Hugging Face Space links

Exodia Master Node: https://huggingface.co/spaces/KleinPenny/Exodia
Movies Worker Node: https://huggingface.co/spaces/ironserengety/movies-recommender
Music Worker Node: https://huggingface.co/spaces/ironserengety/MusicRetriever

Disclaimer

The authors and contributors of this project do not endorse or encourage any misuse or unethical use of this software. Any use of this software for purposes other than those intended is solely at the user's own risk. The authors and contributors shall not be held responsible for any damages or liabilities arising from inappropriate use of this demo. SoundCloud sources were made freely available by the respective artists who covered the songs. For Wikipedia audio, we believe this is fair use, as our project is non-commercial and released under a non-commercial license. We would be happy to cooperate if the music sample usage is found not to be fair use.
