The core module contains the Agent, STT, TTS, as well as dataclasses related to the Agent. The Agent is responsible for the overall control flow of the application. It is the main entry point for the user and the main interface to the other modules. It also takes care of the proactivity of the application.


graph LR;

start_agent --> greet_user;
greet_user --> check_for_proactivity;
check_for_proactivity --> trigger_proactivity;
trigger_proactivity --> get_user_input;
check_for_proactivity --> get_user_input;
get_user_input --> calculate_best_match;
calculate_best_match --> trigger_use_case;
trigger_use_case --> check_for_proactivity;


Class to handle the main functionality of the assistant

The core functionality of the assistant is to handle speech-to-text (STT) conversion and text-to-speech (TTS) conversion, calculate the best match for the parsed text, greet the user, trigger the right use case, and handle proactivity.


Name Type Description Default
get_mic bool, optional

Whether the speech-to-text class should first ask which microphone to use. By default False.



Name Type Description
assistant_name str

The name of the assistant

quotes pd.DataFrame

DataFrame storing the use cases and functionality combinations

user User

User class to store the user information (e.g., name, age)

stt SpeechToText

Speech to text class to handle speech-to-text conversion

tts TextToSpeech

Text to speech class to handle text-to-speech conversion

log_proactivity LogProactivity

Log proactivity class to handle the logging of proactivity

uc_general GeneralUseCase

General use case class to handle general use cases

uc_navigation NavigationUseCase

Navigation use case class to handle navigation use cases

uc_event EventUseCase

Event use case class to handle event use cases

uc_sport SportUseCase

Sport use case class to handle sport use cases


Checks if there are any updates which should be announced to the user

Checks every 60 seconds if there are any updates which should be announced to the user. There is an additional option to set a separate interval for each use case.

Proactivity IDs

The following table shows the IDs for the proactivity.

ID Use Case
1 Event
2 Morning Briefing
3 Sport
4 Navigation


Name Type Description Default
test_proactivity int | None, optional

An integer between 1 and 5 which triggers the proactivity for the corresponding use case.
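The interval logic described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the function name `due_use_cases`, the dictionary of last-check timestamps, and the per-use-case interval override are all assumptions. Only the 60-second default and the ID table come from the description.

```python
from datetime import datetime, timedelta

# Proactivity IDs as listed in the table above.
PROACTIVITY_IDS = {1: "event", 2: "morning_briefing", 3: "sport", 4: "navigation"}

# Default check interval taken from the description (60 seconds).
DEFAULT_INTERVAL = timedelta(seconds=60)


def due_use_cases(last_checks, now, intervals=None):
    """Return the IDs whose interval has elapsed since their last check.

    last_checks maps a use case name to its last check time; intervals is a
    hypothetical mapping of per-use-case overrides.
    """
    intervals = intervals or {}
    due = []
    for uc_id, name in PROACTIVITY_IDS.items():
        interval = intervals.get(name, DEFAULT_INTERVAL)
        if now - last_checks[name] >= interval:
            due.append(uc_id)
    return due
```

Passing `now` explicitly instead of calling `datetime.now()` inside the function keeps the check easy to test.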



Evaluates the parsed text to trigger the correct use case


Name Type Description Default
parsed_text str

The voice input of the user parsed to lower case string


_get_best_match(parsed_text, threshold=0.7)

Find the best match for the parsed text

Function calculates the similarity between the parsed text and the use cases.

  • TODO: Add tokenization and stop words
  • TODO: Watch if the default threshold is too high
self.quotes DataFrame

The self.quotes DataFrame consists of three columns: use_case, choice and phrase. We use the use_case and choice columns for the chain-of-responsibility pattern to map the best match to the final function. The phrase column contains multiple phrases which are compared to the parsed text.

use_case choice phrase
0 morningBriefing newsSummary whats going on
1 morningBriefing newsSummary morning briefing
2 events eventSummary what is going on
3 navigation dhbw dhbw
4 navigation dhbw i need to get to the dhbw
5 navigation hpe i need to get to the hpe


Name Type Description Default
parsed_text str

The parsed text which should be matched to a use case.

threshold float, optional

The threshold which is used to determine if the similarity is high enough to be considered. The value needs to be between 0 and 1. By default 0.7.



Type Description

If the threshold is not between 0 and 1.


Type Description

Returns an object with the use case, the selected endpoint within the use case (choice), the similarity, and the parsed text.
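A minimal sketch of such a matching step is shown below. The source does not state which similarity measure is used, so `difflib.SequenceMatcher` is swapped in here purely for illustration. The `Match` class, the `QUOTES` list standing in for self.quotes, the `ValueError` exception type, and returning `None` when nothing reaches the threshold are likewise assumptions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Match:  # hypothetical stand-in for the BestMatch dataclass
    use_case: str
    function_key: str
    similarity: float
    parsed_text: str


# Rows mirror the (use_case, choice, phrase) columns of self.quotes.
QUOTES = [
    ("morningBriefing", "newsSummary", "morning briefing"),
    ("events", "eventSummary", "what is going on"),
    ("navigation", "dhbw", "i need to get to the dhbw"),
    ("navigation", "hpe", "i need to get to the hpe"),
]


def get_best_match(parsed_text, quotes=QUOTES, threshold=0.7):
    """Return the most similar row, or None if nothing reaches the threshold."""
    if not 0 <= threshold <= 1:
        # Exception type is an assumption; the docs only say an error is raised.
        raise ValueError("threshold must be between 0 and 1")
    best = None
    for use_case, choice, phrase in quotes:
        score = SequenceMatcher(None, parsed_text, phrase).ratio()
        if score >= threshold and (best is None or score > best.similarity):
            best = Match(use_case, choice, score, parsed_text)
    return best
```

The returned `use_case` and `function_key` pair is what a dispatcher would then use to select the final function.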


Function to greet the user.

Depending on the time of the day, the assistant greets the user with a different greeting.
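As a rough illustration, a time-dependent greeting could look like the sketch below. The function name `pick_greeting`, the hour boundaries, and the greeting texts are assumptions; the source only states that the greeting depends on the time of day.

```python
from datetime import datetime


def pick_greeting(now, name):
    """Choose a greeting from the hour of day (boundaries are assumptions)."""
    hour = now.hour
    if 5 <= hour < 12:
        salutation = "Good morning"
    elif 12 <= hour < 18:
        salutation = "Good afternoon"
    elif 18 <= hour < 22:
        salutation = "Good evening"
    else:
        salutation = "Hello"
    return f"{salutation}, {name}!"
```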


Main function to interact with the user

The agent function is the main function of the assistant. It first greets the user and then checks proactively if there are updates for the user. If that's not the case, it will start listening for user input in 60-second intervals. If the user input is not empty, it will calculate the best match and trigger the corresponding use case before checking for proactivity again.

The threading library seems to be incompatible with some Python versions (documentation). Therefore it will be removed and the agent will be executed in a single thread.

  • TODO: Add hotword detection


Name Type Description Default
test_proactivity int | None, optional

An integer between 1 and 5 which triggers the proactivity for the corresponding use case.
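The single-threaded control flow from the chart at the top of this page can be sketched as a plain loop. This is an illustration under assumptions, not the agent's actual code: the collaborators are injected as callables so the loop runs without a microphone, and `max_turns` is a hypothetical stop condition that the real agent may not have.

```python
def run_agent(greet, check_proactivity, get_user_input, get_best_match,
              trigger_use_case, max_turns):
    """Single-threaded control loop following the flow chart."""
    greet()
    for _ in range(max_turns):
        check_proactivity()              # may announce updates to the user
        parsed_text = get_user_input()   # None when nothing was understood
        if not parsed_text:
            continue                     # back to the proactivity check
        match = get_best_match(parsed_text)
        if match is not None:
            trigger_use_case(match)
```

Because listening blocks for up to 60 seconds per turn, the proactivity check naturally runs about once per interval without a second thread.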


User Interaction


Class to convert speech to text.

Initializes the speech to text class.

  • TODO: Think about a better way to handle the case that the microphone_index is not required


Name Type Description Default
get_mic bool

If the speech to text class should first get the microphone index.



Name Type Description
recognizer sr.Recognizer

The speech recognition object.

microphone_index int | None

The index of the microphone which should be used.


First gets the user input and then checks if the user said yes.

  • TODO: Fix that yes is not recognized well


Type Description

Boolean if the user said yes.
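One possible way to make the yes check more tolerant is to compare whole words against a small set of affirmatives. The sketch below is only an assumption about how this could be done; the helper name `said_yes` and the word list are hypothetical and not part of the documented class.

```python
AFFIRMATIVES = {"yes", "yeah", "yep", "sure", "okay", "ok"}  # hypothetical word list


def said_yes(parsed_text):
    """Return True if any word of the parsed input is an affirmative."""
    if not parsed_text:
        return False  # covers None (parsing failed) and empty strings
    words = parsed_text.lower().replace(",", " ").replace(".", " ").split()
    return any(word in AFFIRMATIVES for word in words)
```

Matching on whole words avoids false positives such as "yesterday".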


Converts an audio file to text.

  • TODO: For now this function is only for testing purposes


Name Type Description Default
audio_file str | Path

The path to the audio file which should be converted.



Type Description
str | None

The parsed text or None if the parsing failed.


First records an audio file and then parses it to text.

When the function does not detect any speech for 60 seconds, it will time out and return None.

  • TODO: Maybe use adjust_for_ambient_noise
  • TODO: Add function to cancel the request without quitting the program


Name Type Description Default
line_above bool, optional

If a new line should be printed before the user input. By default False.



Type Description
str | None

The parsed text or None if no text could be parsed.


Class to convert text to speech.

Initializes the text to speech class.

  • TODO: Add Attributes section


Name Type Description
engine pyttsx3.Engine

The text to speech engine.

convert_text(text, optimize_time=True, optimize_numbers=True, line_above=False)

Converts text to speech


Name Type Description Default
text str

The text which should be converted to speech.

optimize_time bool, optional

If the time should be optimized for speech. Will replace 22:00 with 22 o'clock and 22:30 with 22 30. By default True.

optimize_numbers bool, optional

If the numbers should be optimized for speech. Will replace "22." with "22 .". By default True.

line_above bool, optional

If a new line should be printed before the bot input. By default False.


optimize_text(text, optimize_time, optimize_numbers)

Optimizes text with time indications (in HH:MM format) for speech.


Name Type Description Default
text str

The text which should be optimized.

optimize_time bool, optional

If the time should be optimized for speech. Will replace 22:00 with 22 o'clock and 22:30 with 22 30. By default True.

optimize_numbers bool, optional

If the numbers should be optimized for speech. Will replace "22." with "22 .". By default True.



Type Description

The optimized text.
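The replacements described for optimize_time and optimize_numbers can be sketched with regular expressions. This is a minimal illustration, not the library's code: the exact patterns are assumptions, as is the choice to leave decimals such as 3.5 untouched.

```python
import re


def optimize_text(text, optimize_time=True, optimize_numbers=True):
    """Rewrite HH:MM times and 'number + period' pairs for speech output."""
    if optimize_time:
        # 22:00 -> "22 o'clock"; handled first so full hours take precedence.
        text = re.sub(r"\b(\d{1,2}):00\b", r"\1 o'clock", text)
        # 22:30 -> "22 30"
        text = re.sub(r"\b(\d{1,2}):(\d{2})\b", r"\1 \2", text)
    if optimize_numbers:
        # "22." -> "22 ."; the lookahead skips decimals like 3.5 (assumption).
        text = re.sub(r"(\d)\.(?!\d)", r"\1 .", text)
    return text
```

For example, `optimize_text("Meet at 22:30 on the 22.")` yields `"Meet at 22 30 on the 22 ."`.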


Address dataclass

Dataclass to store the address of a user.


Name Type Description
street str

The street of the user.

city str

The city of the user.

zip_code int

The zip code of the user.

country str

The country of the user.

vvs_id str

The VVS ID of the user.

BestMatch dataclass

Dataclass to store the best match for a given user input.


Name Type Description
use_case str

The name of the use case.

function_key str

The key of the function which should be called.

similarity float

The similarity between the user input and the best match.

parsed_text str

The parsed text from the user input.

Favorites dataclass

Dataclass to store the favorites of a user.

For example the favorite stocks, sports teams, etc.


Name Type Description Default
stocks list[str]

The favorite stocks of the user.

league str

The favorite league of the user.

team str

The favorite team of the user.

news_country str

The country the user wants to receive news from.

news_keywords list[str]

The favorite news keywords of the user.

wakeup_time datetime

The wakeup time of the user.


LogProactivity dataclass

A class to keep track of the last time proactivity was triggered


Name Type Description
last_check datetime

The last time proactivity was triggered

last_event_check datetime

The last time the event use case was triggered

last_morning_briefing_check datetime

The last time the morning briefing was triggered

last_wakeup_check datetime

The last time the wakeup in morning briefing was triggered

last_sport_check datetime

The last time the sport use case was triggered

last_navigation_check datetime

The last time the navigation use case was triggered

Possessions dataclass

Dataclass to store the possessions of a user.


Name Type Description
bike bool

Does the user own a bike?

car bool

Does the user own a car?

User dataclass

Dataclass to store the user data.


Name Type Description
name str

The name of the user.

age int

The age of the user.

address Address

The address of the user.

possessions Possessions

The possessions of the user.

favorites Favorites