Core
The core module contains the Agent, STT, TTS, as well as dataclasses related to the Agent. The Agent is responsible for the overall control flow of the application. It is the main entry point for the user and the main interface to the other modules. It also takes care of the proactivity of the application.
Agent
graph LR;
start_agent --> greet_user;
greet_user --> check_for_proactivity;
check_for_proactivity --> trigger_proactivity;
trigger_proactivity --> get_user_input;
check_for_proactivity --> get_user_input;
get_user_input --> calculate_best_match;
calculate_best_match --> trigger_use_case;
trigger_use_case --> check_for_proactivity;
Agent(get_mic=False)
Class that handles the main functionality of the assistant.
The core functionality of the assistant is to handle speech-to-text (STT) conversion and text-to-speech (TTS) conversion, calculate the best match for the parsed text, greet the user, trigger the right use case, and handle proactivity.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
get_mic | `bool, optional` | Whether the speech-to-text class should first ask which microphone to use. | `False` |
Attributes:
Name | Type | Description |
---|---|---|
assistant_name | `str` | The name of the assistant |
quotes | `pd.DataFrame` | DataFrame storing the use cases and functionality combinations |
user | `User` | User class to store the user information (e.g., name, age) |
stt | `SpeechToText` | Speech-to-text class to handle speech-to-text conversion |
tts | `TextToSpeech` | Text-to-speech class to handle text-to-speech conversion |
log_proactivity | `LogProactivity` | Log proactivity class to handle the logging of proactivity |
uc_general | `GeneralUseCase` | General use case class to handle general use cases |
uc_navigation | `NavigationUseCase` | Navigation use case class to handle navigation use cases |
uc_event | `EventUseCase` | Event use case class to handle event use cases |
uc_sport | `SportUseCase` | Sport use case class to handle sport use cases |
_check_proactivity(test_proactivity=None)
Checks whether there are any updates that should be announced to the user.
The check runs every `60` seconds. In addition, a separate interval can be set for each use case.
Proactivity IDs
The following table shows the IDs for the proactivity.
ID | Use Case |
---|---|
1 | Event |
2 | Morning Briefing |
3 | Sport |
4 | Navigation |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
test_proactivity | `int \| None, optional` | An integer between | `None` |
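The interval logic described above can be sketched roughly as follows. This is a minimal sketch, not the project's implementation: the `due_use_cases` helper, the `INTERVALS` values, and the dictionary of last checks are assumptions for illustration, as is the reading of `test_proactivity` as a way to force a single ID. Only the 60-second base interval and the ID table come from the documentation.

```python
from datetime import datetime, timedelta

# Proactivity IDs as listed in the table above.
PROACTIVITY_IDS = {1: "Event", 2: "Morning Briefing", 3: "Sport", 4: "Navigation"}

# The documented base interval between proactivity checks.
BASE_INTERVAL = timedelta(seconds=60)

# Hypothetical per-use-case intervals (the real values are not documented here).
INTERVALS = {
    1: timedelta(minutes=30),
    2: timedelta(minutes=1),
    3: timedelta(minutes=10),
    4: timedelta(minutes=5),
}


def due_use_cases(last_check, last_checks, now, test_proactivity=None):
    """Return the proactivity IDs that should be evaluated at `now`.

    `last_check` is the time of the last overall check and `last_checks`
    maps each proactivity ID to the time of its last individual check.
    """
    # Assumption: a given test ID bypasses all interval checks.
    if test_proactivity is not None:
        if test_proactivity not in PROACTIVITY_IDS:
            raise ValueError(f"Unknown proactivity ID: {test_proactivity}")
        return [test_proactivity]
    # Skip everything until the base interval has elapsed.
    if now - last_check < BASE_INTERVAL:
        return []
    return [
        uc_id
        for uc_id, interval in sorted(INTERVALS.items())
        if now - last_checks[uc_id] >= interval
    ]
```

Passing `now` explicitly instead of calling `datetime.now()` inside the function keeps the sketch easy to test.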
_evaluate_use_case(parsed_text)
Evaluates the parsed text to trigger the correct use case
Parameters:
Name | Type | Description | Default |
---|---|---|---|
parsed_text | `str` | The voice input of the user, parsed to a lowercase string | required |
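Triggering a use case comes down to mapping a use case name and a choice to a function. The sketch below uses a plain lookup table, which is a simplification chosen for illustration; the handler functions and the `trigger_use_case` helper are hypothetical names and not the project's API.

```python
from typing import Callable, Dict, Optional, Tuple


# Hypothetical handlers; the real use case classes are not shown in this section.
def news_summary() -> str:
    return "news summary"


def route_to_dhbw() -> str:
    return "route to dhbw"


# Maps (use_case, choice) pairs to the function that should be executed.
HANDLERS: Dict[Tuple[str, str], Callable[[], str]] = {
    ("morningBriefing", "newsSummary"): news_summary,
    ("navigation", "dhbw"): route_to_dhbw,
}


def trigger_use_case(use_case: str, choice: str) -> Optional[str]:
    """Run the handler registered for the pair, or return None if there is none."""
    handler = HANDLERS.get((use_case, choice))
    if handler is None:
        return None
    return handler()
```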
_get_best_match(parsed_text, threshold=0.7)
Find the best match for the parsed text
Function calculates the similarity between the parsed text and the use cases.
- TODO: Add tokenization and stop words
- TODO: Watch if the default threshold is too high
`self.quotes` DataFrame
The `self.quotes` DataFrame consists of three columns: `use_case`, `choice`, and `phrase`.
We use the `use_case` and `choice` columns for the chain-of-responsibility pattern to map the best match to the final function. The `phrase` column contains multiple phrases which are going to be compared to the parsed text.
index | use_case | choice | phrase |
---|---|---|---|
0 | morningBriefing | newsSummary | whats going on |
1 | morningBriefing | newsSummary | morning briefing |
2 | events | eventSummary | what is going on |
3 | navigation | dhbw | dhbw |
4 | navigation | dhbw | i need to get to the dhbw |
5 | navigation | hpe | i need to get to the hpe |
Parameters:
Name | Type | Description | Default |
---|---|---|---|
parsed_text | `str` | The parsed text which should be matched to a use case. | required |
threshold | `float, optional` | The threshold used to determine whether the similarity is high enough to be considered. The value needs to be between 0 and 1. | `0.7` |
Raises:
Type | Description |
---|---|
`ValueError` | If the threshold is not between 0 and 1. |
Returns:
Type | Description |
---|---|
`BestMatch` | An object with the use case, the selected endpoint within the use case (choice), the similarity, and the parsed text. |
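The matching step can be illustrated with a small, self-contained sketch. It makes several assumptions: the similarity measure is not documented, so `difflib.SequenceMatcher` from the standard library stands in for it; the phrases are kept in a plain list of tuples instead of a pandas DataFrame; and returning `None` below the threshold is a guess, since the behavior for weak matches is not described.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class BestMatch:
    use_case: str
    function_key: str
    similarity: float
    parsed_text: str


# Rows mirror the example table above: (use_case, choice, phrase).
QUOTES = [
    ("morningBriefing", "newsSummary", "whats going on"),
    ("morningBriefing", "newsSummary", "morning briefing"),
    ("events", "eventSummary", "what is going on"),
    ("navigation", "dhbw", "dhbw"),
    ("navigation", "dhbw", "i need to get to the dhbw"),
    ("navigation", "hpe", "i need to get to the hpe"),
]


def get_best_match(parsed_text, quotes=QUOTES, threshold=0.7) -> Optional[BestMatch]:
    """Return the most similar phrase as a BestMatch, or None if it is too weak."""
    if not 0 <= threshold <= 1:
        raise ValueError("threshold must be between 0 and 1")
    best = None
    for use_case, choice, phrase in quotes:
        # Assumption: character-based similarity ratio in the range [0, 1].
        similarity = SequenceMatcher(None, parsed_text, phrase).ratio()
        if best is None or similarity > best.similarity:
            best = BestMatch(use_case, choice, similarity, parsed_text)
    if best is None or best.similarity < threshold:
        return None
    return best
```

A character-based ratio treats "whats going on" and "what is going on" as near-identical even though they belong to different use cases, which is one reason the tokenization TODO above may matter.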
_greeting()
Function to greet the user.
Depending on the time of the day, the assistant greets the user with a different greeting.
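A time-dependent greeting could look like the sketch below. The hour boundaries, the wording of the greetings, and the `greeting` helper with its `name` parameter are assumptions for illustration; the documentation only states that the greeting depends on the time of day.

```python
from datetime import datetime


def greeting(now: datetime, name: str) -> str:
    """Build a greeting for `name` based on the hour of `now`."""
    # Assumed boundaries: morning 05-11, afternoon 12-17, evening otherwise.
    if 5 <= now.hour < 12:
        part_of_day = "Good morning"
    elif 12 <= now.hour < 18:
        part_of_day = "Good afternoon"
    else:
        part_of_day = "Good evening"
    return f"{part_of_day}, {name}."
```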
main(test_proactivity=None)
Main function to interact with the user.
The agent function is the main function of the assistant. It first greets the user and then proactively checks whether there are updates for the user. If that is not the case, it starts listening for user input in `60`-second intervals. If the user input is not empty, it executes the corresponding use case function.
The `threading` library seems to be incompatible with some Python versions (documentation). Therefore, it will be removed and the agent will be executed in a single thread.
- TODO: Add hotword detection
Parameters:
Name | Type | Description | Default |
---|---|---|---|
test_proactivity | `int \| None, optional` | An integer between | `None` |
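The single-threaded control flow can be sketched as a simple loop. This is an illustration under assumptions: `run_agent` and its callable parameters are hypothetical stand-ins for the Agent's methods, and `max_turns` only exists so that the sketch terminates; the real agent presumably loops until the program is stopped.

```python
from typing import Callable, Optional


def run_agent(
    greet: Callable[[], None],
    check_proactivity: Callable[[], None],
    listen: Callable[[], Optional[str]],
    evaluate_use_case: Callable[[str], None],
    max_turns: int,
) -> None:
    """Greet once, then alternate between proactivity checks and user input."""
    greet()
    for _ in range(max_turns):
        check_proactivity()
        # `listen` is expected to return None when no speech was detected.
        parsed_text = listen()
        if parsed_text:
            evaluate_use_case(parsed_text.lower())
```

Because listening blocks until speech arrives or the timeout expires, the proactivity check runs at most once per listening interval in this design.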
User Interaction
SpeechToText(get_mic)
Class to convert speech to text.
Initializes the speech to text class.
- TODO: Think about a better way to handle the case that the `microphone_index` is not required
Parameters:
Name | Type | Description | Default |
---|---|---|---|
get_mic | `bool` | Whether the speech-to-text class should first get the microphone index. | required |
Attributes:
Name | Type | Description |
---|---|---|
recognizer | `sr.Recognizer` | The speech recognition object. |
microphone_index | `int \| None` | The index of the microphone which should be used. |
check_if_yes()
First gets the user input and then checks if the user said yes.
- TODO: Fix that yes is not recognized well
Returns:
Type | Description |
---|---|
`bool` | Whether the user said yes. |
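The check on the recognized text might be sketched as follows. The `said_yes` helper and the `YES_WORDS` set are assumptions; accepting a few common variants is one possible way to approach the recognition issue mentioned in the TODO, not the project's actual fix.

```python
import re
from typing import Optional

# Hypothetical set of words that count as an affirmative answer.
YES_WORDS = {"yes", "yeah", "yep", "sure", "okay", "ok"}


def said_yes(parsed_text: Optional[str]) -> bool:
    """Return True if the recognized text contains an affirmative word."""
    if not parsed_text:
        return False
    # Compare whole words so that "yesterday" does not count as "yes".
    words = re.findall(r"[a-z]+", parsed_text.lower())
    return any(word in YES_WORDS for word in words)
```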
convert_audio_file(audio_file)
convert_speech(line_above=False)
First records an audio file and then parses it to text.
If the function does not detect any speech for `60` seconds, it times out and returns `None`.
- TODO: Maybe use `adjust_for_ambient_noise`
- TODO: Add function to cancel the request without quitting the program
Parameters:
Name | Type | Description | Default |
---|---|---|---|
line_above | `bool, optional` | Whether a new line should be printed before the user input. | `False` |
Returns:
Type | Description |
---|---|
`str \| None` | The parsed text or None if no text could be parsed. |
TextToSpeech()
Class to convert text to speech.
Initializes the text to speech class.
- TODO: Add Attributes section
Attributes:
Name | Type | Description |
---|---|---|
engine | `pyttsx3.Engine` | The text-to-speech engine. |
convert_text(text, optimize_time=True, optimize_numbers=True, line_above=False)
Converts text to speech
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | `str` | The text which should be converted to speech. | required |
optimize_time | `bool, optional` | If the time should be optimized for speech. Will replace | `True` |
optimize_numbers | `bool, optional` | If the numbers should be optimized for speech. Will replace | `True` |
line_above | `bool, optional` | Whether a new line should be printed before the bot input. | `False` |
optimize_text(text, optimize_time, optimize_numbers)
Optimizes text with time indications (in HH:MM format) for speech.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | `str` | The text which should be optimized. | required |
optimize_time | `bool, optional` | If the time should be optimized for speech. Will replace | required |
optimize_numbers | `bool, optional` | If the numbers should be optimized for speech. Will replace | required |
Returns:
Type | Description |
---|---|
`str` | The optimized text. |
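To make the idea of speech optimization concrete, here is a minimal sketch. The actual replacement rules are not documented in this section, so both rules below are hypothetical: times in HH:MM format are rewritten without the colon, and decimal numbers get the word "point" in place of the dot.

```python
import re

# Matches 24-hour times such as 9:05, 09:00, or 23:59.
TIME_PATTERN = re.compile(r"\b([01]?\d|2[0-3]):([0-5]\d)\b")
# Matches simple decimal numbers such as 3.5.
DECIMAL_PATTERN = re.compile(r"(\d+)\.(\d+)")


def _speak_time(match: "re.Match[str]") -> str:
    hour, minute = int(match.group(1)), int(match.group(2))
    # Hypothetical rule: full hours become "<hour> o'clock".
    if minute == 0:
        return f"{hour} o'clock"
    return f"{hour} {minute:02d}"


def optimize_text(text: str, optimize_time: bool, optimize_numbers: bool) -> str:
    """Rewrite times and numbers so that a speech engine reads them naturally."""
    if optimize_time:
        text = TIME_PATTERN.sub(_speak_time, text)
    if optimize_numbers:
        # Hypothetical rule: read the decimal point aloud.
        text = DECIMAL_PATTERN.sub(r"\1 point \2", text)
    return text
```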
Dataclasses
Address
dataclass
BestMatch
dataclass
Dataclass to store the best match for a given user input.
Attributes:
Name | Type | Description |
---|---|---|
use_case | `str` | The name of the use case. |
function_key | `str` | The key of the function which should be called. |
similarity | `float` | The similarity between the user input and the best match. |
parsed_text | `str` | The parsed text from the user input. |
Favorites
dataclass
Dataclass to store the favorites of a user.
For example the favorite stocks, sports teams, etc.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
stocks | `list[str]` | The favorite stocks of the user. | required |
league | `str` | The favorite league of the user. | required |
team | `str` | The favorite team of the user. | required |
news_country | `str` | The country the user wants to receive news from. | required |
news_keywords | `list[str]` | The favorite news keywords of the user. | required |
wakeup_time | `datetime` | The wakeup time of the user. | required |
LogProactivity
dataclass
A class to keep track of the last time proactivity was triggered
Attributes:
Name | Type | Description |
---|---|---|
last_check | `datetime` | The last time proactivity was triggered |
last_event_check | `datetime` | The last time the event use case was triggered |
last_morning_briefing_check | `datetime` | The last time the morning briefing was triggered |
last_wakeup_check | `datetime` | The last time the wakeup in morning briefing was triggered |
last_sport_check | `datetime` | The last time the sport use case was triggered |
last_navigation_check | `datetime` | The last time the navigation use case was triggered |
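A dataclass with these attributes could be declared as in the sketch below. The field names and types follow the table above; initializing every timestamp with the current time via `default_factory` is an assumption, since the defaults are not documented.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class LogProactivity:
    # Assumption: every timestamp starts at the moment the log is created.
    last_check: datetime = field(default_factory=datetime.now)
    last_event_check: datetime = field(default_factory=datetime.now)
    last_morning_briefing_check: datetime = field(default_factory=datetime.now)
    last_wakeup_check: datetime = field(default_factory=datetime.now)
    last_sport_check: datetime = field(default_factory=datetime.now)
    last_navigation_check: datetime = field(default_factory=datetime.now)


# Usage: record that the sport use case was just triggered.
log = LogProactivity()
log.last_sport_check = datetime(2024, 1, 1, 12, 0)
```

Using `default_factory` rather than a plain default avoids evaluating `datetime.now()` once at class definition time.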
Possessions
dataclass
User
dataclass
Dataclass that stores the user data.
Attributes:
Name | Type | Description |
---|---|---|
name | `str` | The name of the user. |
age | `int` | The age of the user. |
address | `Address` | The address of the user. |
possessions | `Possessions` | The possessions of the user. |
favorites |
Favorites
|