Converting yourself into a chatbot and letting an AI version of you talk to your friends on WhatsApp
I came across an episode of Silicon Valley in which Gilfoyle creates a chatbot to talk to Dinesh. That made me wonder: could I do the same with my friends, and could they tell me apart from the chatbot? To answer that question, I had to build the bot. In the end I built one that can talk to my friends and family members on WhatsApp, using a sequence-to-sequence model. The steps to create a WhatsApp chatbot are given below.
Step 1: Export the WhatsApp chat
To export a WhatsApp chat, open the chat and tap the three dots at the top.
Then select “More”, choose the “Export chat” option, and mail the chat to yourself. Remember to export the chat without media.
I’m using the chat history of multiple people, but with texting style in mind. A person’s texting style changes depending on who they are talking to, but with close friends it stays much the same. So for this project I’ve used the chat history of my close friends; you can choose your chat dataset accordingly.
Step 2: Create Dataset
The exported chat history is itself the dataset. The chat history with a single person is usually not large enough to give good results, so for this project I’ve exported the chats of several close friends, with their permission of course (take permission before you use their chats to train your model :P).
The dataset needs to be preprocessed before training. A raw export looks like this:
4/5/20, 04:15 — me: 😭
4/5/20, 04:15 — Friend: hey
4/5/20, 04:15 — Friend: whatsup?
4/5/20, 04:15 — Friend: feeling good
4/5/20, 04:15 — Friend: ?
4/5/20, 04:15 — me: yes
4/5/20, 04:15 — me: what about you
4/5/20, 04:15 — me:<Media omitted>
The above text shows what the WhatsApp text file looks like. Names have been removed for confidentiality; when you export your own chat, you will see your name and your friend’s name instead of “me” and “Friend”.
1. The first step in preprocessing is to remove the unwanted text.
The cleaning script imports the packages needed for the preprocessing stage and then cleans the file. Emojis are removed from the text. Every message that originally contained a media file, such as an image, sticker, or audio file, appears in the export as “<Media omitted>”, and this placeholder must be removed to get a clean conversation. As the sample output shows, the date and time at the start of each line are dropped as well. After cleaning, the generated text file is “chat file nameout.txt”.
“chat file nameout.txt”
me:
Friend: hey
Friend: whatsup?
Friend: feeling good
Friend: ?
me: yes
me: what about you
me:
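As a rough idea of what this cleaning step involves, here is a minimal sketch. It is not the original script. It assumes every message line looks like the sample export (date, time, a dash, the sender, then the message). The names `clean_line` and `clean_chat`, the regular expressions, and the emoji ranges are illustrative assumptions, and the emoji ranges are not exhaustive.

```python
import re

# Assumed line format, based on the sample export:
#   <date>, <time> <dash> <sender>: <message>
# The dash may be a hyphen, en dash, or em dash depending on the export.
LINE_RE = re.compile(
    r"^\d{1,2}/\d{1,2}/\d{2,4}, \d{1,2}:\d{2}\s*[-\u2013\u2014]\s*([^:]+):\s?(.*)$"
)

# Rough emoji ranges (pictographs, misc symbols, flags, joiners); not exhaustive.
EMOJI_RE = re.compile(
    "[\U0001F300-\U0001FAFF\U00002600-\U000027BF\U0001F1E6-\U0001F1FF\uFE0F\u200D]"
)


def clean_line(line):
    """Return 'sender: text' without timestamp, emojis, or media placeholder.

    Returns None for lines that do not match the assumed format
    (for example system notices), so the caller can skip them.
    """
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    sender, text = match.group(1).strip(), match.group(2)
    text = text.replace("<Media omitted>", "")
    text = EMOJI_RE.sub("", text).strip()
    # A message that becomes empty is kept as a bare "sender:" line.
    return f"{sender}: {text}".rstrip()


def clean_chat(lines):
    """Clean every line and drop the ones that could not be parsed."""
    cleaned = (clean_line(line) for line in lines)
    return [c for c in cleaned if c is not None]
```

Messages that contained only an emoji or a media file end up as a bare “me:” line, which matches the sample output above. Multi-line messages are skipped in this sketch because their continuation lines carry no timestamp.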
2. Separate the chats according to the user
Your messages and your friend’s messages need to be separated, because one side will act as the question and the other as the answer to that question.
This step creates two files, “chat file nameCW.txt” and “chat file namesep.txt”. In each file, a “|” marks the place where the other person was talking.
“chat file namesep.txt”
me:
|
me: yes
me: what about you
me:
“chat file nameCW.txt”
|
Friend: hey
Friend: whatsup?
Friend: feeling good
Friend: ?
|
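The separation can be sketched as follows. This is an illustrative version under my own assumptions, not the original code: the function name `split_by_speaker` and its `my_name` parameter are hypothetical, and the input is assumed to be the cleaned “sender: text” lines from the previous step.

```python
def split_by_speaker(lines, my_name):
    """Split cleaned 'sender: text' lines into (mine, theirs).

    Whenever the speaker changes, a single "|" is written to the other
    speaker's list, so each list shows where the other person's turn was.
    """
    mine, theirs = [], []
    previous_is_me = None  # None until the first line has been seen
    for line in lines:
        is_me = line.startswith(my_name + ":")
        if is_me != previous_is_me:
            # A new turn starts: mark the gap in the other speaker's list.
            (theirs if is_me else mine).append("|")
        (mine if is_me else theirs).append(line)
        previous_is_me = is_me
    return mine, theirs
```

Run on the cleaned sample with `my_name="me"`, this reproduces the two files shown above: the first list corresponds to “chat file namesep.txt” and the second to “chat file nameCW.txt”.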
3. Combining continuous chats
Messages that follow one another without a reply in between are treated as parts of a single sentence. For example, in the file “chat file nameCW.txt” above, the texts “hey”, “whatsup?”, “feeling good” and “?” form the single sentence “hey whatsup feeling good ?”.
This step generates the files “chat file nameok.txt” and “chat file nameCWOK.txt”.
“chat file nameCWOK.txt”
|
hey Whatsup feeling good ?
|
“chat file nameok.txt”
|
me: yes what about you
|
As the files above show, the two files now form a question and answer pair: “chat file nameCWOK.txt” holds the question and “chat file nameok.txt” holds the answer.
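Merging consecutive messages could look roughly like this. It is a sketch under stated assumptions rather than the original code: `merge_runs` is a hypothetical name, the sender prefix is dropped from every line, and messages left empty by cleaning are skipped, so the exact output can differ slightly from the sample files above.

```python
def merge_runs(lines):
    """Join consecutive messages between "|" markers into one line per turn."""
    merged, current = [], []

    def flush():
        # Emit the messages collected so far as a single sentence.
        if current:
            merged.append(" ".join(current))
            current.clear()

    for line in lines:
        if line == "|":
            flush()
            merged.append("|")
        else:
            # Drop the "sender:" prefix; keep any later colons in the message.
            text = line.split(":", 1)[1].strip() if ":" in line else line.strip()
            if text:  # skip messages that became empty during cleaning
                current.append(text)
    flush()
    return merged
```

One design caveat: because empty turns are skipped, a turn that consisted only of emojis or media disappears entirely, which can shift the question and answer alignment. It is worth checking a few pairs by hand after this step.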
4. Removing the “|” symbol from both files
The “|” symbol must now be removed from both files so that they form a proper line-by-line question and answer pair.
Two files are created, “myside.txt” and “otherside.txt”.
“otherside.txt”
hey Whatsup feeling good ?
“myside.txt”
yes what about you
5. Combining “myside.txt” and “otherside.txt”
To create the question and answer pairs, both files are combined into one file, “data.txt”. Each pair occupies two lines: the first line is a question, the second line is its answer, the third line is the next question, and so on.
“data.txt”
hey Whatsup feeling good ?
yes what about you
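Steps 4 and 5 above can be sketched together. This is a minimal illustration, not the original code; `drop_separators` and `build_pairs` are hypothetical names, and the sketch assumes that the n-th line of the other side is answered by the n-th line of my side.

```python
def drop_separators(lines):
    """Remove the "|" marker lines, keeping only the merged sentences."""
    return [line for line in lines if line != "|"]


def build_pairs(other_side, my_side):
    """Interleave the two sides as question, answer, question, answer, ..."""
    data = []
    questions = drop_separators(other_side)
    answers = drop_separators(my_side)
    # zip stops at the shorter list, so an unanswered last question is dropped.
    for question, answer in zip(questions, answers):
        data.extend([question, answer])
    return data
```

Writing the result to disk is then a single call such as `open("data.txt", "w", encoding="utf-8").write("\n".join(data))`.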
Step 3: Tokenize the data set and prepare data for training
The dataset needs to be tokenized before it is fed to the sequence-to-sequence model. These tokens are the words the application knows, and it can only understand these tokens. The dataset is split into a training set (70%), a test set (15%) and a validation set (15%).
“data.py”
“data.py” creates three files, “idx_a.npy”, “idx_q.npy” and “meta.pkl”. These three files act as the input to the sequence-to-sequence model.
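To make the tokenizing and splitting in this step more concrete, here is a small pure-Python sketch. It is not the contents of “data.py”: the special tokens, the vocabulary size, the sentence length, and all function names are assumptions chosen for illustration, and it returns plain lists instead of writing the files named above.

```python
from collections import Counter

PAD, UNK = "_", "unk"  # hypothetical padding and unknown-word tokens


def build_vocab(sentences, max_words=8000):
    """Build index-to-word and word-to-index tables from whitespace tokens."""
    counts = Counter(word for s in sentences for word in s.lower().split())
    words = [w for w, _ in counts.most_common(max_words) if w not in (PAD, UNK)]
    idx2w = [PAD, UNK] + words
    w2idx = {w: i for i, w in enumerate(idx2w)}
    return idx2w, w2idx


def encode(sentence, w2idx, max_len=20):
    """Map a sentence to a fixed-length list of token ids, padded with PAD."""
    ids = [w2idx.get(w, w2idx[UNK]) for w in sentence.lower().split()][:max_len]
    return ids + [w2idx[PAD]] * (max_len - len(ids))


def to_pairs(data_lines):
    """Turn alternating question/answer lines into (question, answer) tuples."""
    return list(zip(data_lines[0::2], data_lines[1::2]))


def split_dataset(pairs, train_pct=70, test_pct=15):
    """Split into train, test, and validation parts (70/15/15 by default)."""
    n_train = len(pairs) * train_pct // 100
    n_test = len(pairs) * test_pct // 100
    train = pairs[:n_train]
    test = pairs[n_train:n_train + n_test]
    valid = pairs[n_train + n_test:]  # whatever remains, about 15%
    return train, test, valid
```

Words that are not in the vocabulary are mapped to the unknown token, which is why the bot can only "understand" words it saw during preprocessing.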
Step 4: Train Model
The model is now trained. The number of epochs is set to 100; you can change it according to your requirements.
“main.py”
After training, a “model.npz” file is created. This file stores our model.
Step 5: Test the model and connect it to WhatsApp
The trained model is then connected to WhatsApp. In the code, enter the name of the contact the chatbot should talk to, exactly as it is saved in your WhatsApp contacts.
Step 6: Finally! Let’s chat :P
I tried this chatbot on a friend, and its replies were quite satisfying. It can form sentences and has picked up my texting style. For now, though, it only looks at a single sentence and replies to that; it has no idea about the context of the chat. As a future development, the chatbot could be trained to learn the context and reply accordingly.
The link to download the code is given below.