Personality Detection using NLP-Deep Learning, CNN model
- Tey Jia Ying
- Dec 21, 2021
This is a personality detection model I built during my internship at SenticNet. I used deep learning, specifically a convolutional neural network (CNN), to predict personality types from text.

Basis of Personality Analysis - Myers-Briggs Type Indicator
I based my analysis on the 16 personality types of the Myers–Briggs Type Indicator (MBTI), which describes an individual's preferences along four dimensions; these dimensions combine into one of 16 different personality types. The four dimensions are Extroversion–Introversion (E–I), Sensation–Intuition (S–N), Thinking–Feeling (T–F), and Judgment–Perception (J–P). Each dimension has two opposing preferences, giving eight in total. The figure below shows a key to these eight Myers-Briggs Type Indicator preferences.

Using the Myers–Briggs Type Indicator®, an individual's preferences are categorized along four dimensions, which combine into 16 different personality types. This table shows the 16 types that result from the interaction between those preferences.
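As a small illustration of how four two-way dimensions give 16 types, here is a minimal sketch in Python. The names DIMENSIONS and all_mbti_types are my own for this example and are not taken from the project code.

```python
from itertools import product

# Each dimension contributes exactly one of its two letters.
DIMENSIONS = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]

def all_mbti_types():
    """Return every four-letter type, taking one letter per dimension."""
    return ["".join(letters) for letters in product(*DIMENSIONS)]

types = all_mbti_types()
print(len(types))   # 16
print(types[:4])    # ['ESTJ', 'ESTP', 'ESFJ', 'ESFP']
```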

Data Collection and Preprocessing
To train the model, I used a publicly available Kaggle dataset that is widely used for personality prediction: https://www.kaggle.com/datasnaek/mbti-type . The dataset has 2 columns: the MBTI personality type and fifty posts obtained from the individual's social media.
I cleaned the dataset because many posts contained NaNs, pings, URLs, numbers, emojis, special characters, punctuation and stop words.
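A cleaning step of this kind could look roughly like the sketch below. It is not the project's actual code: the function name clean_post, the regular expressions, the reading of "pings" as @mentions, and the very short stop-word list are all assumptions made for illustration.

```python
import re

# Tiny illustrative stop-word list (an assumption); a real run would use a fuller one.
STOP_WORDS = {"a", "an", "the", "and", "or", "is", "are", "to", "of", "in", "i", "it"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")           # assumes "pings" means @mentions
NON_LETTER_RE = re.compile(r"[^a-z\s]")    # anything that is not a-z or whitespace

def clean_post(text):
    """Lowercase a post and strip URLs, mentions, digits, emojis, punctuation and stop words."""
    if not isinstance(text, str):          # NaN and other missing values become empty text
        return ""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)    # drops digits, emojis and punctuation
    words = [w for w in text.split() if w not in STOP_WORDS]
    return " ".join(words)

print(clean_post("I love the new plan!!! 😀 see https://example.com @bob 2021"))
# love new plan see
```

URLs and mentions are removed before the non-letter filter so that fragments such as "https" or "com" are not left behind as words.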
Next, I performed one-hot encoding on the MBTI type column, transforming each of the four dimensions into a 0 or 1.
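One way to read this step is as four binary labels per person, one for each dimension. The sketch below shows that idea. Which letter maps to 1 and which to 0 is an assumption here, as are the names DIMENSIONS and encode_type.

```python
# One (letter-for-1, letter-for-0) pair per dimension; the choice of which
# letter maps to 1 is an assumption made for this example.
DIMENSIONS = [("E", "I"), ("S", "N"), ("T", "F"), ("J", "P")]

def encode_type(mbti_type):
    """Turn a four-letter type such as 'INFP' into four binary labels."""
    mbti_type = mbti_type.strip().upper()
    if len(mbti_type) != 4:
        raise ValueError(f"expected a four-letter type, got {mbti_type!r}")
    labels = []
    for letter, (one, zero) in zip(mbti_type, DIMENSIONS):
        if letter not in (one, zero):
            raise ValueError(f"unexpected letter {letter!r} in {mbti_type!r}")
        labels.append(1 if letter == one else 0)
    return labels

print(encode_type("INFP"))  # [0, 0, 0, 0]
print(encode_type("ESTJ"))  # [1, 1, 1, 1]
print(encode_type("ENTP"))  # [1, 0, 1, 0]
```

With labels in this form, each dimension can be treated as its own yes/no prediction rather than one 16-way choice.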

Next, I split the data into training and testing sets. After that, I converted the text in x_train and x_test to matrices using the Tokenizer class from tensorflow.keras. This class vectorizes a text corpus by turning each text into either a sequence of integers (each integer being the index of a token in a dictionary) or a vector where the coefficient for each token could be binary. I then used .texts_to_matrix and pad_sequences. The latter transforms a list (of length num_samples) of sequences (lists of integers) into a 2D NumPy array of shape (num_samples, num_timesteps), where num_timesteps is either the maxlen argument, if provided, or the length of the longest sequence in the list.
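To make the vectorizing and padding idea concrete, here is a plain-Python imitation of it. This is not the Keras code and not what the project ran; the helper names build_word_index, to_sequences and pad are made up for this sketch. It follows the Keras defaults of giving the most frequent word the index 1, reserving 0 for padding, and padding or truncating at the front of each sequence.

```python
from collections import Counter

def build_word_index(texts):
    """Map each word to an integer, most frequent first; 0 is reserved for padding."""
    counts = Counter(word for text in texts for word in text.split())
    return {word: i for i, (word, _) in enumerate(counts.most_common(), start=1)}

def to_sequences(texts, word_index):
    """Replace every known word with its index; unknown words are skipped."""
    return [[word_index[w] for w in text.split() if w in word_index] for text in texts]

def pad(sequences, maxlen=None, value=0):
    """Pad (and truncate) at the front so every sequence has the same length."""
    if maxlen is None:
        maxlen = max(len(seq) for seq in sequences)
    padded = []
    for seq in sequences:
        seq = seq[-maxlen:]                       # keep the last maxlen tokens
        padded.append([value] * (maxlen - len(seq)) + seq)
    return padded

texts = ["love new ideas", "new plan"]
word_index = build_word_index(texts)
sequences = to_sequences(texts, word_index)
print(word_index)             # {'new': 1, 'love': 2, 'ideas': 3, 'plan': 4}
print(sequences)              # [[2, 1, 3], [1, 4]]
print(pad(sequences))         # [[2, 1, 3], [0, 1, 4]]
print(pad(sequences, 2))      # [[1, 3], [1, 4]]
```

The result is a rectangular list of shape (num_samples, num_timesteps), which is the form a fixed-size model input needs.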

Building CNN Model
After ensuring that the data was compatible with my model, I set up the hyperparameters. max_len was set to 1000 because each vectorized x_train sample had a length of 1000. Next, I built the convolutional neural network model.
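The exact architecture is not listed in this post, so the following is only an illustration of the two operations a text CNN is typically built from: a 1D convolution with ReLU, followed by global max pooling. It is written in plain Python with a single hand-picked filter; the function names, the toy 2-dimensional "embeddings" and the filter weights are all assumptions for the example, and in a real model the weights are learned by the framework.

```python
def conv1d_relu(sequence, kernel, bias=0.0):
    """Slide one filter over a sequence of vectors (stride 1, no padding), then apply ReLU."""
    width = len(kernel)
    outputs = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        total = bias + sum(
            w * x
            for kernel_row, vector in zip(kernel, window)
            for w, x in zip(kernel_row, vector)
        )
        outputs.append(max(0.0, total))   # ReLU keeps only positive responses
    return outputs

def global_max_pool(feature_map):
    """Keep the strongest response of the filter anywhere in the text."""
    return max(feature_map)

# Four tokens, each represented by a toy 2-dimensional vector.
sequence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
# One filter that looks at two neighbouring tokens at a time.
kernel = [[1.0, -1.0], [0.5, 0.5]]

feature_map = conv1d_relu(sequence, kernel)
print(feature_map)                   # [1.5, 0.0, 0.0]
print(global_max_pool(feature_map))  # 1.5
```

A full model would run many such filters in parallel and feed the pooled values into dense layers that output one prediction per MBTI dimension.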



Analysis and Visualization
I then applied the CNN model.


As shown in the figure above, a word cloud was created to show how frequently each personality type used particular words. The top 10 words used by each personality type were also extracted for further analysis.
For personality types whose comments had more positive polarity, the most common words included "new", "thank" and "come". For personality types whose comments had more negative polarity, the most common words included "want", "stop", "need" and "know".
From this, one interpretation is that personality types containing "FP" are more grateful towards the government and more willing to abide by its rules and measures, whereas personality types containing "TJ" generally prioritize their own self-concept and needs first. Hence, they may be more critical of the policies the government introduced.
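Extracting the top words per personality type can be sketched with a word counter per type, as below. The function name top_words_by_type and the three input rows are made up for illustration; they are not data from the project.

```python
from collections import Counter, defaultdict

def top_words_by_type(rows, n=10):
    """rows: iterable of (mbti_type, cleaned_text). Returns {type: [(word, count), ...]}."""
    counters = defaultdict(Counter)
    for mbti_type, text in rows:
        counters[mbti_type].update(text.split())
    return {mbti_type: counter.most_common(n) for mbti_type, counter in counters.items()}

# Made-up rows, only to show the shape of the input and output.
rows = [
    ("INFP", "thank new plan thank"),
    ("INFP", "new idea"),
    ("ESTJ", "need stop need"),
]
top = top_words_by_type(rows, n=2)
print(top["INFP"])  # [('thank', 2), ('new', 2)]
print(top["ESTJ"])  # [('need', 2), ('stop', 1)]
```

The same per-type counts can also serve as the frequencies behind a word cloud.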
View on GitHub: https://github.com/jiayingtey/Personality-Detection-NLP-Deep-Learning-CNN-
