beyondwords

Beyond Words - predicting user decision with text data

Executive Summary

Key Procedures

  1. Preprocessing text data into a machine-readable form
    • Convert emoji and emoticons to text with the emoji and emot packages, respectively.
    • Note: although emot can also process emoji, its emoji database is not as comprehensive as that of the emoji package.
  2. Choosing the right natural language processing (NLP) models
    • Test unsupervised NLP: TextBlob and VADER
    • Test supervised NLP: off-the-shelf pretrained BERT (state-of-the-art)
    • Highly skewed data: user text was overwhelmingly positive and supportive, which made existing unsupervised models and off-the-shelf supervised models unsuitable.
  3. Tuning the BERT model with proper labelling
    • Create two types of labels for each text: Tone (positive/neutral/negative) and Content (rich/partial/none)
    • Fine-tune two BERT models through ktrain, one for each label type
    • Achieved accuracy scores of 0.85 and 0.78 for Tone and Content, respectively
    • Note: another approach is to merge the two label classes into one (2x3) and train a single model (less costly, but weaker prediction: accuracy score of 0.67 due to data imbalance)
  4. Predicting user churn and bounce
    • Use only text data generated before the user's decision
    • Extract text features for each user, including the numbers of words, characters, and texts over different time periods
    • Combine text features and sentiment features (60 features)
    • Apply classification models and a stacking ensemble (KNN, RF and XGB combined by Logistic Regression)
    • Achieved 0.89 and 0.76 accuracy for churn and bounce, respectively.
  5. Takeaways
    • Strong correlation between text and sentiment features
      • Text meta features are good enough to predict user decisions (easy to scale up for big data)
    • User engagement level is a key indicator of user decision
      • The model can predict user churn 4 weeks before the user's decision
      • Premium users have a lifetime of 3-4 months
    • With more data
      • Real-time prediction and evaluation using a sliding-window approach
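
The text meta features in step 4 can be sketched roughly as follows. This is a minimal illustration, not the project's code: the function name `text_features`, the `(date, text)` input format, and the 7- and 28-day windows are all assumptions made for the example. It counts texts, words, and characters in each window ending at the decision date, and drops anything written on or after that date so no post-decision text leaks into the features.

```python
from datetime import date, timedelta


def text_features(messages, decision_date, window_days=(7, 28)):
    """Count texts, words, and characters per look-back window.

    messages: list of (date, text) pairs for one user (hypothetical format).
    decision_date: the day the user churned or bounced.
    window_days: look-back window lengths in days (assumed values).
    """
    # Keep only text written strictly before the decision to avoid leakage.
    prior = [(d, t) for d, t in messages if d < decision_date]

    features = {}
    for days in window_days:
        start = decision_date - timedelta(days=days)
        texts = [t for d, t in prior if d >= start]
        features[f"n_texts_{days}d"] = len(texts)
        features[f"n_words_{days}d"] = sum(len(t.split()) for t in texts)
        features[f"n_chars_{days}d"] = sum(len(t) for t in texts)
    return features


if __name__ == "__main__":
    # Toy, made-up messages for a single user.
    toy = [
        (date(2020, 9, 10), "hello there"),
        (date(2020, 9, 28), "great work today"),
        (date(2020, 10, 2), "too late"),
    ]
    print(text_features(toy, date(2020, 10, 1)))
```

Moving `decision_date` forward one day at a time and recomputing the same windows would give one plausible form of the sliding-window evaluation mentioned in the takeaways.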

Presentation: YouTube and slides

Examples

Click to show an example of Emoji and Emoticon Conversion
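
As a rough illustration of the conversion step, the sketch below turns emoji and emoticons into plain words using only the Python standard library. It is a stand-in, not the project's code: the project used the emoji and emot packages, whereas here Unicode character names from `unicodedata` and a tiny hand-written `EMOTICONS` table (both assumptions for illustration) play their roles.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny emoticon table; emot ships a much larger one.
EMOTICONS = {
    ":)": "happy face",
    ":-)": "happy face",
    ":(": "sad face",
    ":D": "laughing face",
    ";)": "wink",
}

# Match longer emoticons first so ":-)" is not split into smaller pieces.
_EMOTICON_RE = re.compile(
    "|".join(re.escape(k) for k in sorted(EMOTICONS, key=len, reverse=True))
)


def convert_emoji(text):
    """Replace symbol characters (Unicode category 'So') with their names."""
    out = []
    for ch in text:
        if ch == "\ufe0f":  # emoji variation selector carries no meaning here
            continue
        if unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "")
            if name:
                out.append(" " + name.lower() + " ")
                continue
        out.append(ch)
    return "".join(out)


def convert_emoticons(text):
    """Replace known ASCII emoticons with a short description."""
    return _EMOTICON_RE.sub(lambda m: " " + EMOTICONS[m.group(0)] + " ", text)


def preprocess(text):
    """Convert emoji first, then emoticons, then normalise whitespace."""
    return " ".join(convert_emoticons(convert_emoji(text)).split())


if __name__ == "__main__":
    print(preprocess("Great job \U0001F600 :)"))
```

Note that category `So` also covers non-emoji symbols such as the copyright sign, which is one reason a dedicated emoji database is the better choice in practice.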

Click to show the Sanity Check of sentiment analysis by different NLP models

NLP Models Performance Comparison (OTS: off-the-shelf)
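
When reading accuracy scores on data as skewed as this, a majority-class baseline is a useful reference point. The sketch below is an assumption-laden illustration rather than part of the original analysis: the helper names and the toy label list are made up, and the label proportions are not the project's. It shows how a model that always predicts the most common label can score high accuracy while missing every minority class.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline(y_true):
    """Predict the most frequent label for every sample."""
    label, _ = Counter(y_true).most_common(1)[0]
    return [label] * len(y_true)


def per_class_recall(y_true, y_pred):
    """Recall for each label: correct predictions / true occurrences."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: hits[label] / n for label, n in totals.items()}


if __name__ == "__main__":
    # Toy, made-up labels: mostly positive, as a stand-in for skewed data.
    y_true = ["positive"] * 8 + ["neutral", "negative"]
    y_pred = majority_baseline(y_true)
    print(accuracy(y_true, y_pred))
    print(per_class_recall(y_true, y_pred))
```

Reporting per-class recall alongside accuracy makes it easier to tell whether a fine-tuned model has actually learned the neutral and negative classes.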

Last update 2020/11/05

Created 2020/10/08


>>>>>> CC BY 4.0 <<<<<<