self-reported-user-engagement

Self-Reported User Engagement for an Online Forum

There are a myriad of ways to analyze and understand website usage and forum participation:

  • click-through rate
  • counting backlinks
  • PageRank, and so on.

Not least among them is asking the end users themselves to fill out a questionnaire. Such a questionnaire, called a survey, provides insight into not only how a person uses a forum but also why.

In [210]:
pip install opendatasets
Requirement already satisfied: opendatasets in /usr/local/lib/python3.6/dist-packages (0.0.109)
Requirement already satisfied: tqdm in /usr/local/lib/python3.6/dist-packages (from opendatasets) (4.41.1)
In [211]:
import opendatasets as od
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
matplotlib.rcParams['font.size'] = 14
matplotlib.rcParams['figure.figsize'] = (9, 5)
matplotlib.rcParams['figure.facecolor'] = '#00000000'
%matplotlib inline
import seaborn as sns
import numpy as np

Getting our dataset

We'll be using the Stack Overflow developer survey dataset for our analysis. The survey is conducted annually, and we'll work with the 2020 edition, the latest at the time of writing.

We'll download the files with the opendatasets helper library.

In [212]:
od.download('stackoverflow-developer-survey-2020')
Using downloaded and verified file: ./stackoverflow-developer-survey-2020/survey_results_public.csv
Using downloaded and verified file: ./stackoverflow-developer-survey-2020/survey_results_schema.csv
Using downloaded and verified file: ./stackoverflow-developer-survey-2020/README.txt
In [213]:
pd.read_csv('stackoverflow-developer-survey-2020/survey_results_public.csv').head()
Out[213]:
Respondent MainBranch Hobbyist Age Age1stCode CompFreq CompTotal ConvertedComp Country CurrencyDesc CurrencySymbol DatabaseDesireNextYear DatabaseWorkedWith DevType EdLevel Employment Ethnicity Gender JobFactors JobSat JobSeek LanguageDesireNextYear LanguageWorkedWith MiscTechDesireNextYear MiscTechWorkedWith NEWCollabToolsDesireNextYear NEWCollabToolsWorkedWith NEWDevOps NEWDevOpsImpt NEWEdImpt NEWJobHunt NEWJobHuntResearch NEWLearn NEWOffTopic NEWOnboardGood NEWOtherComms NEWOvertime NEWPurchaseResearch NEWPurpleLink NEWSOSites NEWStuck OpSys OrgSize PlatformDesireNextYear PlatformWorkedWith PurchaseWhat Sexuality SOAccount SOComm SOPartFreq SOVisitFreq SurveyEase SurveyLength Trans UndergradMajor WebframeDesireNextYear WebframeWorkedWith WelcomeChange WorkWeekHrs YearsCode YearsCodePro
0 1 I am a developer by profession Yes NaN 13 Monthly NaN NaN Germany European Euro EUR Microsoft SQL Server Elasticsearch;Microsoft SQL Server;Oracle Developer, desktop or enterprise applications;Developer, full-stack Master’s degree (M.A., M.S., M.Eng., MBA, etc.) Independent contractor, freelancer, or self-employed White or of European descent Man Languages, frameworks, and other technologies I’d be working with;Remote work options;Opportunities for professional development Slightly satisfied I am not interested in new job opportunities C#;HTML/CSS;JavaScript C#;HTML/CSS;JavaScript .NET Core;Xamarin .NET;.NET Core Microsoft Teams;Microsoft Azure;Trello Confluence;Jira;Slack;Microsoft Azure;Trello No Somewhat important Fairly important NaN NaN Once a year Not sure NaN No Often: 1-2 days per week or more Start a free trial;Ask developers I know/work with Amused Stack Overflow (public Q&A for anyone who codes) Visit Stack Overflow;Go for a walk or other physical activity;Do other work and come back later Windows 2 to 9 employees Android;iOS;Kubernetes;Microsoft Azure;Windows Windows NaN Straight / Heterosexual No No, not at all NaN Multiple times per day Neither easy nor difficult Appropriate in length No Computer science, computer engineering, or software engineering ASP.NET Core ASP.NET;ASP.NET Core Just as welcome now as I felt last year 50.0 36 27
1 2 I am a developer by profession No NaN 19 NaN NaN NaN United Kingdom Pound sterling GBP NaN NaN Developer, full-stack;Developer, mobile Bachelor’s degree (B.A., B.S., B.Eng., etc.) Employed full-time NaN NaN NaN Very dissatisfied I am not interested in new job opportunities Python;Swift JavaScript;Swift React Native;TensorFlow;Unity 3D React Native Github;Slack Confluence;Jira;Github;Gitlab;Slack NaN NaN Fairly important NaN NaN Once a year Not sure NaN No NaN NaN Amused Stack Overflow (public Q&A for anyone who codes);Stack Exchange (public Q&A for a variety of topics);Stack Overflow Jobs (for job seekers) Visit Stack Overflow;Go for a walk or other physical activity MacOS 1,000 to 4,999 employees iOS;Kubernetes;Linux;MacOS iOS I have little or no influence NaN Yes Yes, definitely Less than once per month or monthly Multiple times per day NaN NaN NaN Computer science, computer engineering, or software engineering NaN NaN Somewhat more welcome now than last year NaN 7 4
2 3 I code primarily as a hobby Yes NaN 15 NaN NaN NaN Russian Federation NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Objective-C;Python;Swift Objective-C;Python;Swift NaN NaN NaN NaN NaN NaN NaN NaN NaN Once a decade NaN NaN No NaN NaN NaN Stack Overflow (public Q&A for anyone who codes) NaN Linux-based NaN NaN NaN NaN NaN Yes Yes, somewhat A few times per month or weekly Daily or almost daily Neither easy nor difficult Appropriate in length NaN NaN NaN NaN Somewhat more welcome now than last year NaN 4 NaN
3 4 I am a developer by profession Yes 25.0 18 NaN NaN NaN Albania Albanian lek ALL NaN NaN NaN Master’s degree (M.A., M.S., M.Eng., MBA, etc.) NaN White or of European descent Man Flex time or a flexible schedule;Office environment or company culture;Opportunities for professional development Slightly dissatisfied I’m not actively looking, but I am open to new opportunities NaN NaN NaN NaN NaN NaN No NaN Not at all important/not necessary Curious about other opportunities;Wanting to work with new technologies NaN Once a year Not sure Yes Yes Occasionally: 1-2 days per quarter but less than monthly NaN NaN Stack Overflow (public Q&A for anyone who codes);Stack Exchange (public Q&A for a variety of topics);Stack Overflow Jobs (for job seekers) NaN Linux-based 20 to 99 employees NaN NaN I have a great deal of influence Straight / Heterosexual Yes Yes, definitely A few times per month or weekly Multiple times per day NaN NaN No Computer science, computer engineering, or software engineering NaN NaN Somewhat less welcome now than last year 40.0 7 4
4 5 I used to be a developer by profession, but no longer am Yes 31.0 16 NaN NaN NaN United States NaN NaN MySQL;PostgreSQL MySQL;PostgreSQL;Redis;SQLite NaN Bachelor’s degree (B.A., B.S., B.Eng., etc.) Employed full-time White or of European descent Man NaN NaN NaN Java;Ruby;Scala HTML/CSS;Ruby;SQL Ansible;Chef Ansible Github;Google Suite (Docs, Meet, etc) Confluence;Jira;Github;Slack;Google Suite (Docs, Meet, etc) NaN NaN Very important NaN NaN Once a year No NaN Yes NaN Start a free trial;Ask developers I know/work with;Visit developer communities like Stack Overflow;Read ratings or reviews on third party sites like G2Crowd;Research companies that have advertised on sites I visit Hello, old friend Stack Overflow (public Q&A for anyone who codes);Stack Exchange (public Q&A for a variety of topics);Stack Overflow Jobs (for job seekers);Stack Overflow for Teams (private Q&A for organizations);Stack Overflow Talent (for hiring companies/recruiters) Call a coworker or friend;Visit Stack Overflow;Watch help / tutorial videos;Do other work and come back later;Visit another developer community (please name): Windows NaN Docker;Google Cloud Platform;Heroku;Linux;Windows AWS;Docker;Linux;MacOS;Windows NaN Straight / Heterosexual Yes Yes, somewhat Less than once per month or monthly A few times per month or weekly Easy Too short No Computer science, computer engineering, or software engineering Django;Ruby on Rails Ruby on Rails Just as welcome now as I felt last year NaN 15 8
In [214]:
pd.read_csv('stackoverflow-developer-survey-2020/survey_results_public.csv').tail()
Out[214]:
Respondent MainBranch Hobbyist Age Age1stCode CompFreq CompTotal ConvertedComp Country CurrencyDesc CurrencySymbol DatabaseDesireNextYear DatabaseWorkedWith DevType EdLevel Employment Ethnicity Gender JobFactors JobSat JobSeek LanguageDesireNextYear LanguageWorkedWith MiscTechDesireNextYear MiscTechWorkedWith NEWCollabToolsDesireNextYear NEWCollabToolsWorkedWith NEWDevOps NEWDevOpsImpt NEWEdImpt NEWJobHunt NEWJobHuntResearch NEWLearn NEWOffTopic NEWOnboardGood NEWOtherComms NEWOvertime NEWPurchaseResearch NEWPurpleLink NEWSOSites NEWStuck OpSys OrgSize PlatformDesireNextYear PlatformWorkedWith PurchaseWhat Sexuality SOAccount SOComm SOPartFreq SOVisitFreq SurveyEase SurveyLength Trans UndergradMajor WebframeDesireNextYear WebframeWorkedWith WelcomeChange WorkWeekHrs YearsCode YearsCodePro
64456 64858 NaN Yes NaN 16 NaN NaN NaN United States NaN NaN NaN NaN Senior executive/VP Master’s degree (M.A., M.S., M.Eng., MBA, etc.) Employed full-time NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Very important NaN NaN Once a decade NaN NaN NaN NaN Start a free trial Amused Stack Overflow (public Q&A for anyone who codes) Call a coworker or friend Windows NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Computer science, computer engineering, or software engineering NaN NaN NaN NaN 10 Less than 1 year
64457 64867 NaN Yes NaN NaN NaN NaN NaN Morocco NaN NaN Cassandra;Couchbase;DynamoDB;Elasticsearch;Firebase;IBM DB2;MariaDB;Microsoft SQL Server;MongoDB;MySQL;Oracle;PostgreSQL;Redis;SQLite Cassandra;Couchbase;DynamoDB;Elasticsearch;Firebase;IBM DB2;MariaDB;Microsoft SQL Server;MongoDB;MySQL;Oracle;PostgreSQL;Redis;SQLite NaN NaN Employed full-time NaN NaN NaN NaN NaN Assembly;Bash/Shell/PowerShell;C;C#;C++;Dart;Go;Haskell;HTML/CSS;Java;JavaScript;Julia;Kotlin;Objective-C;Perl;PHP;Python;R;Ruby;Rust;Scala;SQL;Swift;TypeScript;VBA Assembly;Bash/Shell/PowerShell;C;C#;C++;Dart;Go;Haskell;HTML/CSS;Java;JavaScript;Julia;Kotlin;Objective-C;Perl;PHP;Python;R;Ruby;Rust;Scala;SQL;Swift;TypeScript;VBA NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
64458 64898 NaN Yes NaN NaN NaN NaN NaN Viet Nam NaN NaN NaN NaN NaN Primary/elementary school NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
64459 64925 NaN Yes NaN NaN NaN NaN NaN Poland NaN NaN DynamoDB;Elasticsearch;MongoDB;MySQL;PostgreSQL Oracle NaN NaN Employed full-time NaN NaN NaN NaN NaN HTML/CSS;Java;JavaScript HTML/CSS Node.js NaN Github;Gitlab Confluence;Jira;Slack;Microsoft Teams NaN NaN NaN NaN NaN Once a year NaN NaN NaN NaN Start a free trial Hello, old friend Stack Overflow (public Q&A for anyone who codes) Call a coworker or friend;Visit Stack Overflow Windows NaN NaN Linux;Windows NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN Angular;Angular.js;React.js NaN NaN NaN NaN NaN
64460 65112 NaN Yes NaN NaN NaN NaN NaN Spain NaN NaN MariaDB;Microsoft SQL Server MariaDB;Microsoft SQL Server;MySQL;Oracle NaN Other doctoral degree (Ph.D., Ed.D., etc.) Employed full-time NaN NaN NaN NaN NaN C#;HTML/CSS;Java;JavaScript;SQL C#;HTML/CSS;Java;JavaScript;SQL .NET Core;Xamarin .NET;.NET Core Github;Microsoft Teams Github NaN NaN Critically important NaN NaN Once a year NaN NaN NaN NaN Start a free trial;Ask developers I know/work with;Visit developer communities like Stack Overflow;Read ratings or reviews on third party sites like G2Crowd Indifferent NaN Meditate;Visit Stack Overflow;Go for a walk or other physical activity;Watch help / tutorial videos;Do other work and come back later Windows NaN Arduino;Linux;Raspberry Pi;Windows Android;Arduino;Linux;Raspberry Pi;Windows NaN NaN NaN NaN NaN NaN NaN NaN NaN Computer science, computer engineering, or software engineering ASP.NET Core;jQuery Angular;Angular.js;ASP.NET Core;jQuery NaN NaN NaN NaN
In [215]:
pd.read_csv('stackoverflow-developer-survey-2020/survey_results_public.csv').columns
Out[215]:
Index(['Respondent', 'MainBranch', 'Hobbyist', 'Age', 'Age1stCode', 'CompFreq',
       'CompTotal', 'ConvertedComp', 'Country', 'CurrencyDesc',
       'CurrencySymbol', 'DatabaseDesireNextYear', 'DatabaseWorkedWith',
       'DevType', 'EdLevel', 'Employment', 'Ethnicity', 'Gender', 'JobFactors',
       'JobSat', 'JobSeek', 'LanguageDesireNextYear', 'LanguageWorkedWith',
       'MiscTechDesireNextYear', 'MiscTechWorkedWith',
       'NEWCollabToolsDesireNextYear', 'NEWCollabToolsWorkedWith', 'NEWDevOps',
       'NEWDevOpsImpt', 'NEWEdImpt', 'NEWJobHunt', 'NEWJobHuntResearch',
       'NEWLearn', 'NEWOffTopic', 'NEWOnboardGood', 'NEWOtherComms',
       'NEWOvertime', 'NEWPurchaseResearch', 'NEWPurpleLink', 'NEWSOSites',
       'NEWStuck', 'OpSys', 'OrgSize', 'PlatformDesireNextYear',
       'PlatformWorkedWith', 'PurchaseWhat', 'Sexuality', 'SOAccount',
       'SOComm', 'SOPartFreq', 'SOVisitFreq', 'SurveyEase', 'SurveyLength',
       'Trans', 'UndergradMajor', 'WebframeDesireNextYear',
       'WebframeWorkedWith', 'WelcomeChange', 'WorkWeekHrs', 'YearsCode',
       'YearsCodePro'],
      dtype='object')
In [216]:
pd.read_csv('stackoverflow-developer-survey-2020/survey_results_schema.csv', index_col='Column').QuestionText
Out[216]:
Column
Respondent                                                                                                                                                                                           Randomized respondent ID number (not in order of survey response time)
MainBranch                                                                                                                                                 Which of the following options best describes you today? Here, by "developer" we mean "someone who writes code."
Hobbyist                                                                                                                                                                                                                                            Do you code as a hobby?
Age                                                                                                                                                                            What is your age (in years)? If you prefer not to answer, you may leave this question blank.
Age1stCode                                                                                                                                                      At what age did you write your first line of code or program? (e.g., webpage, Hello World, Scratch project)
                                                                                                                                              ...                                                                                                                          
WebframeWorkedWith    Which web frameworks have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the framework and want to continue to do so, please check both boxes in that row.)
WelcomeChange                                                                                                                                                                                             Compared to last year, how welcome do you feel on Stack Overflow?
WorkWeekHrs                                                                                                                                                                        On average, how many hours per week do you work? Please enter a whole number in the box.
YearsCode                                                                                                                                                                                            Including any education, how many years have you been coding in total?
YearsCodePro                                                                                                                                                                NOT including education, how many years have you coded professionally (as a part of your work)?
Name: QuestionText, Length: 61, dtype: object

By default, pandas truncates long values when it displays a Series or DataFrame (the display.max_colwidth option defaults to 50 characters), which makes long question texts hard to read. We can turn off this truncation for all subsequent pandas output in the notebook with:

In [217]:
pd.set_option('display.max_colwidth', None)
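set_option changes the setting for the rest of the session. If full-width output is only wanted for a single display, pandas also provides option_context, which restores the previous value when the with block exits. A minimal sketch, using one survey question as a toy Series:

```python
import pandas as pd

# A survey question longer than the 50-character default display width
questions = pd.Series(
    ["Which of the following options best describes you today? "
     "Here, by \"developer\" we mean \"someone who writes code.\""],
    index=["MainBranch"],
)

before = pd.get_option("display.max_colwidth")

# Inside the with block the width limit is lifted, so the full text is printed
with pd.option_context("display.max_colwidth", None):
    inside = pd.get_option("display.max_colwidth")
    print(questions)

# On exit, the previous value is restored automatically
after = pd.get_option("display.max_colwidth")
print(before, inside, after)
```

This keeps the notebook's global display settings untouched, which can be handy when only one or two cells contain long text.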
In [218]:
pd.read_csv('stackoverflow-developer-survey-2020/survey_results_schema.csv', usecols=['QuestionText'], index_col='QuestionText')
Out[218]:
QuestionText
Randomized respondent ID number (not in order of survey response time)
Which of the following options best describes you today? Here, by "developer" we mean "someone who writes code."
Do you code as a hobby?
What is your age (in years)? If you prefer not to answer, you may leave this question blank.
At what age did you write your first line of code or program? (e.g., webpage, Hello World, Scratch project)
...
Which web frameworks have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you both worked with the framework and want to continue to do so, please check both boxes in that row.)
Compared to last year, how welcome do you feel on Stack Overflow?
On average, how many hours per week do you work? Please enter a whole number in the box.
Including any education, how many years have you been coding in total?
NOT including education, how many years have you coded professionally (as a part of your work)?

61 rows × 0 columns

In [219]:
pd.read_csv('stackoverflow-developer-survey-2020/survey_results_schema.csv', index_col='Column').QuestionText['CompFreq']
Out[219]:
'Is that compensation weekly, monthly, or yearly?'

With truncation turned off, the full question texts are much easier to read. We've now loaded the dataset and are ready to move on to preprocessing and cleaning the data for our analysis.
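Each cell above calls pd.read_csv on a CSV file again. A sketch of an alternative: load the schema once into a Series indexed by column name and look questions up from it in later cells. The variable name schema_questions is hypothetical, and the two inlined rows are an excerpt of survey_results_schema.csv standing in for the real file so the sketch runs anywhere:

```python
import io
import pandas as pd

# Two-row excerpt of survey_results_schema.csv, inlined for illustration
schema_csv = io.StringIO(
    "Column,QuestionText\n"
    "Hobbyist,Do you code as a hobby?\n"
    '"CompFreq","Is that compensation weekly, monthly, or yearly?"\n'
)

# Load once; later cells index this Series instead of re-reading the file
schema_questions = pd.read_csv(schema_csv, index_col="Column").QuestionText

print(schema_questions["CompFreq"])              # a single question
print(schema_questions[["Hobbyist", "CompFreq"]])  # several at once
```

With the downloaded file, the load step would be schema_questions = pd.read_csv('stackoverflow-developer-survey-2020/survey_results_schema.csv', index_col='Column').QuestionText. The same idea applies to survey_results_public.csv.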

Data Preparation & Cleaning

While the survey responses contain a wealth of information, we'll limit our analysis to the following areas:

  • Age & Location
  • Programming experience
  • Forum usage

Let's select a subset of columns with the relevant data for our analysis.

In [220]:
# 'stambha' (Sanskrit for "pillar" or "column"): the survey columns used in this analysis
stambha = ['Hobbyist', 'SOAccount', 'SOComm', 'SOPartFreq', 'SOVisitFreq',
           'NEWSOSites', 'WelcomeChange', 'NEWCollabToolsWorkedWith',
           'NEWOffTopic', 'NEWOtherComms', 'NEWStuck']
In [221]:
len(stambha)
Out[221]:
11

Let's extract the data for these columns into a new DataFrame called df. We call .copy() so that df owns its data and can be modified safely.

In [222]:
df = pd.read_csv('stackoverflow-developer-survey-2020/survey_results_public.csv')[stambha].copy()
In [223]:
df
Out[223]:
Hobbyist SOAccount SOComm SOPartFreq SOVisitFreq NEWSOSites WelcomeChange NEWCollabToolsWorkedWith NEWOffTopic NEWOtherComms NEWStuck
0 Yes No No, not at all NaN Multiple times per day Stack Overflow (public Q&A for anyone who codes) Just as welcome now as I felt last year Confluence;Jira;Slack;Microsoft Azure;Trello Not sure No Visit Stack Overflow;Go for a walk or other physical activity;Do other work and come back later
1 No Yes Yes, definitely Less than once per month or monthly Multiple times per day Stack Overflow (public Q&A for anyone who codes);Stack Exchange (public Q&A for a variety of topics);Stack Overflow Jobs (for job seekers) Somewhat more welcome now than last year Confluence;Jira;Github;Gitlab;Slack Not sure No Visit Stack Overflow;Go for a walk or other physical activity
2 Yes Yes Yes, somewhat A few times per month or weekly Daily or almost daily Stack Overflow (public Q&A for anyone who codes) Somewhat more welcome now than last year NaN NaN No NaN
3 Yes Yes Yes, definitely A few times per month or weekly Multiple times per day Stack Overflow (public Q&A for anyone who codes);Stack Exchange (public Q&A for a variety of topics);Stack Overflow Jobs (for job seekers) Somewhat less welcome now than last year NaN Not sure Yes NaN
4 Yes Yes Yes, somewhat Less than once per month or monthly A few times per month or weekly Stack Overflow (public Q&A for anyone who codes);Stack Exchange (public Q&A for a variety of topics);Stack Overflow Jobs (for job seekers);Stack Overflow for Teams (private Q&A for organizations);Stack Overflow Talent (for hiring companies/recruiters) Just as welcome now as I felt last year Confluence;Jira;Github;Slack;Google Suite (Docs, Meet, etc) No Yes Call a coworker or friend;Visit Stack Overflow;Watch help / tutorial videos;Do other work and come back later;Visit another developer community (please name):
... ... ... ... ... ... ... ... ... ... ... ...
64456 Yes NaN NaN NaN NaN Stack Overflow (public Q&A for anyone who codes) NaN NaN NaN NaN Call a coworker or friend
64457 Yes NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
64458 Yes NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
64459 Yes NaN NaN NaN NaN Stack Overflow (public Q&A for anyone who codes) NaN Confluence;Jira;Slack;Microsoft Teams NaN NaN Call a coworker or friend;Visit Stack Overflow
64460 Yes NaN NaN NaN NaN NaN NaN Github NaN NaN Meditate;Visit Stack Overflow;Go for a walk or other physical activity;Watch help / tutorial videos;Do other work and come back later

64461 rows × 11 columns
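Reading all 61 columns and then keeping 11 parses data that is immediately discarded. The usecols parameter of read_csv restricts parsing to the listed columns. pandas returns those columns in file order rather than list order, so the sketch re-selects with the list afterwards. It runs on a tiny inline CSV (toy data, not the survey file), and the list name wanted is a hypothetical stand-in for stambha:

```python
import io
import pandas as pd

# Toy stand-in for survey_results_public.csv: three columns, two respondents
public_csv = io.StringIO(
    "Respondent,SOVisitFreq,Hobbyist\n"
    "1,Multiple times per day,Yes\n"
    "2,Daily or almost daily,No\n"
)

wanted = ["Hobbyist", "SOVisitFreq"]  # hypothetical subset, analogous to stambha

# usecols parses only the listed columns; indexing with [wanted]
# then puts them in the order of the list instead of file order
subset = pd.read_csv(public_csv, usecols=wanted)[wanted]
print(subset)
```

Applied to the survey, this would read pd.read_csv('stackoverflow-developer-survey-2020/survey_results_public.csv', usecols=stambha)[stambha].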

In [224]:
pd.read_csv('stackoverflow-developer-survey-2020/survey_results_schema.csv', index_col='Column').QuestionText[stambha]
Out[224]:
Column
Hobbyist                                                                                                                                                                                                                                             Do you code as a hobby?
SOAccount                                                                                                                                                                                                                              Do you have a Stack Overflow account?
SOComm                                                                                                                                                                                                    Do you consider yourself a member of the Stack Overflow community?
SOPartFreq                                                                                                                     How frequently would you say you participate in Q&A on Stack Overflow? By participate we mean ask, answer, vote for, or comment on questions.
SOVisitFreq                                                                                                                                                                                                           How frequently would you say you visit Stack Overflow?
NEWSOSites                                                                                                                                                                              Which of the following Stack Overflow sites have you visited? Select all that apply.
WelcomeChange                                                                                                                                                                                              Compared to last year, how welcome do you feel on Stack Overflow?
NEWCollabToolsWorkedWith    Which collaboration tools have you done extensive development work in over the past year, and which do you want to work in over the next year? (If you worked with the tool and want to continue to do so, please check both boxes in that row.)
NEWOffTopic                                                                                                                                                                         Do you think Stack Overflow should relax restrictions on what is considered “off-topic”?
NEWOtherComms                                                                                                                                                                                                    Are you a member of any other online developer communities?
NEWStuck                                                                                                                                                                                              What do you do when you get stuck on a problem? Select all that apply.
Name: QuestionText, dtype: object

Let's view some basic information about the data frame.

In [226]:
df.shape
Out[226]:
(64461, 11)
In [227]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 64461 entries, 0 to 64460
Data columns (total 11 columns):
 #   Column                    Non-Null Count  Dtype 
---  ------                    --------------  ----- 
 0   Hobbyist                  64416 non-null  object
 1   SOAccount                 56805 non-null  object
 2   SOComm                    56476 non-null  object
 3   SOPartFreq                46792 non-null  object
 4   SOVisitFreq               56970 non-null  object
 5   NEWSOSites                58275 non-null  object
 6   WelcomeChange             52683 non-null  object
 7   NEWCollabToolsWorkedWith  52883 non-null  object
 8   NEWOffTopic               50804 non-null  object
 9   NEWOtherComms             57205 non-null  object
 10  NEWStuck                  54983 non-null  object
dtypes: object(11)
memory usage: 5.4+ MB
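The Non-Null Count column above shows that every column has gaps. SOPartFreq, for example, has 46792 non-null values out of 64461 rows, so roughly 27% of respondents left it blank. A sketch of computing the share of missing answers per column with isna().mean(), shown on a small toy frame rather than the survey data:

```python
import pandas as pd

# Toy frame with gaps, standing in for the survey columns
toy = pd.DataFrame({
    "SOAccount": ["Yes", "No", None, "Yes"],
    "SOPartFreq": [None, None, "A few times per month or weekly", None],
})

# isna() marks missing cells as True; the mean of a boolean column
# is the fraction of True values, scaled here to a percentage
missing_pct = toy.isna().mean().mul(100).sort_values(ascending=False)
print(missing_pct)
```

On the real data the same expression would be df.isna().mean().mul(100).sort_values(ascending=False).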
In [228]:
df.last_valid_index()
Out[228]:
64460

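Let's now view some basic statistics about the columns. Since every column holds text (object dtype), describe() reports, for each column, the number of non-null values (count), the number of distinct answers (unique), the most common answer (top) and how often it occurs (freq).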

In [229]:
df.describe()
Out[229]:
Hobbyist SOAccount SOComm SOPartFreq SOVisitFreq NEWSOSites WelcomeChange NEWCollabToolsWorkedWith NEWOffTopic NEWOtherComms NEWStuck
count 64416 56805 56476 46792 56970 58275 52683 52883 50804 57205 54983
unique 2 3 6 6 6 61 6 1153 3 2 444
top Yes Yes Yes, somewhat Less than once per month or monthly Daily or almost daily Stack Overflow (public Q&A for anyone who codes);Stack Exchange (public Q&A for a variety of topics) Just as welcome now as I felt last year Github Not sure No Visit Stack Overflow
freq 50388 47275 15273 20432 17372 22415 37201 4343 20213 33367 2904
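The high unique counts for NEWCollabToolsWorkedWith (1153) and NEWStuck (444) arise because these are multi-select questions whose answers are stored as semicolon-separated strings, so every combination of options counts as a distinct value. A sketch of counting the individual options instead, using a few toy responses in the survey's format:

```python
import pandas as pd

# Toy responses in the survey's format: multiple choices joined by ';'
stuck = pd.Series([
    "Visit Stack Overflow;Go for a walk or other physical activity",
    "Call a coworker or friend;Visit Stack Overflow",
    None,
    "Visit Stack Overflow",
], name="NEWStuck")

# Drop missing answers, split each answer into a list of options,
# give each option its own row with explode(), then count the options
option_counts = stuck.dropna().str.split(";").explode().value_counts()
print(option_counts)
```

The same chain could be applied to df.NEWStuck, df.NEWSOSites or df.NEWCollabToolsWorkedWith.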
In [230]:
df['Hobbyist'].value_counts()
Out[230]:
Yes    50388
No     14028
Name: Hobbyist, dtype: int64
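value_counts also accepts normalize=True, which returns proportions instead of counts. The sketch below rebuilds a Series with the same Yes/No counts shown above (the missing answers, which value_counts drops by default, are left out) and converts it to percentages:

```python
import pandas as pd

# Rebuild a Series with the Hobbyist counts reported above (50388 Yes, 14028 No)
hobbyist = pd.Series(["Yes"] * 50388 + ["No"] * 14028, name="Hobbyist")

# normalize=True divides each count by the number of non-null answers
hobbyist_pct = hobbyist.value_counts(normalize=True).mul(100).round(1)
print(hobbyist_pct)
```

On the notebook's data the equivalent is df['Hobbyist'].value_counts(normalize=True), which indicates that roughly 78% of those who answered code as a hobby.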
In [231]:
df.sample(10)
Out[231]:
Hobbyist SOAccount SOComm SOPartFreq SOVisitFreq NEWSOSites WelcomeChange NEWCollabToolsWorkedWith NEWOffTopic NEWOtherComms NEWStuck
56615 Yes Yes Neutral A few times per month or weekly A few times per week Stack Overflow (public Q&A for anyone who codes);Stack Exchange (public Q&A for a variety of topics);Stack Overflow Jobs (for job seekers) Just as welcome now as I felt last year Jira;Github;Gitlab;Microsoft Teams;Trello;Google Suite (Docs, Meet, etc) Not sure No Visit Stack Overflow;Go for a walk or other physical activity;Watch help / tutorial videos;Do other work and come back later
51406 Yes Yes No, not at all Less than once per month or monthly Less than once per month or monthly Stack Overflow (public Q&A for anyone who codes);Stack Exchange (public Q&A for a variety of topics) Just as welcome now as I felt last year Github;Slack No Yes Call a coworker or friend;Go for a walk or other physical activity
47380 No Yes No, not at all I have never participated in Q&A on Stack Overflow Daily or almost daily Stack Overflow (public Q&A for anyone who codes);Stack Exchange (public Q&A for a variety of topics) Somewhat more welcome now than last year Confluence;Jira;Github;Gitlab;Slack;Trello Not sure No Meditate;Call a coworker or friend;Visit Stack Overflow;Watch help / tutorial videos;Do other work and come back later
57258 Yes Yes Yes, definitely Less than once per month or monthly Multiple times per day Stack Overflow (public Q&A for anyone who codes);Stack Exchange (public Q&A for a variety of topics) Just as welcome now as I felt last year Github;Microsoft Teams;Microsoft Azure;Google Suite (Docs, Meet, etc) Not sure Yes Call a coworker or friend;Visit Stack Overflow;Go for a walk or other physical activity;Watch help / tutorial videos;Do other work and come back later
18260 Yes Yes Neutral Less than once per month or monthly Daily or almost daily Stack Overflow (public Q&A for anyone who codes);Stack Exchange (public Q&A for a variety of topics);Stack Overflow Jobs (for job seekers) Just as welcome now as I felt last year Github;Slack;Trello;Google Suite (Docs, Meet, etc) No Yes Call a coworker or friend;Visit Stack Overflow;Go for a walk or other physical activity;Do other work and come back later
51287 Yes Yes Yes, somewhat Less than once per month or monthly A few times per month or weekly Stack Overflow (public Q&A for anyone who codes);Stack Exchange (public Q&A for a variety of topics) Just as welcome now as I felt last year Confluence;Github;Gitlab;Slack;Google Suite (Docs, Meet, etc) Yes Yes Play games;Call a coworker or friend;Visit Stack Overflow;Watch help / tutorial videos;Do other work and come back later;Visit another developer community (please name):
47905 Yes Yes Yes, somewhat Less than once per month or monthly Daily or almost daily Stack Overflow (public Q&A for anyone who codes);Stack Exchange (public Q&A for a variety of topics);Stack Overflow Jobs (for job seekers) Just as welcome now as I felt last year Confluence;Jira;Github;Slack;Microsoft Teams No Yes Call a coworker or friend;Visit Stack Overflow;Go for a walk or other physical activity;Do other work and come back later
33824 Yes Yes No, not at all I have never participated in Q&A on Stack Overflow Multiple times per day Stack Overflow (public Q&A for anyone who codes);Stack Exchange (public Q&A for a variety of topics) Just as welcome now as I felt last year Jira;Github No No Call a coworker or friend;Visit Stack Overflow;Watch help / tutorial videos;Do other work and come back later
31635 Yes Yes No, not at all I have never participated in Q&A on Stack Overflow Daily or almost daily Stack Overflow (public Q&A for anyone who codes);Stack Overflow Jobs (for job seekers) Just as welcome now as I felt last year Confluence;Jira;Github;Gitlab;Slack;Trello;Google Suite (Docs, Meet, etc) Not sure No Call a coworker or friend;Visit Stack Overflow;Watch help / tutorial videos
9061 Yes Yes Yes, somewhat Less than once per month or monthly Daily or almost daily Stack Overflow (public Q&A for anyone who codes) NaN Github;Gitlab;Microsoft Teams;Trello;Google Suite (Docs, Meet, etc) NaN Yes Meditate;Call a coworker or friend;Visit Stack Overflow;Watch help / tutorial videos

Exploratory Data Analysis

Forum Account

Let's look at the distribution of responses to whether a respondent has a forum account or not. Registered users are generally assumed to be more likely to participate in forum activities such as surveys.

In [232]:
(pd.read_csv('stackoverflow-developer-survey-2020/survey_results_schema.csv', index_col='Column').QuestionText).SOAccount
Out[232]:
'Do you have a Stack Overflow account?'
In [233]:
user_counts = df.SOAccount.value_counts()
user_counts
Out[233]:
Yes                        47275
No                          6101
Not sure/can't remember     3429
Name: SOAccount, dtype: int64

A pie chart would be a good way to visualize the distribution.

In [234]:
plt.figure(figsize=(20,10))
plt.title((pd.read_csv('stackoverflow-developer-survey-2020/survey_results_schema.csv', index_col='Column').QuestionText).SOAccount)
plt.pie(user_counts, labels=user_counts.index, autopct='%1.1f%%', startangle=0);

About 83% of survey respondents who have answered the question had an account on the forum.
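The 83% figure is the "Yes" count divided by the number of respondents who answered the question, since value_counts() leaves out missing answers. A quick check, with the counts copied by hand from the output above:

```python
# Counts copied from the SOAccount value_counts() output above
user_counts = {"Yes": 47275, "No": 6101, "Not sure/can't remember": 3429}

# value_counts() drops missing answers, so the denominator is
# the number of respondents who answered this question
answered = sum(user_counts.values())
share_yes = user_counts["Yes"] / answered * 100

print(answered)
print(f"{share_yes:.1f}%")
```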

Hobby

Let's look at the distribution of responses to whether a respondent considers themselves a hobbyist or not.

In [235]:
(pd.read_csv('stackoverflow-developer-survey-2020/survey_results_schema.csv', index_col='Column').QuestionText).Hobbyist
Out[235]:
'Do you code as a hobby?'
In [236]:
hobby_counts = df.Hobbyist.value_counts()
hobby_counts
Out[236]:
Yes    50388
No     14028
Name: Hobbyist, dtype: int64
In [237]:
plt.figure(figsize=(20,10))
plt.title((pd.read_csv('stackoverflow-developer-survey-2020/survey_results_schema.csv', index_col='Column').QuestionText).Hobbyist)
plt.pie(hobby_counts, labels=hobby_counts.index, autopct='%1.1f%%', startangle=0);

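It appears that roughly four in five respondents (about 78%) code as a hobby. This does not mean they code only as a hobby: the question does not ask about professional work, so many of them may also program for a living.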
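pandas can produce these shares directly with value_counts(normalize=True). A minimal sketch that rebuilds a Series with the same Hobbyist counts shown above (reconstructed for illustration, not the original data frame):

```python
import pandas as pd

# Rebuild a Series with the Hobbyist counts shown above (50388 "Yes", 14028 "No");
# this is reconstructed for illustration, not the original survey data
hobbyist = pd.Series(["Yes"] * 50388 + ["No"] * 14028, name="Hobbyist")

# normalize=True returns fractions of the non-missing answers instead of counts
shares = hobbyist.value_counts(normalize=True) * 100
print(shares.round(1))
```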

Let's also plot the visit frequency, but this time we'll convert the counts into percentages of the respondents who answered the question. value_counts() already sorts them in descending order, which makes the order easy to read off the chart.
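Dividing value_counts() by count() gives percentages of the respondents who answered, because both methods skip missing values; value_counts(normalize=True) gives the same result in one step. A small sketch on made-up answers:

```python
import numpy as np
import pandas as pd

# Made-up visit-frequency answers; np.nan stands in for a skipped question
visits = pd.Series(["Daily", "Weekly", "Daily", np.nan, "Monthly"])

# Both value_counts() and count() ignore missing values,
# so this is the percentage among respondents who answered
pct_manual = visits.value_counts() * 100 / visits.count()

# The same result via the normalize flag
pct_normalized = visits.value_counts(normalize=True) * 100

print(pct_manual)
```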

In [238]:
(pd.read_csv('stackoverflow-developer-survey-2020/survey_results_schema.csv', index_col='Column').QuestionText).SOVisitFreq
Out[238]:
'How frequently would you say you visit Stack Overflow?'
In [239]:
# Convert counts into percentages of respondents who answered the question
VisitFreq_pct = df.SOVisitFreq.value_counts() * 100 / df.SOVisitFreq.count()
sns.barplot(x=VisitFreq_pct, y=VisitFreq_pct.index)
plt.title((pd.read_csv('stackoverflow-developer-survey-2020/survey_results_schema.csv', index_col='Column').QuestionText).SOVisitFreq)
plt.ylabel(None);
plt.xlabel('Percentage');

It turns out that about 59% of respondents visit our forum at least once a day ("Daily or almost daily" plus "Multiple times per day"), which is very encouraging. This is only a rough proxy for user retention, though: a self-reported visit frequency is not the same as measuring how many of today's visitors actually return tomorrow.

On the flip side, these responses are subject to selection bias. Our respondents may not be representative of the average forum user, because people who did not stick around never took this survey. Those who respond are likely skewed toward people who already think our forum is great.
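The at-least-once-a-day share combines two of the answer options. A quick check using the SOVisitFreq counts that value_counts() reports later in this notebook (In [258]):

```python
# SOVisitFreq counts copied from the value_counts() output in In [258]
visit_counts = {
    "Daily or almost daily": 17372,
    "Multiple times per day": 16273,
    "A few times per week": 13493,
    "A few times per month or weekly": 7901,
    "Less than once per month or monthly": 1739,
    "I have never visited Stack Overflow (before today)": 192,
}

# Only respondents who answered the question are counted
answered = sum(visit_counts.values())
# "At least once a day" = the two most frequent options combined
daily = visit_counts["Daily or almost daily"] + visit_counts["Multiple times per day"]

print(f"{daily / answered * 100:.1f}% visit at least once a day")
```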

In [240]:
plt.figure(figsize=(20,10))
plt.title((pd.read_csv('stackoverflow-developer-survey-2020/survey_results_schema.csv', index_col='Column').QuestionText).SOVisitFreq)
plt.pie(VisitFreq_pct, labels=VisitFreq_pct.index, autopct='%1.1f%%', startangle=0);

Community feel

There are various reasons why a member of a forum may feel excluded:

  • Echo chamber: a hive mind that encourages groupthink and shuts down dissenting views
  • Observer effect ("observing the process changes the process"): on a public online forum, members know that all their activity can be traced and tracked
  • Language barriers
  • Etiquette and the boundaries of socially acceptable behaviour

Let's visualize the data from the SOComm column.

In [241]:
(pd.read_csv('stackoverflow-developer-survey-2020/survey_results_schema.csv', index_col='Column').QuestionText).SOComm
Out[241]:
'Do you consider yourself a member of the Stack Overflow community?'
In [242]:
(df.SOComm.value_counts(normalize=True, ascending=True)*100).plot(kind='barh', color='g')
plt.title((pd.read_csv('stackoverflow-developer-survey-2020/survey_results_schema.csv', index_col='Column').QuestionText).SOComm)
plt.xlabel('Percentage');

It appears that close to 35% of respondents do not consider themselves members of the Stack Overflow community.

The NEWSOSites field records which Stack Overflow sites each respondent has visited. Since the question allows multiple answers, the column contains lists of values separated by ;, which makes it a bit harder to analyze directly.

In [243]:
(pd.read_csv('stackoverflow-developer-survey-2020/survey_results_schema.csv', index_col='Column').QuestionText).NEWSOSites
Out[243]:
'Which of the following Stack Overflow sites have you visited? Select all that apply.'
In [244]:
df.NEWSOSites.value_counts()
Out[244]:
Stack Overflow (public Q&A for anyone who codes);Stack Exchange (public Q&A for a variety of topics)                                                                                                                                                                22415
Stack Overflow (public Q&A for anyone who codes);Stack Exchange (public Q&A for a variety of topics);Stack Overflow Jobs (for job seekers)                                                                                                                          13891
Stack Overflow (public Q&A for anyone who codes)                                                                                                                                                                                                                    12762
Stack Overflow (public Q&A for anyone who codes);Stack Overflow Jobs (for job seekers)                                                                                                                                                                               4588
Stack Overflow (public Q&A for anyone who codes);Stack Exchange (public Q&A for a variety of topics);Stack Overflow Jobs (for job seekers);Stack Overflow for Teams (private Q&A for organizations)                                                                   906
                                                                                                                                                                                                                                                                    ...  
Stack Exchange (public Q&A for a variety of topics);Stack Overflow for Teams (private Q&A for organizations);Stack Overflow Advertising (for technology companies)                                                                                                      2
Stack Exchange (public Q&A for a variety of topics);Stack Overflow Jobs (for job seekers);Stack Overflow for Teams (private Q&A for organizations);Stack Overflow Talent (for hiring companies/recruiters);Stack Overflow Advertising (for technology companies)        2
Stack Exchange (public Q&A for a variety of topics);Stack Overflow Jobs (for job seekers);Stack Overflow Talent (for hiring companies/recruiters);Stack Overflow Advertising (for technology companies)                                                                 2
Stack Exchange (public Q&A for a variety of topics);Stack Overflow Advertising (for technology companies)                                                                                                                                                               1
Stack Exchange (public Q&A for a variety of topics);Stack Overflow for Teams (private Q&A for organizations)                                                                                                                                                            1
Name: NEWSOSites, Length: 61, dtype: int64
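Because each cell holds a ;-separated list, value_counts() above counts whole combinations (61 of them) rather than individual sites. One possible way to count each site separately is to split the strings and explode them into one row per selection. A minimal sketch on made-up answers:

```python
import numpy as np
import pandas as pd

# Made-up multi-select answers in the same ';'-separated format as NEWSOSites
sites = pd.Series([
    "Stack Overflow;Stack Exchange",
    "Stack Overflow",
    np.nan,  # respondent skipped the question
    "Stack Exchange;Stack Overflow Jobs",
])

# split() turns each string into a list (NaN stays NaN), explode() gives each
# list element its own row, and dropna() removes the skipped answer
per_site = sites.str.split(";").explode().dropna().value_counts()
print(per_site)
```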

Let's define a helper function which turns a column containing lists of values (like df.NEWSOSites) into a data frame with one column for each possible option.

In [245]:
def sandhi_vicched(col_series):
    """Split a column of ';'-separated answers into one boolean column per option."""
    result_df = col_series.to_frame()
    options = []
    # Iterate over the non-missing values of the column
    # (Series.items() replaces iteritems(), which was removed in pandas 2.0)
    for idx, value in col_series[col_series.notnull()].items():
        # Break each value into a list of options
        for option in value.split(';'):
            # Add the option as a new column (initially False) the first time we see it
            if option not in result_df.columns:
                options.append(option)
                result_df[option] = False
            # Mark the value in the option column as True
            result_df.at[idx, option] = True
    return result_df[options]
In [246]:
NEWSOSites_df = sandhi_vicched(df.NEWSOSites)
In [247]:
NEWSOSites_df
Out[247]:
Stack Overflow (public Q&A for anyone who codes) Stack Exchange (public Q&A for a variety of topics) Stack Overflow Jobs (for job seekers) Stack Overflow for Teams (private Q&A for organizations) Stack Overflow Talent (for hiring companies/recruiters) Stack Overflow Advertising (for technology companies) I have never visited any of these sites
0 True False False False False False False
1 True True True False False False False
2 True False False False False False False
3 True True True False False False False
4 True True True True True False False
... ... ... ... ... ... ... ...
64456 True False False False False False False
64457 False False False False False False False
64458 False False False False False False False
64459 True False False False False False False
64460 False False False False False False False

64461 rows × 7 columns

The NEWSOSites_df has one column for each option that can be selected as a response. If a respondent has selected the option, the value in the column is True; otherwise it is False.
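pandas also has a built-in, vectorized alternative to this hand-written loop: Series.str.get_dummies(sep=';') splits each string and returns one indicator column per option. It returns 0/1 integers and sorts the columns alphabetically, and missing answers become rows of zeros, so after astype(bool) it should match sandhi_vicched up to column order. A sketch on made-up data:

```python
import numpy as np
import pandas as pd

# Made-up answers in the same ';'-separated format as NEWSOSites
answers = pd.Series(["A;B", "B", np.nan, "A;C"])

# One column per option; missing answers become all-False rows
dummies = answers.str.get_dummies(sep=";").astype(bool)
print(dummies)

# Column-wise totals, as used for the most popular sites
print(dummies.sum().sort_values(ascending=False))
```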

We can now use the column-wise totals to identify the most popular forums.

In [248]:
NEWSOSites_totals = NEWSOSites_df.sum().sort_values(ascending=False)
NEWSOSites_totals
Out[248]:
Stack Overflow (public Q&A for anyone who codes)            57114
Stack Exchange (public Q&A for a variety of topics)         39219
Stack Overflow Jobs (for job seekers)                       21126
Stack Overflow for Teams (private Q&A for organizations)     2631
Stack Overflow Talent (for hiring companies/recruiters)      1417
Stack Overflow Advertising (for technology companies)         837
I have never visited any of these sites                       528
dtype: int64

As one might expect, the most popular forum is "Stack Overflow", the first of its name.

In [249]:
sns.heatmap(NEWSOSites_df)
Out[249]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f9dc74ac7f0>

In this heatmap each row is a respondent and each column is a site: a cream cell means the respondent selected that site, and a black cell means they did not. sns.heatmap expects rectangular 2-D data such as a DataFrame, and since NEWSOSites_df is a boolean matrix, only two colours appear in the heatmap.

Asking and Answering Questions

We've already gained several insights about the respondents and the programming community in general, simply by exploring individual columns of the dataset. Let's ask some specific questions, and try to answer them using data frame operations and interesting visualizations.

What do forumites do when stuck on a programming problem?

Let's look at the responses in the survey.

In [250]:
NEWStuck_df = sandhi_vicched(df.NEWStuck)
NEWStuck_numbers = NEWStuck_df.mean().sort_values(ascending=False)
NEWStuck_numbers
Out[250]:
Visit Stack Overflow                                0.772607
Do other work and come back later                   0.464048
Watch help / tutorial videos                        0.450040
Call a coworker or friend                           0.425451
Go for a walk or other physical activity            0.369215
Play games                                          0.128217
Meditate                                            0.099905
Panic                                               0.093250
Visit another developer community (please name):    0.087495
dtype: float64
In [251]:
NEWStuck_df.sum()
Out[251]:
Visit Stack Overflow                                49803
Go for a walk or other physical activity            23800
Do other work and come back later                   29913
Call a coworker or friend                           27425
Watch help / tutorial videos                        29010
Visit another developer community (please name):     5640
Play games                                           8265
Meditate                                             6440
Panic                                                6011
dtype: int64

We can visualize this information using a bar chart.

In [252]:
sns.set_style('darkgrid')
plt.figure(figsize=(20,6))
plt.title((pd.read_csv('stackoverflow-developer-survey-2020/survey_results_schema.csv', index_col='Column').QuestionText).NEWStuck)
sns.barplot(x=NEWStuck_numbers.index, y=NEWStuck_numbers);
plt.xticks(rotation=45)
Out[252]:
(array([0, 1, 2, 3, 4, 5, 6, 7, 8]),
 <a list of 9 Text major ticklabel objects>)
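NEWStuck_df has one row per respondent, including those who skipped the question (their rows are all False), so .mean() gives the fraction of all respondents: for example 49803 / 64461 ≈ 0.7726 for "Visit Stack Overflow". To get the share among respondents who actually answered, one can divide by the number of non-missing answers instead. A sketch on made-up data, with str.get_dummies standing in for the notebook's sandhi_vicched helper:

```python
import numpy as np
import pandas as pd

# Made-up NEWStuck-style answers; np.nan means the question was skipped
stuck = pd.Series(["Visit Stack Overflow;Panic", "Visit Stack Overflow", np.nan, np.nan])

# str.get_dummies stands in for the sandhi_vicched helper here
flags = stuck.str.get_dummies(sep=";").astype(bool)

# Denominator: every row, including skipped answers (what .mean() does)
share_of_all = flags.mean()
# Denominator: only respondents who answered the question
share_of_answered = flags.sum() / stuck.notna().sum()

print(share_of_all.round(2))
print(share_of_answered.round(2))
```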

Which collaboration tools have respondents worked with?

For this, we can use the NEWCollabToolsWorkedWith column, with the same processing as the previous question.

In [253]:
NEWCollabToolsWorkedWith_df = sandhi_vicched(df.NEWCollabToolsWorkedWith)
NEWCollabToolsWorkedWith_percentages = NEWCollabToolsWorkedWith_df.mean().sort_values(ascending=False) * 100
NEWCollabToolsWorkedWith_percentages
Out[253]:
Github                            67.926343
Slack                             43.465041
Jira                              39.127534
Google Suite (Docs, Meet, etc)    34.054700
Gitlab                            30.320659
Confluence                        26.561797
Trello                            24.286002
Microsoft Teams                   20.970820
Microsoft Azure                   12.176355
Stack Overflow for Teams           4.742402
Facebook Workplace                 2.451094
dtype: float64
In [254]:
plt.figure(figsize=(12, 12))
sns.barplot(x=NEWCollabToolsWorkedWith_percentages, y=NEWCollabToolsWorkedWith_percentages.index)
plt.title("Collaboration Tools");
plt.xlabel('Percentage');

Once again, it's not surprising that GitHub is the collaboration tool most respondents have worked with (about 68% of all respondents): it is generally considered easy to learn, and it is the most widely used code-hosting platform.

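However, to see each tool's share relative to the others, a pie chart works better. Keep in mind that respondents could pick several tools, so each slice is a share of all tool selections rather than a true market share: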

In [255]:
plt.figure(figsize=(20,20))
plt.title("Share of Collaboration-Tool Selections", fontsize=25)
# textprops enlarges the pie labels without changing the global rcParams for later plots
plt.pie(NEWCollabToolsWorkedWith_percentages, labels=NEWCollabToolsWorkedWith_percentages.index, autopct='%1.1f%%', startangle=0, textprops={'fontsize': 25});
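Because the question is multi-select, the percentages plotted above add up to well over 100%. When the values sum to more than 1, plt.pie normalizes them, so each wedge covers value / sum(values) of the circle. The arithmetic, using the percentages printed earlier:

```python
# Percentages copied from NEWCollabToolsWorkedWith_percentages above
tool_pct = {
    "Github": 67.926343, "Slack": 43.465041, "Jira": 39.127534,
    "Google Suite (Docs, Meet, etc)": 34.054700, "Gitlab": 30.320659,
    "Confluence": 26.561797, "Trello": 24.286002, "Microsoft Teams": 20.970820,
    "Microsoft Azure": 12.176355, "Stack Overflow for Teams": 4.742402,
    "Facebook Workplace": 2.451094,
}

# Greater than 100 because respondents could select several tools
total = sum(tool_pct.values())

# Each pie wedge covers value / total of the circle, expressed here in percent
wedge_share = {tool: pct / total * 100 for tool, pct in tool_pct.items()}

print(f"{total:.1f}")
print(f"{wedge_share['Github']:.1f}%")
```

So GitHub's wedge shows about 22% of all selections, even though about 68% of respondents have worked with it.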

What percentage of respondents are weekly active users?

To answer this, we can use the SOVisitFreq column.

In [256]:
df.SOVisitFreq               
Out[256]:
0                 Multiple times per day
1                 Multiple times per day
2                  Daily or almost daily
3                 Multiple times per day
4        A few times per month or weekly
                      ...               
64456                                NaN
64457                                NaN
64458                                NaN
64459                                NaN
64460                                NaN
Name: SOVisitFreq, Length: 64461, dtype: object

First, we'll count the number of occurrences of each unique value.

In [257]:
SOVisitFreq_df = df.SOVisitFreq.value_counts()
In [258]:
SOVisitFreq_df
Out[258]:
Daily or almost daily                                 17372
Multiple times per day                                16273
A few times per week                                  13493
A few times per month or weekly                        7901
Less than once per month or monthly                    1739
I have never visited Stack Overflow (before today)      192
Name: SOVisitFreq, dtype: int64

It appears that a total of 6 options were included. Let's convert these counts into the percentage of respondents (among those who answered) who selected each option.

In [259]:
SOVisitFreq_percentages = (SOVisitFreq_df.sort_values(ascending=False) * 100) /SOVisitFreq_df.sum()
SOVisitFreq_percentages
Out[259]:
Daily or almost daily                                 30.493242
Multiple times per day                                28.564157
A few times per week                                  23.684395
A few times per month or weekly                       13.868703
Less than once per month or monthly                    3.052484
I have never visited Stack Overflow (before today)     0.337019
Name: SOVisitFreq, dtype: float64

We can plot this information using a horizontal bar chart.

In [260]:
plt.figure(figsize=(20, 10))
sns.barplot(x=SOVisitFreq_percentages, y=SOVisitFreq_percentages.index)
plt.title("Forum visits in a given time frame");
plt.xlabel('Percentage');

Perhaps not surprisingly, about 59% of the respondents who answered are daily active users of the forum, and about 83% visit it at least a few times per week.
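Since the answer options can be ranked from most to least frequent, a cumulative sum over that ranking gives the share of respondents who visit at least as often as each option. A sketch using the percentages printed above; the ranking list is written out by hand and is an assumption about how the options order:

```python
import pandas as pd

# Percentages copied from SOVisitFreq_percentages above
pct = pd.Series({
    "Daily or almost daily": 30.493242,
    "Multiple times per day": 28.564157,
    "A few times per week": 23.684395,
    "A few times per month or weekly": 13.868703,
    "Less than once per month or monthly": 3.052484,
    "I have never visited Stack Overflow (before today)": 0.337019,
})

# Hand-written ranking from most to least frequent visits (an assumption)
order = [
    "Multiple times per day",
    "Daily or almost daily",
    "A few times per week",
    "A few times per month or weekly",
    "Less than once per month or monthly",
    "I have never visited Stack Overflow (before today)",
]

# Share of respondents visiting at least this often
at_least = pct.reindex(order).cumsum()
print(at_least.round(1))
```

The "A few times per month or weekly" option straddles the weekly boundary, so a strict weekly-active figure lies somewhere between about 82.7% and 96.6%.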

How often do hobbyists who are also part of other online communities visit our forum?

In [261]:
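# Keep only respondents who code as a hobby, then count respondents per (visit frequency, other communities) pair
hobbyists = df[df.Hobbyist == 'Yes']
hobbyists.groupby(['SOVisitFreq', 'NEWOtherComms']).size().unstack().plot.barh(figsize=(20,20), stacked=True, fontsize=18)
plt.show();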
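Raw counts make the two groups hard to compare when one group is much larger than the other. One option is pd.crosstab with normalize='columns', which turns each column into the distribution of visit frequencies within that group. A sketch on made-up rows (the column names mirror the survey's, the values are invented):

```python
import pandas as pd

# Made-up respondents using the survey's column names
toy = pd.DataFrame({
    "Hobbyist":      ["Yes", "Yes", "Yes", "Yes", "No"],
    "NEWOtherComms": ["Yes", "Yes", "No",  "No",  "Yes"],
    "SOVisitFreq":   ["Daily or almost daily", "A few times per week",
                      "Daily or almost daily", "Daily or almost daily",
                      "A few times per week"],
})

# Keep hobbyists only, then cross-tabulate visit frequency by community membership
hobbyists = toy[toy["Hobbyist"] == "Yes"]
dist = pd.crosstab(hobbyists["SOVisitFreq"], hobbyists["NEWOtherComms"],
                   normalize="columns")  # each column sums to 1
print(dist)
```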

How welcome do visitors feel on our forum compared to last year, and how does that relate to whether they think restrictions on 'off-topic' posts should be relaxed?

In [262]:
df.groupby(['WelcomeChange', 'NEWOffTopic']).size().unstack().plot.barh(figsize=(20,20), stacked=True, fontsize=18)
plt.show();

Conclusion

We've drawn many interesting inferences from the survey; here's a summary of a few of them:

  • Being a member of other online developer communities does not appear to reduce how often respondents visit our forum.

This finding goes against intuition: if a user is also active on, or has an account on, other forums, that does not necessarily mean they spend less of their day catching up with our forum.

  • GitHub is by far the most widely used collaboration tool among developers. Is this only because of the 'first mover advantage' or the 'network effect'? This could be a topic for further investigation.

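  • About 83% of respondents who answered the question have an account on the forum. This is a proportion among survey respondents (a conditional probability), not the joint probability of a person having an account and responding to the survey.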

  • When stuck on a programming problem, most respondents (about 77%) visit the forum, and about 45% watch help or tutorial videos.

In [263]:
# Select a project name
project='self-reported-user-engagement'
# Install the Jovian library
!pip install jovian --upgrade --quiet
import jovian
jovian.commit(project=project)
[jovian] Detected Colab notebook...
[jovian] Uploading colab notebook to Jovian...
[jovian] Committed successfully! https://jovian.ml/vedant-madane/self-reported-user-engagement
Out[263]:
'https://jovian.ml/vedant-madane/self-reported-user-engagement'