Data scraping: KDnuggets.com’s post statistics

A beautiful sight

Introduction

KDnuggets.com is a community dedicated to AI, Analytics, Big Data, Data Mining, Data Science, and Machine Learning. Its posts, whether original or republished from other websites, are usually of top-notch quality.

This dataset captures statistics about the site's posts: post name, author, publication date, number of shares and comments, and more. It covers posts published from 2013 until April 2020.

Without KDnuggets.com, this dataset wouldn’t have existed.

Format

One CSV file with 9,689 rows and 10 columns.

Download

Download from my public repository here.

Column description

Most of the column names are self-explanatory:

  • post_name: the title of the post.
  • post_date: only the year and month are recorded.
  • author: the author’s name and title, if applicable.
  • tags: e.g. Algorithms, Python, AI.
  • post_type: e.g. Opinions, Tutorials, News.
  • words: the number of words in the post.
  • images: the number of images in the post.
  • comments: the number of comments.
  • responses: the number of emoji responses.
  • shares: the number of shares (e.g. on Facebook, Twitter).
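
To make the column list concrete, here is a minimal sketch of reading the file with Python's standard `csv` module. It assumes the CSV header uses exactly the column names listed above and that the five count columns hold whole numbers; the `load_posts` helper and the sample rows are made up for illustration and are not part of the dataset.

```python
import csv
import io

# Column names as listed in the description above (assumed to match the CSV header).
EXPECTED_COLUMNS = [
    "post_name", "post_date", "author", "tags", "post_type",
    "words", "images", "comments", "responses", "shares",
]
NUMERIC_COLUMNS = ["words", "images", "comments", "responses", "shares"]


def load_posts(source):
    """Read posts from a file-like object and convert the count columns to int."""
    reader = csv.DictReader(source)
    missing = set(EXPECTED_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    posts = []
    for row in reader:
        for column in NUMERIC_COLUMNS:
            # Assumption: an empty cell means "no data" and is treated as 0.
            row[column] = int(row[column] or 0)
        posts.append(row)
    return posts


# Made-up sample rows standing in for the real file.
SAMPLE_CSV = """post_name,post_date,author,tags,post_type,words,images,comments,responses,shares
Example post A,2019-05,Jane Doe,"Python, AI",Tutorials,1200,3,4,10,250
Example post B,2020-01,John Roe,Algorithms,Opinions,800,1,0,2,40
"""

posts = load_posts(io.StringIO(SAMPLE_CSV))
print(len(posts), "posts loaded; first post has", posts[0]["words"], "words")
```

With the downloaded file, the same function can be given an open file handle instead of the in-memory sample, for example one opened with `open(path, newline="", encoding="utf-8")`, where `path` is wherever you saved the CSV.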

Demonstrations

Wordcloud of post titles:

Title Wordcloud
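
A word cloud is essentially a picture of word frequencies, so the counting step behind it can be sketched with the standard library alone. This is only an illustration: the stop-word list is a tiny hand-picked assumption, the titles are invented, and the tool actually used to draw the image above is not stated here.

```python
import re
from collections import Counter

# Assumption: a tiny hand-picked stop-word list; real word cloud tools use larger ones.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def title_word_counts(titles):
    """Count how often each non-stop-word appears across all titles."""
    counts = Counter()
    for title in titles:
        # Lowercase the title and split it into alphanumeric tokens.
        words = re.findall(r"[a-z0-9']+", title.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts


# Invented titles standing in for the post_name column.
titles = [
    "An Introduction to Machine Learning",
    "Machine Learning for Data Science",
    "The Top Data Science Tools",
]
counts = title_word_counts(titles)
print(counts.most_common(4))
```

The resulting counts are what a plotting library would scale into font sizes.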

Histograms of some post statistics:

Statistics Histogram

Post counts per month from January 2013 to April 2020:

Number of posts by month
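
The numbers behind a chart like this come from grouping posts by their post_date value. The sketch below assumes the dates are stored as "YYYY-MM" strings, which is a guess about the file's format (the description only says that year and month are recorded); the sample values are invented.

```python
from collections import Counter


def posts_per_month(post_dates):
    """Return (month, count) pairs in chronological order."""
    counts = Counter(post_dates)
    # "YYYY-MM" strings sort chronologically when sorted as plain text.
    return sorted(counts.items())


# Invented post_date values in the assumed "YYYY-MM" format.
post_dates = ["2013-01", "2013-02", "2013-01", "2020-04", "2013-02", "2013-01"]
for month, count in posts_per_month(post_dates):
    print(month, count)
```

If the real column uses another format, such as "MM/YYYY", it would need to be parsed or reordered first so that the sort stays chronological.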

The most productive authors, by number of posts and shares:

List of the most active authors
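
A ranking like this one can be rebuilt by summing posts and shares per author. The following is a rough sketch with invented rows, assuming each post is a dictionary with an author string and an integer shares count. Because the author column may include a title next to the name, the same person could appear under slightly different strings, and this sketch does not try to merge them.

```python
from collections import defaultdict


def author_totals(posts):
    """Return (author, totals) pairs, most posts first, ties broken by shares."""
    totals = defaultdict(lambda: {"posts": 0, "shares": 0})
    for post in posts:
        entry = totals[post["author"]]
        entry["posts"] += 1
        entry["shares"] += post["shares"]
    return sorted(
        totals.items(),
        key=lambda item: (item[1]["posts"], item[1]["shares"]),
        reverse=True,
    )


# Invented rows; only the two columns used here are included.
posts = [
    {"author": "Jane Doe", "shares": 250},
    {"author": "John Roe", "shares": 40},
    {"author": "Jane Doe", "shares": 100},
]
ranking = author_totals(posts)
for author, stats in ranking:
    print(author, stats["posts"], stats["shares"])
```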
