What I learned from my very first Kaggle competition: The Titanic Dataset

Unknown author, Public domain, via Wikimedia Commons

The period from around Christmas to the first or second week of the new year is usually reserved for learning and improving a skill, or for trying out something new. Last year I read up on deep learning and got a basic understanding of how it works. Unfortunately, it is still too far removed from my day-to-day work, so I left it at that. I was still happy about the new knowledge, though.

Starting last summer, I focussed more and more on the journey of becoming a better Data Scientist. I had recently changed into the Business Intelligence…



In my last article I promised readers that I would create a dashboard to track the spread of the coronavirus in my homeland as well as in other European countries. The result was an R script that gathers data from the European Center for Disease Prevention and Control to create some static ggplot2 outputs.

While these graphs helped me answer the questions I had, they don’t really amount to something that deserves to be called a “dashboard”. Furthermore, since they are static, they might answer my question but not yours, for example if you’re interested in a different time period…


Visualize data provided by the European Center for Disease Prevention and Control using ggplot2 and R.


At the time of writing this article I’m stuck in the same situation as more or less all human beings right now. The weather is fine, it smells like spring and we are still allowed to go outside as individuals, but since all social gatherings are forbidden and restaurants, cafes, and stores for non-essential items are closed down, social life is basically non-existent in the offline world. …


My journey and exploration process on one cold winter night.


So it’s that time of the year again. It’s freezing outside, the days are getting shorter and shorter, and you feel like you’re living in a world where the sun is on vacation somewhere south of wherever you’re staying right now. You find yourself asking: what’s the limit here? How many layers of socks and pullovers do I need to wear to keep from freezing?

I love this time.

Why? Because when you’re forced to stay inside, you get bored. And when you’re bored, you get ideas. At least I’m like that. …



When it comes to data analytics, there are many reasons to move from your local computer to the cloud. Most prominently, you can run practically any number of machines without needing to own or maintain them. Furthermore, you can scale up and down as you wish in a matter of minutes. And if you choose to run t2.micro servers, you can run them for 750 hours a month for free within the first 12 months! After that it’s a couple of bucks per server per month.

Alright, let’s get to it then! Understandably you won’t have time to read a ten…


How and Why Influencers Artificially Up Their Engagements through Fake Engagement Communities


You want to be famous? Or at least appear to be popular in front of your friends? Back when I went to school in the mid-2000s, the perceived size of your circle of friends was an indicator of your popularity. When you talk to teenagers today, it seems like they have found a better way to gauge just how popular someone is: their number of Followers on Instagram.

There are shortcuts to “success” if that is how you define it. While buying bot Followers and engagements is the easiest way, it is also the least sustainable. …


The role of tie strength and network degrees in determining the power of social influence.


Almost any article about the analysis of social networks and the flow of information starts by mentioning Stanley Milgram’s famous small-world experiments of the 1960s. Let’s follow suit.

The first half of this article is about the main findings of classic research on information diffusion through social networks. Afterwards we will focus on a special kind of social network: the online kind. Do we find the same characteristics of offline social networks on social media platforms like YouTube, Twitter, and Facebook?

Note: This article is part of a series on social network analysis and influencer marketing, all based on the…



Done and gone? Many things have happened since I shared my InstaCrawlR code with the internet. I wrote my master’s thesis on influencer fraud (more on that soon), finished university, and started my career at the world’s number one beauty company.

The world does not stand still. Things change. And so do Instagram and its codebase. Close to a hundred people visited my GitHub repository and tried InstaCrawlR, only to find that it had stopped working.

Last week, an aspiring PhD student from Spain contacted me by email. He pointed me to an error message.


Monitor Friends, Influencers, and Brands using InstaCrawlR Scripts

Social media monitoring can be tough, especially when the number of channels, profiles, and posts to keep an eye on grows. In many companies, data collection is a job typically done either by interns or by specialized external service providers. Small and medium-sized enterprises as well as private individuals, like bloggers and influencers, often need to do it manually by copying and pasting relevant data from social media into Excel sheets.

This is boring.

If you ever had to do some task like that manually, you know how exhausting and uninspiring it can be. …


Analyzing Brand Positioning as Perceived by Instagram Users

Please note the March 2019 Update: https://medium.com/@jonas.schroeder1991/update-instacrawlr-still-crawling-6500cd376ea3

A couple of weeks ago, the social media platform Instagram announced that its user base had reached one billion. What originally started as a photo-sharing app for amateur mobile photographers in 2010 has developed into the place to be for anyone interested in beauty, fashion, health, lifestyle: basically anything of relevance in today’s world.

From a researcher’s perspective, the huge amount of textual and visual user-generated content can offer very interesting insights. In order to work with such large sets of data, computer programs come in handy. …

Jonas Schröder

Data Analyst on his way to becoming a Data Scientist.
