Original post published here.
Recently, I was lucky enough to score a free ticket from Women Who Code to the Open Data Science Conference 2018 in Boston. I got back two days ago, and here is my summary of the conference highlights, along with the projects I’m inspired to replicate this month.
Most of the talks were excellent, and it was hard to choose which ones to attend. I attended more than 10 talks and workshops, but I learned the most from the following three speakers.
The presentation gave me enough information to replicate some of the examples on my own. I now know that I can take the publicly available dataset of Enron emails, strip its metadata, remove stop words, and reduce words to their base form. Then I can run a statistical model to see which words and topics appear most frequently in the dataset. In short, I learned the basic steps for algorithmically analyzing large collections of documents, comments, or other text files.
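To make those steps concrete for myself, here is a minimal sketch in plain Python. It assumes each email is a raw, single-part message string with headers first (roughly how raw email files are stored), uses a tiny hand-picked stop-word list, and replaces a real stemmer or lemmatizer with a deliberately crude, hypothetical suffix stripper, so it produces stems like "trad" rather than dictionary words. The "statistical model" here is just a word-frequency count, not a topic model; the sample emails are made up.

```python
import re
from collections import Counter
from email import message_from_string

# A tiny illustrative stop-word list; real projects use a much longer one.
STOP_WORDS = {
    "a", "an", "the", "and", "or", "to", "of", "in", "on", "for", "is",
    "are", "we", "i", "you", "it", "this", "that", "be", "with", "about",
}

# Hypothetical, deliberately crude suffix rules standing in for a real stemmer.
SUFFIXES = ("ings", "ing", "ed", "es", "s")


def strip_metadata(raw_email):
    """Parse a raw single-part email and return only its body (headers dropped)."""
    return message_from_string(raw_email).get_payload()


def to_base_form(word):
    """Strip the first matching suffix, keeping at least three letters of stem."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text):
    """Lowercase, keep alphabetic words, drop stop words, reduce to base forms."""
    words = re.findall(r"[a-z]+", text.lower())
    return [to_base_form(w) for w in words if w not in STOP_WORDS]


def top_terms(raw_emails, n=5):
    """Count base forms across all email bodies and return the n most frequent."""
    counts = Counter()
    for raw in raw_emails:
        counts.update(tokenize(strip_metadata(raw)))
    return counts.most_common(n)


# Made-up sample emails, for illustration only.
sample = [
    "From: alice@example.com\nSubject: Q3\n\nThe trading desk is meeting about energy trades.",
    "From: bob@example.com\nSubject: Re: Q3\n\nEnergy trading meetings are on Monday.",
]
print(top_terms(sample, 3))  # [('trad', 3), ('meet', 2), ('energy', 2)]
```

Swapping the suffix rules for a proper stemmer and the frequency count for a topic model would be the natural next steps on the real dataset.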
Project Feels: Deep Text Models for Predicting the Emotional Resonance of New York Times Articles by Alexander Spangher
Alex’s ability to captivate and connect with the audience was a sight to behold. The whole talk felt like an informal conversation between the presenter and more than 150 people in the audience. Keeping such a large crowd engaged in a conversational way, encouraging questions and sparking curiosity, takes both skill and talent.
Project Feels aims to predict the emotional effect of NYT articles on readers, with the goal of recommending relevant articles or ads. The initial dataset was built with the help of Amazon Mechanical Turk workers, who tagged about 20,000 articles with the emotions they evoked, such as boredom, interest, love, and fear. The talk was well structured and gave me a good understanding of how to approach a data question and which tools to use.
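The talk was about deep text models, which I won’t try to reproduce here. As a much simpler stand-in that shows the shape of the task (article text in, emotion label out, trained on human-tagged examples), here is a multinomial Naive Bayes baseline in plain Python. It assumes one emotion label per article, which may not match how the real project’s tags were structured, and the headlines and labels below are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split into simple alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesEmotion:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing, one label per text."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)      # number of texts per emotion
        self.word_counts = defaultdict(Counter)  # emotion -> word frequencies
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def log_scores(self, text):
        """Return log P(emotion) + sum of log P(word | emotion) for each emotion."""
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for c, doc_count in self.class_counts.items():
            score = math.log(doc_count / n_docs)
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log(
                        (self.word_counts[c][w] + 1) / (self.total_words[c] + vocab_size)
                    )
            scores[c] = score
        return scores

    def predict(self, text):
        """Return the emotion with the highest log score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Made-up headlines and labels, for illustration only.
TEXTS = [
    "city council debates zoning rules again",
    "tax form instructions updated",
    "rescuers save puppy from flood",
    "couple reunites after fifty years",
    "storm warning issued for coast",
    "virus outbreak spreads in region",
]
LABELS = ["boredom", "boredom", "love", "love", "fear", "fear"]

model = NaiveBayesEmotion().fit(TEXTS, LABELS)
print(model.predict("puppy reunites with family"))  # love
```

A baseline like this is mainly useful as a yardstick: a deep model trained on the same tags should clearly beat it before its extra complexity is worth it.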
From Numbers to Narrative: Data Storytelling by Isaac Reyes
This talk summarized best practices for data visualization, with plenty of interesting chart examples from popular media and even from the speaker’s dating life. Isaac referenced data visualization hero Edward Tufte and the Gestalt principles of similarity, proximity, and enclosure. A fun formula he presented was Data-Ink Ratio = Data Ink / Total Ink Used to Produce a Graphic. Ideally, the ratio should be close to 1, meaning that almost all the ink in a graphic depicts data rather than coloring the background or adding other nonfunctional embellishments. This formula reminded me that, above all, visualizations should show data, not hide it. The talk was a great refresher on the core design principles to keep in mind when reporting data.
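Tufte’s ratio is a conceptual guide rather than a measurement procedure, but a small sketch makes the arithmetic tangible. It assumes each chart element’s ink can be expressed as a single number (for example, inked pixel area) and tagged as data or non-data. The element names and ink amounts are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ChartElement:
    name: str
    ink: float     # hypothetical ink measure, e.g. inked pixel area
    is_data: bool  # True if the element encodes data


def data_ink_ratio(elements):
    """Data-Ink Ratio = data ink / total ink used to produce the graphic."""
    total = sum(e.ink for e in elements)
    if total == 0:
        raise ValueError("chart has no ink")
    data = sum(e.ink for e in elements if e.is_data)
    return data / total


# Made-up chart: the bars carry the data; everything else is decoration.
cluttered = [
    ChartElement("bars", 400, True),
    ChartElement("background fill", 1200, False),
    ChartElement("heavy gridlines", 300, False),
    ChartElement("3D shadow", 100, False),
]
# Keep the data, drop the embellishments, add back one thin axis line.
cleaned = [e for e in cluttered if e.is_data] + [ChartElement("thin axis line", 50, False)]

print(f"{data_ink_ratio(cluttered):.2f}")  # 0.20
print(f"{data_ink_ratio(cleaned):.2f}")    # 0.89
```

Removing the background fill, gridlines, and shadow moves the ratio from 0.20 toward 1 without touching the bars, which is exactly the kind of cleanup the formula encourages.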