Monday, December 3, 2012

Seriously, What's a Data Scientist? (and The Newgrounds Scrape)

So here's the thing. I wouldn't feel comfortable calling myself a data scientist (yet).

Whenever someone mentions the term data science (or, god forbid, BIG DATA) without a hint of skepticism or irony, people inevitably start talking about the elephant in the room (see what I did there?).

And I don't know how to ride elephants (yet).

Some people (like yours truly, as just explained) are cautious: "I'm not a data scientist. Data science is a nascent field. No one can really go around calling themselves a data scientist, because no one really knows what data science is yet; there isn't a strict definition." (Though Wikipedia's attempt is noble.)

Other people are not cautious at all: "I'm a data scientist! Hire me! I know what data are and know how to throw around the term BIG DATA! I'm great with pivot tables in Excel!!"

Aha ha. But I digress.

The point is that I've done my first real piece of work that I think falls under the category of data science.

I'm no Python guru, but I threw together a scraper to grab all the metadata from Newgrounds portal content.
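For anyone curious what a scraper like this involves, here's a minimal sketch in Python. It is not the scraper I actually used. It assumes portal items can be reached at `https://www.newgrounds.com/portal/view/<id>` and that the interesting metadata sits in each page's `<meta>` tags; both are assumptions, and the real pages may lay things out differently. The names `PORTAL_URL`, `parse_metadata`, `fetch_metadata` and `scrape` are made up for illustration. It only uses the standard library (`urllib.request` and `html.parser`).

```python
import time
import urllib.error
import urllib.request
from html.parser import HTMLParser

# Assumed URL pattern for a portal item (hypothetical; check against the live site).
PORTAL_URL = "https://www.newgrounds.com/portal/view/{}"


class MetaTagParser(HTMLParser):
    """Collect <meta> tag contents keyed by their 'property' or 'name' attribute."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        key = attrs.get("property") or attrs.get("name")
        if key and attrs.get("content") is not None:
            self.meta[key] = attrs["content"]


def parse_metadata(html):
    """Return a dict of meta-tag values found in an HTML string."""
    parser = MetaTagParser()
    parser.feed(html)
    parser.close()
    return parser.meta


def fetch_metadata(item_id, timeout=10):
    """Download one portal page and extract its meta tags (hypothetical helper)."""
    url = PORTAL_URL.format(item_id)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        html = resp.read().decode(charset, errors="replace")
    return parse_metadata(html)


def scrape(ids, delay=1.0):
    """Fetch metadata for each id, skipping ids whose request raises URLError."""
    rows = []
    for item_id in ids:
        try:
            meta = fetch_metadata(item_id)
        except urllib.error.URLError:  # also covers HTTPError, e.g. a 404 for a deleted item
            continue
        meta["id"] = item_id
        rows.append(meta)
        time.sleep(delay)  # pause between requests to go easy on the server
    return rows
```

Something like `rows = scrape(range(1, 101))` would walk the first hundred ids. The `delay` is there because hammering a site with thousands of back-to-back requests is a good way to get blocked.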

The data are here if you're interested in having a go at it already.

The analysis and visualization will take time; that's for a later article. For now, here's one of my exploratory plots: content rating by date. Already we can gather from it that, at least on Newgrounds, four-and-a-half stars equals perfection.
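A rough, text-only way to poke at the same question is to group ratings by year and look at the count, mean and maximum for each year; a ceiling around 4.5 would show up in the max column. This is a minimal Python sketch, not the code behind the plot. It assumes the scraped rows have already been boiled down to (ISO date string, rating) pairs, and `rating_summary_by_year` is a hypothetical name. The example rows are invented purely to show the shape of the output; they are not from the scrape.

```python
from collections import defaultdict
from datetime import date


def rating_summary_by_year(rows):
    """Group (ISO date string, rating) pairs by year.

    Returns {year: (count, mean_rating, max_rating)}, ordered by year.
    """
    by_year = defaultdict(list)
    for day, rating in rows:
        by_year[date.fromisoformat(day).year].append(rating)
    return {
        year: (len(ratings), sum(ratings) / len(ratings), max(ratings))
        for year, ratings in sorted(by_year.items())
    }


# Made-up rows for illustration only; these are not values from the scrape.
example = [("2011-05-01", 3.9), ("2011-08-12", 4.5), ("2012-01-03", 4.1)]
for year, (n, mean, best) in rating_summary_by_year(example).items():
    print(f"{year}: n={n}, mean={mean:.2f}, max={best:.2f}")
```

A summary table like this loses the spread that a scatter plot shows, but it's a quick sanity check before reaching for a plotting library.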

Sure feels like science.

