Saturday, February 8, 2014

Creepypasta - Votes vs. Rating (& learning ggplot2)

Excel:


R, base package:


R, ggplot:


Am I overfitting? Probably.


Code:
More fun stuff to come....

References

Source data at Creepypasta.com:

Code on gist:
http://gist.github.com/mylesmharrison/8886272

Creepypasta - in list of internet phenomena (Wikipedia):
http://en.wikipedia.org/wiki/Creepypasta#Other_phenomena

2 comments:

  1. You might notice that the quadratic fit in the ggplot2 version is different from the other two. That is because you gave ggplot the log-transformed data as the response variable, so the lm fit is on the log data, not the original data. To more closely reproduce the first two examples, use coord_trans(), e.g.:

    library(ggplot2)

    # `data` is the data frame of stories with Rating and Votes columns
    gplot <- ggplot(data, aes(Rating, Votes)) +
      geom_point(col=rgb(0,0,1,0.25), pch=16, cex=2) +
      geom_smooth(method="lm", formula=y~poly(x,2)) +  # quadratic fit on the untransformed Votes
      labs(title="Creepypasta Stories, Votes vs. Ratings") +
      theme_bw() + coord_trans(y = "log10") +          # log10 axis applied after the fit
      theme(axis.text=element_text(size=14), axis.title=element_text(size=14), plot.title=element_text(size=16, face="bold"))
    gplot

    See http://docs.ggplot2.org/current/coord_trans.html for some discussion of this.
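
To see why the two fits differ, here is a minimal base R sketch. It does not use the Creepypasta data: the `rating` and `votes` vectors are made-up stand-ins, and the names `fit_raw`, `fit_log` and `grid` are only for illustration. It fits the same quadratic once on the raw response and once on the log10 response, then compares the predictions on the original scale.

```r
# Synthetic stand-in data (hypothetical; NOT the Creepypasta ratings)
set.seed(1)
rating <- runif(200, 1, 10)
votes  <- round(10^(1 + 0.2 * rating + rnorm(200, sd = 0.3))) + 1

# Quadratic fit on the raw response. This is the curve coord_trans()
# would draw, because the axis is transformed after the fit is computed.
fit_raw <- lm(votes ~ poly(rating, 2))

# Quadratic fit on the log10 response. This is what you get when
# log-transformed data is passed as the y variable.
fit_log <- lm(log10(votes) ~ poly(rating, 2))

# Compare both fits on the original (untransformed) Votes scale
grid     <- data.frame(rating = seq(1, 10, length.out = 5))
pred_raw <- predict(fit_raw, grid)
pred_log <- 10^predict(fit_log, grid)  # back-transform from log10
print(cbind(grid, pred_raw, pred_log))
```

The two prediction columns will not match: least squares on the log scale minimises relative error, while least squares on the raw scale is dominated by the stories with the most votes.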

    Replies
    1. Thanks Kent, you are correct - I missed that (haste makes waste). I think the transformed fit is probably better anyhow, but here is the other for visual comparison with the first two examples:

      http://4.bp.blogspot.com/-JkTHxA576iA/Uve31ROd2WI/AAAAAAAABgo/RD4kjwo4OBk/s1600/gplot2.png
