Best Fit Scrabble Letter Distribution

Jul. 15th, 2006 | 06:17 am

I finally got around to writing a script to find the best-fit Scrabble letter distribution, meaning the tile counts that come closest to the letter frequencies in a dictionary. The algorithm is simple and greedy, but I'm pretty sure it's optimal (by an exchange argument).

I added the constraint that the number of tiles for each letter must be a positive integer. Without the positivity constraint, J, Q, and X would get zero tiles. Also, note that there are 98 tiles with letters on them (two are blank). I forgot about this in my last post, so the Scrabble distributions were off by a wee bit. They've been corrected for this post, since there are actual calculations in addition to graphs.
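For the curious, here is a minimal Python sketch of one greedy approach that fits the description above. It is not the original script: the function name `best_fit`, its parameters, and the toy frequencies are all assumptions made for illustration. It starts every letter at one tile, then hands out the remaining tiles one at a time to whichever letter's squared error drops the most.

```python
import heapq


def best_fit(freqs, total_tiles=98, min_tiles=1):
    """Greedy tile allocation (illustrative sketch, not the original script).

    freqs maps each letter to its relative frequency (summing to about 1).
    Returns a dict mapping each letter to an integer tile count.
    """
    targets = {c: f * total_tiles for c, f in freqs.items()}
    counts = {c: min_tiles for c in freqs}
    remaining = total_tiles - min_tiles * len(freqs)
    if remaining < 0:
        raise ValueError("not enough tiles to give every letter the minimum")

    # Adding one tile to a letter with n tiles and target t changes its
    # squared error by (n + 1 - t)^2 - (n - t)^2 = 2 * (n - t) + 1.
    heap = [(2 * (counts[c] - targets[c]) + 1, c) for c in freqs]
    heapq.heapify(heap)

    for _ in range(remaining):
        _, c = heapq.heappop(heap)  # letter with the best marginal change
        counts[c] += 1
        heapq.heappush(heap, (2 * (counts[c] - targets[c]) + 1, c))
    return counts


# Toy example with made-up frequencies (not real dictionary data).
toy = best_fit({"A": 0.5, "B": 0.3, "C": 0.2}, total_tiles=10)
```

Because each letter's squared error is convex in its tile count, the marginal change only grows as a letter gets more tiles, which is why taking the best single step each time should not paint you into a corner.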

Interestingly, the same distribution is a best fit for both the TWL and SOWPODS dictionaries. Anywho, here's the mean squared error (in tiles) relative to each dictionary:
Scrabble (TWL): 1.872912076874
Best Fit (TWL): 0.171270126304604
Scrabble (SOWPODS): 1.90119097345442
Best Fit (SOWPODS): 0.166453047573442
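
A sketch of how numbers like these could be computed, assuming the error is averaged over letters and the target for each letter is its dictionary frequency times 98. The function name `mean_squared_error` and the toy inputs are hypothetical; the real inputs would be the per-letter frequencies from TWL or SOWPODS.

```python
def mean_squared_error(tiles, freqs, total_tiles=98):
    """Mean over letters of (tile count - ideal fractional tile count)^2.

    tiles maps letter -> integer tile count.
    freqs maps letter -> relative frequency in the dictionary.
    """
    squared = [(tiles[c] - freqs[c] * total_tiles) ** 2 for c in freqs]
    return sum(squared) / len(squared)


# Toy example with made-up frequencies (not real dictionary data).
toy_freqs = {"A": 0.5, "B": 0.3, "C": 0.2}
toy_error = mean_squared_error({"A": 6, "B": 3, "C": 1}, toy_freqs, total_tiles=10)
```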

The official Scrabble distribution is based on letter counts tallied by hand from a corpus (the front page of the New York Times). It hasn't been changed since it was created in 1938.

Here are some new graphs with the best fit included:




Finally, the best fit distribution is:
11 tiles: E
9 tiles: I, S
7 tiles: A, R
6 tiles: N, O, T
5 tiles: L
4 tiles: C
3 tiles: D, G, M, P, U
2 tiles: B, H
1 tile: F, J, K, Q, V, W, X, Y, Z


Comments {1}

Scrabble distribution

from: anonymous
date: Jan. 8th, 2007 09:50 pm (UTC)

OK, very, very interesting. Being a Greek Scrabble player and champion, I am trying these days to propose a new Scrabble letter distribution and set of values.
Let's say I have the correct frequencies of the letters I want to use. How do I decide on the distribution and the values at the same time?
The simple way (12% for A means 12 As with a value of 1, 1.3% for Z...) does not seem to work very well.
Thanx
