
January 28, 2007

Listed below are links to weblogs that reference Kata Fourteen: Tom Swift Under Milk Wood:

Comments

Chris

My girlfriend and I just finished a programming exercise where we each tried to write an n-gram generator (we did it before I saw this, but it's interesting that you use the same problem).

An interesting problem with this program is what to do when your object hierarchy grows larger than your available main memory, and you have to go back and break your storage mechanism into smaller chunks so that the generator can save and load words from hard disk as needed.
Some things to consider are the size of your chunks (smaller chunks take up less memory but mean more lookups, while larger chunks make it more likely that repeat lookups are served before you have to fetch again), the indexing structure (alphabetically close words aren't particularly likely to follow each other, but finding a better way to group words into chunks could be complex), and the actual storage format (depending on chunk size, plain text or some structured text such as XML might be quicker to read through).
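
As a rough illustration of the chunking trade-offs above, here is a minimal Python sketch, not anyone's actual implementation. The class name `ChunkedNgramStore`, the parameters `num_chunks` and `max_loaded`, the JSON file format, and the choice to pick a chunk by hashing the prefix (instead of indexing alphabetically) are all assumptions made for the example. Only `max_loaded` chunks are held in memory at a time, and the least recently used one is written back to disk when a new chunk is needed.

```python
import hashlib
import json
import os
from collections import OrderedDict


class ChunkedNgramStore:
    """Hypothetical sketch: n-gram followers split across JSON chunk files."""

    def __init__(self, directory, num_chunks=64, max_loaded=4):
        self.directory = directory
        self.num_chunks = num_chunks   # more chunks: smaller files, more fetches
        self.max_loaded = max_loaded   # chunks allowed in memory at once (>= 1)
        self.loaded = OrderedDict()    # chunk id -> {prefix: [followers]}, in LRU order
        os.makedirs(directory, exist_ok=True)

    def _chunk_id(self, prefix):
        # Assumption: group prefixes by a stable hash, not alphabetically.
        # The built-in hash() is salted per process, so it would not work across runs.
        digest = hashlib.md5(prefix.encode("utf-8")).hexdigest()
        return int(digest, 16) % self.num_chunks

    def _path(self, chunk_id):
        return os.path.join(self.directory, f"chunk_{chunk_id}.json")

    def _write(self, chunk_id, chunk):
        with open(self._path(chunk_id), "w", encoding="utf-8") as f:
            json.dump(chunk, f)

    def _load(self, chunk_id):
        if chunk_id in self.loaded:
            # Already in memory: just mark it as most recently used.
            self.loaded.move_to_end(chunk_id)
            return self.loaded[chunk_id]
        path = self._path(chunk_id)
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                chunk = json.load(f)
        else:
            chunk = {}
        self.loaded[chunk_id] = chunk
        if len(self.loaded) > self.max_loaded:
            # Evict the least recently used chunk, saving it to disk first.
            old_id, old_chunk = self.loaded.popitem(last=False)
            self._write(old_id, old_chunk)
        return chunk

    def add(self, prefix_words, follower):
        prefix = " ".join(prefix_words)
        chunk = self._load(self._chunk_id(prefix))
        chunk.setdefault(prefix, []).append(follower)

    def followers(self, prefix_words):
        prefix = " ".join(prefix_words)
        return self._load(self._chunk_id(prefix)).get(prefix, [])

    def flush(self):
        # Write every chunk still in memory back to disk.
        for chunk_id, chunk in self.loaded.items():
            self._write(chunk_id, chunk)
```

With something like this, building a trigram table means calling `store.add((a, b), c)` for each consecutive word triple and `store.flush()` at the end; the generator then calls `store.followers((a, b))` and picks one of the returned words at random. Raising `num_chunks` shrinks each file at the cost of more disk fetches, which is the chunk-size trade-off described above.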

(To be fair, my n-gram generator without indexing could hold 100 MB worth of text parsed into a 5-gram tree before it decided it didn't want to live.)
