Rough estimation is a useful talent to possess. As you’re coding away, you may suddenly need to work out approximately how big a data structure will be, or how fast some loop will run. The faster you can do this, the less the coding flow will be disturbed.
So this is a simple kata: a series of questions, each asking for a rough answer. Try to work each out in your head. I’ll post my answers (and how I got them) in a week or so.
How Big?
- roughly how many binary digits (bit) are required for the unsigned
representation of:
- 1,000
- 1,000,000
- 1,000,000,000
- 1,000,000,000,000
- 8,000,000,000,000
- My town has approximately 20,000 residences. How much space is required to store the names, addresses, and a phone number for all of these (if we store them as characters)?
- I’m storing 1,000,000 integers in a binary tree. Roughly how many nodes and levels can I expect the tree to have? Roughly how much space will it occupy on a 32-bit architecture?
How Fast?
- My copy of Meyer’s Object Oriented Software Construction has about 1,200 body pages. Assuming no flow control or protocol overhead, about how long would it take to send it over an async 56k baud modem line?
- My binary search algorithm takes about 4.5mS to search a 10,000 entry array, and about 6mS to search 100,000 elements. How long would I expect it to take to search 10,000,000 elements (assuming I have sufficient memory to prevent paging).
- Unix passwords are stored using a one-way hash function: the original string is converted to the ‘encrypted’ password string, which cannot be converted back to the original string. One way to attack the password file is to generate all possible cleartext passwords, applying the password hash to each in turn and checking to see if the result matches the password you’re trying to crack. If the hashes match, then the string you used to generate the hash is the original password (or at least, it’s as good as the original password as far as logging in is concerned). In our particular system, passwords can be up to 16 characters long, and there are 96 possible characters at each position. If it takes 1mS to generate the password hash, is this a viable approach to attacking a password?
My answers:
* 1,000 - 10 bits (2^10 is 1024)
* 1,000,000 - 20 bits (it's 1,000 * 1,000 and in bits the multiplication is done by shifting, so when you multiply by 10 bit number, you can shift at most 10 bits to the left)
* 1,000,000,000 - 30 bits
* 1,000,000,000,000 - 40 bits
* 8,000,000,000,000 - 43 bits
Residences: assuming 200 characters per one residence it's 4,000,000 characters or about 4MB
Binary tree: it depends if the numbers are stored in all nodes or only in the leaves. In the former case, there will be obviously 1,000,000 nodes and there will be at least 20 levels (more if the tree is not well balanced). If the number are only in the leaves, there will be 1,000,000 leaves and at least the same number of non-leaf nodes (assuming the tree is well balanced, there must be half of leaves number on the level just above the leaves, then half of that number on the level above that etc.) There will be at least 21 levels in this case. As for the space, we need the number, the left pointer and the right pointer, each 32 bits, which is 12 bytes, so it's 12,000,000 bytes. But since data structures are usually 16-bytes aligned nowadays, it'll rather be 16,000,000 bytes.
Modem: 1200 pages * 60 lines * 70 chars is about 5 million chars. 56kbit/s is 7 kbyte/s so it will take about 700 seconds or almost 12 minutes.
Binary search: for 10,000 elements it should do 14 comparisons max, so that gives us 4.5 / 14 = 0.32 ms per comparison. For 100,000 it's 17 comparisons and 6 / 17 = 0.35 ms. For 10,000,000 we need 24 comparisons max, so we can expect 24 * 0.35 = 8.4 ms.
Passwords: there are 96^16 possible passwords which is close to 100^16 = 10^32. Since we check 10^3 passwords per second we need 10^29 seconds which is 10^21 years. So this method is probably not viable :)
Posted by: szeryf | August 25, 2010 at 05:40 AM
I think a little more than a week has passed :)
Have you posted your answers somewhere?
Regards, Mattias
Posted by: Mattias | January 08, 2010 at 04:50 PM
Hi Dave,
I'm enjoying the mental exercise these kata are causing me - it's reviving old old memories from college! Thanks for putting this together.
Just one picky point: milliseconds is ms not mS (milliSiemen)
Cheers,
Michael
Posted by: - | May 10, 2007 at 05:36 AM