Write a standalone program that will display (using ASCII art) a histogram of word frequencies from one or more text files (or from stdin).
Input format.
Input is simply plain UTF-8 encoded text, supplied either by one or more files specified as command line arguments or on standard input if no command line arguments are specified.
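The input rule above could be sketched roughly as follows. This is only one possible approach, not a reference solution: the names `readUtf8File` and `readInput` are hypothetical, and lazy `String` I/O is assumed purely for brevity.

```haskell
import System.IO (IOMode (ReadMode), hGetContents, hSetEncoding, openFile, stdin, utf8)

-- | Lazily read one file, decoding it as UTF-8 regardless of the locale.
-- (Hypothetical helper; the handle is closed once the contents are consumed.)
readUtf8File :: FilePath -> IO String
readUtf8File path = do
  h <- openFile path ReadMode
  hSetEncoding h utf8
  hGetContents h

-- | All input text: the named files joined together, or stdin when no
-- paths are given. 'unlines' puts a newline after each file so that the
-- last word of one file cannot merge with the first word of the next.
readInput :: [FilePath] -> IO String
readInput [] = hSetEncoding stdin utf8 >> getContents
readInput paths = unlines <$> mapM readUtf8File paths
```

A `main` could obtain the paths with `getArgs` from `System.Environment` and pass them straight to `readInput`.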
How to break up words.
Punctuation that isn't part of a word should be disregarded. Punctuation that is part of a word is significant. Words should be considered case insensitive.
e.g. "Dogs don't sit on children, they sit on cats." => ["dogs", "don't", "sit", "on", "children", "they", "sit", "on", "cats"]
e.g. "Dogs DOGS Child ChIlD Hello" => [("dogs", 2), ("child", 2), ("hello", 1)]
We aren't trying to be tricky here so we won't be giving you strange text files to parse with things like 'H!??@!LLO' in them.
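One way to read the word-breaking rules is sketched below. It assumes that "punctuation that isn't part of a word" means non-alphanumeric characters at the start or end of a whitespace-separated token, which is an interpretation rather than something the rules spell out. The names `tokenize` and `countWords` are hypothetical.

```haskell
import Data.Char (isAlphaNum, toLower)
import Data.List (dropWhileEnd)
import qualified Data.Map.Strict as Map

-- | Lower-case the text, split on whitespace, and strip leading and
-- trailing non-alphanumeric characters from each token. Punctuation
-- inside a token (as in "don't") is kept; tokens that end up empty
-- are dropped.
tokenize :: String -> [String]
tokenize = filter (not . null) . map trim . words . map toLower
  where
    trim = dropWhileEnd (not . isAlphaNum) . dropWhile (not . isAlphaNum)

-- | Count occurrences of each word using a strict map.
countWords :: [String] -> Map.Map String Int
countWords ws = Map.fromListWith (+) [(w, 1) | w <- ws]
```

With these definitions, `tokenize "Dogs don't sit on children, they sit on cats."` yields the nine-word list from the first example, and `Map.toList (countWords (tokenize "Dogs DOGS Child ChIlD Hello"))` gives `[("child",2),("dogs",2),("hello",1)]` (ordered by key, not yet by frequency).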
Output format.
The program should output the words in decreasing order of frequency. The format should be as follows:
the ##########################
dog ########################
sat #######################
child ##################
a #############
one ######
ok #
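A minimal sketch of producing output in this shape is given below. Two things here are assumptions and not part of the stated format: ties are broken alphabetically, and bars are scaled down so that the longest is at most `maxBar` characters, which keeps the output readable on a large data set. The names `byFrequency`, `render` and `maxBar` are hypothetical.

```haskell
import Data.List (sortOn)
import Data.Ord (Down (..))

-- | Order by decreasing count; equal counts fall back to alphabetical
-- order (an assumption, the format does not specify tie-breaking).
byFrequency :: [(String, Int)] -> [(String, Int)]
byFrequency = sortOn (\(w, n) -> (Down n, w))

-- | One line per word: the word, a space, then a bar of '#' characters.
-- If the largest count exceeds maxBar, every bar is scaled
-- proportionally (assumed behaviour), with a minimum length of one.
render :: Int -> [(String, Int)] -> [String]
render maxBar entries =
  [w ++ " " ++ replicate (barLen n) '#' | (w, n) <- byFrequency entries]
  where
    top = maximum (1 : map snd entries)
    barLen n
      | top <= maxBar = n
      | otherwise = max 1 ((n * maxBar) `div` top)
```

For example, `render 10 [("dog", 2), ("the", 3), ("ok", 1)]` produces the lines `the ###`, `dog ##` and `ok #`, which can be printed with `mapM_ putStrLn`.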
Notice that:
Don't use unsafePerformIO or other such unsafe functions.
You should submit your work in the form of a URL to a publicly accessible git repository, which should include the source and a README file that tells us:
Who you are.
How to build and run your program.
We will award monetary bonuses in the form of Quatloos, the currency of the planet Triskelion, to those who achieve one or more of the following accomplishments:
We can build your project using the standard cabal build tool.
We find your source code to be structured and commented in such a way that someone other than you can quickly figure out what is going on.
We run the hlint tool over your source code and it reports nothing that raises our eyebrows.
You submit a program that we can measure as performing awesomely on a (respectably large) data set of our choosing. Criteria for awesomeness will include such things as performance and memory footprint.
The winner of the most Quatloos gets a special edition JPEG of a pony.