Simple Code


code submitted in attempting to solve the problem

Write a program in one of the languages C++, Java, Perl, Python, PHP to solve the problem described below. Your solution should solve the problem simply, without being a stepping-stone to a more general solution or to a different problem. Ideally your code will leverage the language and libraries that are part of the language, but be readable and largely comprehensible by those who don't fully know the language you're using. You must use only those libraries that are part of a standard distribution and your solution must be contained in a single file, e.g., one .cpp, .java, .pl, .py, .php file.

Email your solution as an attachment or inline to ola AT cs DOT duke DOT edu and please include a note as to which language (including those not in the list above) you consider your current native programming language. This is the language you've been writing more code in than others in the past few months and which you view as possibly influencing the coding and design decisions you make in writing programs.

Email your code/solution by Thursday, March 22, 11:00 pm EDT. If you got to this page via the dulug list or from someone on that list please include the word DULUG somewhere in your email, otherwise I'll assume you arrived here (in)directly via the SIGCSE list.

Write a program that reads from a file whose name/path is a command-line argument to the program (except for PHP, see below). The program should read white-space delimited strings/words from the specified file and should print an ordered list of the different words found in the file. Words should be compared in a case-insensitive manner, e.g., ENERGY, energy, and Energy are three occurrences of the same word. The output of your program, printed to standard out (except for PHP, see below), should list every different word and the number of times it occurs in the file processed by the program. The number of occurrences should be printed first, followed by a tab character, followed by the word. Each count/word pair should be printed on a different line. Words should be printed in lower-case, ordered by frequency, the most frequently occurring word first. Words with the same frequency should be printed in lexicographical order. See the sample output below.
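To make the ordering and output rules above concrete, here is a minimal sketch in Python. It is only one possible reading of the description, not a reference solution. It assumes Python 3 (not the Python 2.4.x named elsewhere on this page), and the function name format_word_counts is made up for illustration.

```python
from collections import Counter


def format_word_counts(text):
    """Return 'count<TAB>word' lines for the words in text.

    Hypothetical helper: words are lower-cased, ordered by descending
    count, and ties are broken in lexicographical order.
    """
    # split() with no arguments breaks the text on runs of whitespace.
    counts = Counter(word.lower() for word in text.split())
    # Negating the count sorts the most frequent words first, while the
    # word itself breaks ties in ascending lexicographical order.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ["%d\t%s" % (count, word) for word, count in ordered]


# Small in-memory sample instead of a file, to keep the sketch self-contained.
for line in format_word_counts("ENERGY energy Energy the The a"):
    print(line)
```

The sort key (-count, word) handles both ordering rules in a single pass, so no second sort is needed for the ties.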

Words are contiguous (non white-space) characters separated by whitespace. Whitespace has its traditional meaning of space, tab, newline, and so on, e.g., \s using regular expressions. Input in many languages will use this definition by default in reading words, e.g., using >> in streams with C++, using a Scanner and next in Java, calling split in Python, and so on.
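One consequence of this definition is that punctuation stays attached to a word, so "cask," and "cask" count as different words. The short Python sketch below illustrates this under the same Python 3 assumption; the name split_words and the sample string are invented for the example.

```python
import re


def split_words(text):
    """Split text into whitespace-delimited words (hypothetical helper)."""
    # str.split() with no arguments treats any run of spaces, tabs and
    # newlines as one separator and ignores leading/trailing whitespace.
    return text.split()


sample = "The  cask,\tthe\ncask"
words = split_words(sample)
print(words)

# For this sample, a regular expression using \s yields the same words.
print(re.split(r"\s+", sample.strip()) == words)
```

Nothing is stripped from the words themselves. Only the whitespace between them is discarded.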

The file poe.txt (Edgar Allan Poe's The Cask of Amontillado from Project Gutenberg) should generate this output when used as the input to the program you write.

Perl and Python programs will be run as scripts --- they will be run from a Unix-like shell as shown below for Perl and Python scripts processing a file named poe.txt. You don't need to submit complete scripts if that's an issue, we'll add the proper directive at the top (e.g., #!/usr/bin/perl on our system for Perl programs) and make the scripts executable. We'll use Python 2.4.x and Perl 5.8.x in running the scripts.

   ./ poe.txt
   ./ poe.txt

C++ programs will be compiled using gcc (version 4.x.x), though any standard C++ compiler should work. Java programs will use jdk 1.5. PHP will use 5.1.4 in Apache. PHP programs should run as the code that will be called via a POST action and accept a file uploaded via the code/webpage below. The PHP code should print the output in the appropriate format between <PRE> and </PRE> tags so that except for HTML tags the page shown in a browser will be in the format described.

This is the front-end for the PHP code which calls a PHP script named wordfreq.php (the code you write if you're writing PHP code). You should view the source for the front-end to see how your PHP program will be called.

Your code should do no error checking as to whether the file/path that is passed to the program represents a valid file/path.
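In practice, "no error checking" means the file is opened directly and any failure is left to the language runtime. A minimal Python 3 sketch, assuming a hypothetical helper named read_text:

```python
import sys


def read_text(path):
    """Return the full contents of the file at path (hypothetical helper).

    Deliberately no existence or permission checks: if the path is bad,
    open() simply raises and the program stops.
    """
    with open(path) as handle:
        return handle.read()


# In a script, the path would come from the first command-line argument:
#     text = read_text(sys.argv[1])
```

Leaving out the checks keeps the file handling to a few lines, which fits the request for a simple solution.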