Stanford CS240h Lab 2
Please use this skeleton code as your starting point: lab2.tar.gz.
Overview
You will write a parser for git-style INI files, along with a pretty-printer to reformat the parsed data.
NOTE (January 21, 5:00p): The above link has been updated with an older and more clearly worded version of the git-config spec.
INI file syntax
The above link gives a thorough description of the input file syntax. Please note the following clarifications and slight deviations from the above spec:
We do not expect you to implement either the deprecated [section.subsection] syntax or support for including other files. (You can implement support for including other files as a bonus; see below.)
Your parser should support ASCII input files; UTF-8 is fun but not required.
Variables can be multivalued; you should parse and retain all of the values listed in the input file. We do not require you to retain the exact ordering of values in the input file, but as noted below, your pretty-printer must be idempotent, so the ordering must be deterministic.
Your parser should infer the type of each value (note that multivalued variables can have a different type for each value). In particular,
Bool: on, true, and yes should all parse as True, while off, false, and no should parse as False. As noted in the spec, this should be case insensitive. Two other notes:
0 and 1 should be parsed as Ints rather than Bools, despite what the spec says.
As specified in the git-config man page, lines that contain only a key name and no = value should parse as a boolean with value True.
Int: Integers should have an optional sign (+ or -) followed by one or more base-10 digits. Your parser should accept optional suffixes k, M, G, T, P, and E, indicating that the value should be scaled by 2^10, 2^20, etc.
Color: You do not need to implement the Color type described in newer versions of the git-config manpage.
String: everything else should be treated as a string. Be careful to follow the specification regarding escaping, internal vs. external whitespace, and quoting!
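The inference rules above can be sketched as a small classifier. This is only an illustration under several assumptions: the Value type and the names classify, parseInt, and suffixScale are hypothetical and are not the types from the skeleton code; the input is a String rather than a ByteString; the value has already been trimmed and is known to contain no double quotes; and suffixes are matched case-sensitively exactly as listed. Integer is used so that the larger suffixes cannot overflow.

```haskell
import Data.Char (isDigit, toLower)

-- Hypothetical value type for illustration; the skeleton code defines its own.
data Value = VBool Bool | VInt Integer | VString String
  deriving (Eq, Show)

-- Classify an unquoted, already-trimmed value.
classify :: String -> Value
classify raw
  | lower `elem` ["on", "true", "yes"]  = VBool True
  | lower `elem` ["off", "false", "no"] = VBool False
  | Just n <- parseInt raw              = VInt n
  | otherwise                           = VString raw
  where
    lower = map toLower raw

-- Optional sign, one or more base-10 digits, optional binary suffix.
parseInt :: String -> Maybe Integer
parseInt s = do
  let (sign, rest) = case s of
        '+' : r -> (1, r)
        '-' : r -> (-1, r)
        _       -> (1, s)
      (digits, suffix) = span isDigit rest
  if null digits then Nothing else Just ()
  scale <- suffixScale suffix
  Just (sign * read digits * scale)

-- k = 2^10, M = 2^20, ..., E = 2^60 (case-sensitive here by assumption).
suffixScale :: String -> Maybe Integer
suffixScale ""  = Just 1
suffixScale [c] = lookup c (zip "kMGTPE" [2 ^ (10 * i) | i <- [1 :: Int ..]])
suffixScale _   = Nothing
```

For example, classify "16k" gives VInt 16384, while classify "16 k" stays a VString because the suffix must follow the digits directly.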
Parsed representation
In the skeleton code linked above, we have provided several types and signatures that specify the expected type of the parsed data. Please respect this programming interface, or we will not be able to grade your lab!
Pretty-printer
Once you have parsed the input file into an internal representation, the final step is to pretty-print the resulting data using the INI syntax. Be sure that your pretty-printer is idempotent! That is, running the pretty-printed output through your parser and pretty-printer again should give the same result. (Be careful of multivalued variables here!)
(The bytestring package exports a module called Data.ByteString.Builder that you might find useful here.)
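One way to check idempotence in a test is to compare the output of one parse-and-print pass with the output of a second pass. The sketch below takes the parser and the pretty-printer as arguments, so parse, pretty, and roundTripStable are placeholder names rather than functions from the skeleton code, and String stands in for whatever text type you use.

```haskell
-- True when printing is stable: the pretty-printed output parses again and
-- re-prints to exactly the same text. Inputs that fail to parse in the
-- first place are treated as out of scope and also yield True.
roundTripStable :: (String -> Either e a) -> (a -> String) -> String -> Bool
roundTripStable parse pretty input =
  case parse input of
    Left _    -> True
    Right ini ->
      let once = pretty ini
      in case parse once of
           Left _     -> False            -- printed output must parse again
           Right ini' -> pretty ini' == once
```

A predicate of this shape can be used directly as a QuickCheck property or applied to each example input file.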
Command-line interface
Your implementation should take an input file on STDIN and produce pretty-printed output on STDOUT.
Example inputs and outputs
In the skeleton code tarball you will find an examples directory. These are example inputs and outputs to help you understand some of the edge cases of the INI file spec. The example files come in pairs, e.g., ex1In.txt and ex1Out.txt, representing an input and the corresponding pretty-printed output. It is not necessary to match the pretty-printed output exactly; for example, there are many equivalent representations of strings, or your pretty-printer might insert whitespace differently. The purpose of these examples is to help clear up specifics regarding parsing escapes, syntax that must be supported, etc.
Please read the spec and examples carefully!
Clarifications (updated January 22, 10:00a)
The following clarifications are based on Piazza questions. Please continue posting requests for clarification as necessary!
In what follows, where we list characters in single quotes, we mean the character that you'd get if you typed exactly that string into ghci (so, e.g., the backslash character is represented below by '\\').
Spec oracle
To help you better understand the git-config specification, we've set up a web interface to our implementation that you might find helpful.
Clarified specification
It turns out that recent edits to the git-config manpage have made its Syntax section less useful. We've updated the link above to point to the git-config manpage from version 2.1.4, which is much more precise in its description of the syntax.
Double quotes
Double quoted strings can only appear inside string values and subsection names. Thus, key = "1024" is a string, while key = 1024 is an integer (and key = "True" is a string, while key = True is a boolean).
Any character except backslash ('\\') and double quote ('\"') can appear inside double quotes and is retained verbatim; backslash and double quote must be escaped, and all valid escape sequences must be accepted (see below for more on escape sequences).
Note that this includes '#' and ';'. Inside double quotes, these do not indicate the start of a comment.
Subsection names have special rules. See below for more information.
Unterminated double quotes should result in a parse error.
Subsection names
Subsection names are double quoted and may contain escape sequences. However, newlines of any form (quoted, unquoted, and line continuations) are not allowed inside subsection names, and should result in a parse error.
Escape sequences
There are several escape sequences that your parser should accept inside subsection names and string values. However, the acceptable escape sequences inside subsection names and string values are different (see above for more information on subsection names).
The internal value stored in your INIFile should be the value that the escape sequence maps to. For example, \\n maps to \n, so the latter should be stored in your INIFile.
An escaped whitespace character (e.g., tab, newline, and backspace) in a string value is never considered leading or trailing whitespace. So after parsing key = \\tasdf\\t the corresponding value should be \tasdf\t.
Invalid escape sequences should result in a parse error. This includes backslash followed by end-of-file.
Line continuations
A particularly confusing escape sequence seems to be line continuations. There are several important points with regard to line continuations:
Line continuations can only appear in a value. A value begins after the '=' on a line like key = value; thus, the key syntax (indicating a True bool) does not admit line continuations, nor can line continuations appear in comments, in section declarations, etc.
A line continuation is a special case of an escape sequence. The literal string \\\n, i.e., backslash followed by newline (ASCII 0x5C0A), is an escape sequence that just means the empty string.
Line continuations result in an empty string, and leading and trailing whitespace in values is removed. Taken together, this means that if leading or trailing whitespace is broken by an escaped newline, any whitespace after the escaped newline is still considered leading or trailing whitespace. For an example of this, see ex7In.txt and ex7Out.txt.
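The interaction between escapes, line continuations, and whitespace trimming can be sketched by tagging each character with how it was written before trimming. This is a simplified illustration, not a full value parser: it ignores double quotes and comments, works on String, and the names Piece, tokenize, and trimValue are hypothetical. The set of escapes accepted here (backslash, double quote, n, t, b) is our reading of the git-config man page; check it against the spec.

```haskell
import Data.Char (isSpace)
import Data.List (dropWhileEnd)

-- A character of an unquoted value, tagged with how it was written.
data Piece = Raw Char | Escaped Char
  deriving (Eq, Show)

-- Decode escapes; Nothing signals a parse error.
tokenize :: String -> Maybe [Piece]
tokenize [] = Just []
tokenize ('\\' : '\n' : rest) = tokenize rest   -- continuation: empty string
tokenize ('\\' : c : rest) = do
  d <- lookup c [('\\', '\\'), ('"', '"'), ('n', '\n'), ('t', '\t'), ('b', '\b')]
  (Escaped d :) <$> tokenize rest
tokenize ['\\'] = Nothing                       -- backslash at end of input
tokenize (c : rest) = (Raw c :) <$> tokenize rest

-- Strip only unescaped whitespace from both ends.
trimValue :: [Piece] -> String
trimValue = map char . dropWhileEnd isRawSpace . dropWhile isRawSpace
  where
    isRawSpace (Raw c) = isSpace c
    isRawSpace _       = False
    char (Raw c)     = c
    char (Escaped c) = c
```

With this sketch, the text \\tasdf\\t surrounded by spaces trims to \tasdf\t, because the escaped tabs are never treated as leading or trailing whitespace. A value such as foo followed by a space, a line continuation, and more spaces trims to foo, because the continuation contributes nothing and the spaces after it are still trailing whitespace.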
Pretty printing
When pretty printing, the internal representation of your INIFile needs to be converted to a format that can be parsed again. This means that special characters in string values must appear in double quotes and/or be properly escaped.
You do not need to replicate line continuations in your pretty printed output.
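One simple strategy, shown here as a sketch rather than the required output format, is to always wrap string values in double quotes and escape the characters that need it. Always quoting preserves leading and trailing whitespace, keeps '#' and ';' from starting a comment, and ensures that a string such as 1024 or True does not re-parse as an Int or Bool. The name renderString is hypothetical and the function works on String for brevity.

```haskell
-- Render a string value so that it can be parsed back: wrap it in double
-- quotes and escape backslash, double quote, newline, tab, and backspace.
renderString :: String -> String
renderString s = "\"" ++ concatMap escapeChar s ++ "\""
  where
    escapeChar '\\' = "\\\\"
    escapeChar '"'  = "\\\""
    escapeChar '\n' = "\\n"
    escapeChar '\t' = "\\t"
    escapeChar '\b' = "\\b"
    escapeChar c    = [c]
```

Other renderings are equally valid, since the example outputs do not have to be matched exactly.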
OverloadedStrings
Feel free to use the OverloadedStrings language feature.
Bonus!
As a bonus, you can implement support for including other files, as described in the git-config man page.
Even if you decide not to implement this, we encourage you to think about how you would implement it. This is a nice example of the kinds of design questions that often arise in functional programming: on the one hand, we would like our parser to be pure, while on the other hand file inclusion requires IO.
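One possible design, sketched below, keeps the parsing step pure and has it return include requests as data, with a separate expansion step that is abstracted over the monad used to load files. Everything here is hypothetical: the Item type, the function names, and especially the made-up "include PATH" line syntax, which is not the include syntax from the git-config man page. The sketch also does no cycle detection.

```haskell
-- Hypothetical result of the pure step: ordinary lines plus include requests.
data Item = Line String | Include FilePath
  deriving (Eq, Show)

-- Pure step: recognise the made-up "include PATH" line syntax.
parseItems :: String -> [Item]
parseItems = map toItem . lines
  where
    toItem l = case words l of
      ["include", path] -> Include path
      _                 -> Line l

-- Effectful step, abstracted over the monad that loads file contents.
expand :: Monad m => (FilePath -> m String) -> [Item] -> m [String]
expand load = fmap concat . mapM step
  where
    step (Line l)       = pure [l]
    step (Include path) = load path >>= expand load . parseItems
```

In the real program the loader could be readFile in IO, while tests can pass a pure loader such as a lookup in an association list, which runs in the Maybe monad and needs no file system.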
Other information
Please ask (early!) any clarifying questions about the specification on Piazza.
You can use attoparsec and bytestring for this lab (or you can roll your own parser if you'd like). For testing, you should use hspec and/or QuickCheck.
A useful resource for learning more about parser combinator libraries like attoparsec is chapter 16 of Real World Haskell, Using Parsec. (This chapter covers Parsec, which is similar but not identical to attoparsec. You will nevertheless find it a useful companion to the attoparsec documentation.)
Due Date
Lab 2 should be submitted by the start of class on January 27.
You have 48 hours of late days for the three labs. They are consumed in 24 hour blocks and are used automatically. After they are used, you'll have the maximum grade you can receive for a late lab reduced by 25% each day.
Stack -- Build & Test Tool
We are using the stack build tool for this course. Once you have the skeleton code, you should be able to run:
stack setup
stack build
stack test
stack exec parseini-exe
We have provided an overview of Stack here.
Provided Files
The files provided to get started are:
parseini.cabal, stack.yaml -- specifies the build system.
src/ParseIni.hs -- implement your pure parser here
src/PrettyPrintIni.hs -- implement your pretty-printer here
app/Main.hs -- implement the command line interface to your pure functions here (argument parsing, stdin/stdout handling).
test/Spec.hs -- the test harness. You need to edit this and add your own tests! We provide a few very simple ones.
PLEASE DON'T CHANGE THE INTERFACE OF THE ParseIni.hs OR PrettyPrintIni.hs MODULES, AS WE WILL EXPECT IT TO BE THE SAME WHEN TESTING! (Do not add any extra source files, either.)
Testing Lab 2
Some skeleton code for a test framework is provided in test/Spec.hs. You'll need to edit it to add your own tests. The test framework uses a Haskell package called hspec. Please refer to the hspec documentation for details on how to use it.
Grading
We strongly encourage you to take testing seriously and write a comprehensive test suite. Grading will be only on functionality, but we will try to give feedback on your coding style.
Submitting
First, simply type:
stack sdist
This will generate a tar file of your code. Please don't add any extra source files without changing the parseini.cabal file correspondingly. Otherwise, your submission will be broken and missing files.
Then go to upload.ghc.io and submit your work through the online form. You can resubmit as many times as you want up until the deadline.
If you have any trouble submitting online, then please post on Piazza or email the staff mailing list.
Suggested Music
We suggest you listen to The Flaming Lips - The Soft Bulletin if you need music while programming.