Parsing

Intro

Hi!

Let’s talk about parsing

A whole lot of programming involves interacting with external sources of data.

Our sad duty is to try to make sense of this stuff.

O hai, HTTP 1.1

The very first lines of BNF from RFC 2616:

Request = Request-Line              ; Section 5.1
          *(( general-header        ; Section 4.5
           | request-header         ; Section 5.3
           | entity-header ) CRLF)  ; Section 7.1
          CRLF
          [ message-body ]          ; Section 4.3

(Never read an RFC? See RFC 2616 section 2 for a quick tour of IETF BNF syntax.)
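To make the shape of that rule concrete, here is a minimal sketch that splits raw input along its top-level pieces: a Request-Line, zero or more header lines, a blank line, and an optional body. The names `RawRequest` and `split_request` are hypothetical, invented for this illustration. The sketch does not validate the Request-Line, does not tell the three header kinds apart, and ignores header continuation lines, so treat it as a reading aid rather than a conforming parser.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical container for the pieces the grammar names.
struct RawRequest {
    std::string request_line;          // Request-Line, without its CRLF
    std::vector<std::string> headers;  // one entry per header line, without CRLF
    std::string body;                  // message-body (may be empty)
};

// Split `input` along the top-level shape of the Request rule:
//   Request-Line, *(header CRLF), CRLF, [ message-body ]
// Returns false if the input ends before the blank line that closes the headers.
bool split_request(const std::string& input, RawRequest& out)
{
    const std::string crlf = "\r\n";
    RawRequest req;

    // Request-Line: everything up to the first CRLF.
    std::size_t end = input.find(crlf);
    if (end == std::string::npos)
        return false;
    req.request_line = input.substr(0, end);
    std::size_t pos = end + crlf.size();

    // *(header CRLF), terminated by an empty line.
    for (;;) {
        end = input.find(crlf, pos);
        if (end == std::string::npos)
            return false;          // ran out of input before the blank line
        if (end == pos) {          // empty line: end of headers
            pos += crlf.size();
            break;
        }
        req.headers.push_back(input.substr(pos, end - pos));
        pos = end + crlf.size();
    }

    // [ message-body ]: whatever remains, possibly nothing.
    req.body = input.substr(pos);
    out = req;
    return true;
}
```

Given a request line, a `Host` header, the blank line, and then `hello`, the sketch fills `request_line`, puts one entry in `headers`, and leaves `hello` in `body`. Even this much requires tracking offsets by hand, which is the theme of the slides that follow.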

Parsing by hand: C++ parser fragment

Long ago, the ancients wrote all of their parsers by hand, because that was all they could do.

void Response::ProcessStatusLine( std::string const& line )
{
    const char* p = line.c_str();

    while( *p && *p == ' ' )
        ++p;

    while( *p && *p != ' ' )
        m_VersionString += *p++;
    while( *p && *p == ' ' )
        ++p;

    std::string status;
    while( *p && *p != ' ' )
        status += *p++;
    while( *p && *p == ' ' )
        ++p;

    while( *p )
        m_Reason += *p++;

    m_Status = atoi( status.c_str() );
    if( m_Status < 100 || m_Status > 999 )
        throw Wobbly( "BadStatusLine (%s)", line.c_str() );

    if( m_VersionString == "HTTP/1.0" )
        m_Version = 10;
    else if( 0==m_VersionString.compare( 0,7,"HTTP/1." ) )
        m_Version = 11;
    else
        throw Wobbly( "UnknownProtocol (%s)", m_VersionString.c_str() );

    m_State = HEADERS;
    m_HeaderAccum.clear();
}
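The fragment above depends on members of a class that is not shown, so it cannot be tried on its own. The following standalone sketch follows the same scanning steps (skip spaces, read the version, read the status code, keep the rest as the reason phrase). `StatusLine`, `next_token`, and `parse_status_line` are hypothetical names, `std::runtime_error` stands in for the original `Wobbly` exception, and the version check is left out.

```cpp
#include <cstddef>
#include <cstdlib>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the member variables the fragment fills in.
struct StatusLine {
    std::string version;  // e.g. "HTTP/1.1"
    int status;           // e.g. 404
    std::string reason;   // e.g. "Not Found"
};

// Read one space-delimited token starting at `pos`,
// then advance `pos` past the spaces that follow it.
static std::string next_token(const std::string& line, std::size_t& pos)
{
    std::size_t end = line.find(' ', pos);
    if (end == std::string::npos)
        end = line.size();
    std::string token = line.substr(pos, end - pos);
    pos = line.find_first_not_of(' ', end);
    if (pos == std::string::npos)
        pos = line.size();
    return token;
}

StatusLine parse_status_line(const std::string& line)
{
    // Skip leading spaces, as the original does.
    std::size_t pos = line.find_first_not_of(' ');
    if (pos == std::string::npos)
        pos = line.size();

    StatusLine out;
    out.version = next_token(line, pos);
    const std::string code = next_token(line, pos);
    out.reason = line.substr(pos);  // the remainder, inner spaces included

    // Like the original, rely on atoi and a range check to reject bad codes.
    out.status = std::atoi(code.c_str());
    if (out.status < 100 || out.status > 999)
        throw std::runtime_error("BadStatusLine (" + line + ")");
    return out;
}
```

Calling `parse_status_line` on a line holding `HTTP/1.1`, `404`, and `Not Found` fills the three fields with those values, while a non-numeric status code makes `atoi` return 0 and trips the range check.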

Parsing by hand: Java parser fragment, page 1

public class HttpResponseParser extends AbstractMessageParser {

    private final HttpResponseFactory responseFactory;
    private final CharArrayBuffer lineBuf;

    public HttpResponseParser(
            final SessionInputBuffer buffer,
            final LineParser parser,
            final HttpResponseFactory responseFactory,
            final HttpParams params) {
        super(buffer, parser, params);
        if (responseFactory == null) {
            throw new IllegalArgumentException("Response factory may not be null");
        }
        this.responseFactory = responseFactory;
        this.lineBuf = new CharArrayBuffer(128);
    }

Parsing by hand: Java parser fragment, page 2

    protected HttpMessage parseHead(
            final SessionInputBuffer sessionBuffer)
        throws IOException, HttpException, ParseException {

        this.lineBuf.clear();
        int i = sessionBuffer.readLine(this.lineBuf);
        if (i == -1) {
            throw new NoHttpResponseException("The target server failed to respond");
        }
        //create the status line from the status string
        ParserCursor cursor = new ParserCursor(0, this.lineBuf.length());
        StatusLine statusline = lineParser.parseStatusLine(this.lineBuf, cursor);
        return this.responseFactory.newHttpResponse(statusline, null);
    }
}

Commentary

Hand-written parsers can be quite efficient.

But…
