From d0e9b780211f8c35f97a9527e93c3208cddefac4 Mon Sep 17 00:00:00 2001 From: clsr Date: Fri, 18 Aug 2017 14:29:38 +0200 Subject: CNP 0.3 specification --- cnp-specification.cnm | 495 ++++++++++++++++++++++++++++++-------------------- 1 file changed, 301 insertions(+), 194 deletions(-) diff --git a/cnp-specification.cnm b/cnp-specification.cnm index fffba6e..139abfc 100644 --- a/cnp-specification.cnm +++ b/cnp-specification.cnm @@ -1,197 +1,304 @@ title - CNP 0.1 + ContNet Protocol specification, version 0.3 (2017-08-18) content - text fmt - This is an archived copy of the original draft of the ContNet Protocol from 2013. It has never been implemented and contains numerous flaws. - - raw text/markdown - # The CN protocol - The protocol simple to generate and parse by programs and to write and read by humans, even in its raw form. - - # Request - A request is composed of the request header and the request body. - - The request header consists of the requested path followed by any number of single-space separated key=value parameters that ends in a newline. - - The body can contain any amount arbitrary data, including none. If a body is present, its length in bytes encoded as an ASCII decimal number must be given in the "length" parameter. - - The path is a standard UNIX-style absolute filepath prefixed with the host address (example.com/path/to/file.txt). The filepath part is separated from the host with the first / in the path ("example.com/" is a valid path, "[1fff:8:a88:85a3::ac1f]:8001/foo/bar" is valid, "example.com" isn't). The parameter key and value can be any bytestring. Blank keys and values are permissible. An empty request header is invalid, at least the path must be given. 
- - Certain parameter keys are reserved for specific functionality: - - - length: the length of the body data in bytes; required if any data will be sent - - version: a dot-separated 2-tuple of the client version; if the version sent is x.y, the server may reply with response version x.<=y (the minor version can be any in the current major version up to the request version); if the version isn't sent, the server may reply with any version (such as the latest) under the assumption that the client is the user directly writing raw requests (so it can, for example, omit unnecessary data from the response, like the time-related headers) - - client: a string identifying the user agent; optional - - compression: the compression format of the request body, such as "gzip"; optional - - Request format: - - example.com/request/path.ext param1=value1 another\_param=some\-value\0\n third-param= =without_name length=43 - data data data data - data data data data - ... - - Interpretation: - - { - path: { - host: "example.com" - filepath: "/request/path.ext", - }, - params: { - "param1": "value1", - "another param": "some=value\x00\x0a', - "third-param": "", - "": "without_name", - }, - body: "data data data data\ndata data data data\n...", - } - - Examples: - - example.com/ version=0.0.1 client=get/0.1 - - example.org:8080/post-reply version=0.0.1 length=3 - Hi! - - ### response - A response is also composed of a header and the body. - - Instead of the request path, a response has the response type as the first token. This can be one of the following: - - - content: response with the requested content in the standard document format. 
- - data: response with arbitrary data - - redirect: redirects the client to a different URI provided in the body; equivalent of HTTP 301, 302, 303, 307 and 308 - - error: indicating an error with the error type in the "error" parameter and possibly with a page in the body; equivalent of HTTP 4xx and 5xx - - The parameters function the same as in request. Predefined parameters: - - - length: the length of the body; optional since the body ends when the connection is closed, but recommended; any response type - - version: see request, except that the server must always send this; any response type - - server: a string identifying the server program; optional; any response type - - name: the name of the current file/page; optional; content or data response - - time: a RFC3339 timestamp, preferably in the UTC timezone, representing the current time; any response type - - modified: a RFC3339 timestamp, preferably in the UTC timezone, representing the last modification time of the requested resource; content or data response - - type: the MIME type of the resource; data response - - compression: the compression format of the response body, such as "gzip" and "none"; content or data response - - error: a string identifying the error on an error response; error response type only: - - "syntax": invalid request header; equivalent of HTTP 400 - - "invalid": a parameter was rejected (usually for having an invalid value) - - "denied": server does not want to provide this content; equivalent of HTTP 401 and 403 - - "not found": the requested path (either host or the filepath) does not exist on the server; equivalent of HTTP 404 - - "too large": the server doesn't want to accept so much data (either header or body); equivalent of HTTP 413 - - "server error": internal server error; equivalent of HTTP 500 - - "not supported": requested feature isn't supported; equivalent of HTTP 406, 501 and 505 - - The response body can contain one of two things: - - - The content in the [CN 
Content document format](/cnm/CNM 0.1 specification.cnc); response type "content" - - Arbitrary data/files (images, videos, etc.); response type "data" - - Out of these, the former will be displayed and the latter might be embedded in the page if the client supports that, otherwise presented as a downloadable file. - - Response format: - - ok param1=value1 param2=value2 - datadatadatadata - - Interpretation: - - { - type: "ok", - params: { - "param1": "value1", - "param2": "value2", - }, - body: "datadatadatadata", - } - - Examples: - - ok version=0.0.1 server=cnd/0.1 - Hello! - - error version=0.0.1 error=not\_found length=23 - The file was not found. - - redirect version=0.0.1 - /some/page - - ### escaping - The ASCII newline (0x0a), space (0x20), equals (0x3d), NUL (0x00) and backslash (0x5c) must be escaped in several contexts, such as the request/response headers and paths. - - Escaping the equals sign is optional in paths, but can be done. - - The tab (0x09) and carriage return (0x0d) characters do not have to be escaped, despite technically being whitespace. - - - newline: \\n - - space: \\\_ - - equals: \\- - - NUL: \\0 - - backslash: \\\\ - - ## The default content format - TODO - - This should be a human readable, machine parsable and simple. The content should be obvious from the raw response. No layout, just content. - - List of site links, a site tree and content? - - Optional next/previous page, up? - - Embed images or not? - - Tag different parts of the file (like RSS or LaTeX?), such as links, related pages, sections, etc. - - See gopher and markdown. - - The current draft of the format can be seen [here](/cnm/CNM 0.1 specification.cnc). - - ## Comparison with HTTP - ### cookies - TODO - - Instead of cookies, CNP will have sessions. These are parameters bound to exact hosts and contain only a value (TODO: limit value size? exactly N bytes?). - - To start a session, a response of the type "content" will have a "session" parameter. 
The client can (and possibly should) ask the user whether to accept the session. If the user had a session before, it is replaced with the new session. Setting an empty session key will end the session (the client should stop sending the session parameter). On every request while the session is active, the user will send a "session" parameter containing the session key. A session key is bound to the exact host (as specified in the host part of the request path). There is no session expiry or setting cross-host sessions. The client should show when a session is active (for example, display an indicator in a part of the UI) and let user end it at any time without much trouble. The server should not rely on the client accepting the session. - - ### POST - TODO - - Basically, not required, as there are parameters and the request body. See cookies (esp. challenge/response part). - - ### HEAD - Unnecessary. The client can make a normal request, read the header and then close the connection without reading the body. - - ### forms - TODO - - Prepend : (or another symbol) to form key parameter names when sending the request? - - ### REST and other APIs - Unnecessary. The protocol itself is an API. Information can be easily extracted from the default content format. - - ### range - TODO - - Should be implemented. Possibly just the "range" header specifying a byte range "4-5", "-120", "8-", etc. (: is a possible alternate delimiter). - - ### if-none-match, if-modified-since - TODO - - There are the "date" and "modified" params at the moment. One option is to let the client handle it: check if the date of the cached resource <= the modtime of what server sends and close the connection without reading the body if it's unchanged. - - A "hash" param containing some hash of the content could be added. Which hash was used would not be important, as the value of the parameter would be compared to the hash parameter in the new response. - - ### keep-alive - Considered unnecessary. 
HTTP keep-alive requires either the content-length to be known in advance (meaning that data can't be generated while the request is being sent) or stuff like chunked encoding (CNP body data is raw data, no CNP-specific decoding necessary). The modern internet connections are fast enough that a TCP connection doesn't take too long. Keep-alive also has [other problems](https://en.wikipedia.org/wiki/HTTP_persistent_connection#Disadvantages). - - ### being bloated - Not implemented. - - ## Possible changes - - Let request/response header fields be separated by arbitrary number of spaces instead of exactly one (less strict, but would require splitting on fields instead of on characters) - - Forbid NUL character in header altogether (better C string support, probably not) - - Permit key-only parameters (no =, but would require careful splitting) - - TODO + section Overview + text + CNP is a request-response application protocol meant to facilitate hypertext content requests and delivery over the Internet. It is designed to be an alternative to a part of HTTP; specifically, the subset dealing with static content (text, images, videos and similar). The protocol is designed to be simple, easy to implement and unambiguous, as well as strict in order to avoid having to support multiple incorrect implementations. + + It is a binary protocol. While all of the defined keywords are also valid ASCII text (lowercase letters, numbers, symbols and whitespace), CNP does not use any character encoding. Rather, these keywords should be treated as specific bytestrings. Every mention of CNP strings in the specification refers to bytestrings whose content is the provided ASCII-encoded text. This is done to avoid problems with encoding and case insensitivity and simplify parsing, since each keyword has exactly one valid bytestring representation. + + For now, the protocol implements only a limited amount of simple optimizations. 
While it would be possible to make it slightly faster by including more, that would also increase its complexity and increase the likelihood of implementations supporting a very small or even incorrect subset of the specification. Additional simple optimizations may be added in future versions of CNP. + + + section CNP message + text + CNP messages are composed of a header and an optional body. The header is delimited by a newline. Both request and response share the same syntax, with the differences being the semantic meaning of values. + + + section Header + text fmt + The header consists of the CNP version, message intent and an arbitrary number of parameters, using a space as the delimiter between fields. + + The header field delimiter is a single space byte (``0x20``). Multiple successive spaces in the header are a syntax error. A space cannot appear as itself in any header value; it must be escaped into a specific byte sequence. This makes splitting a valid header by spaces safe and correct. + + The CNP version part of the header is the string ``"cnp/"`` followed by a tuple of the major and minor version numbers without leading zeroes on non-zero versions, separated by a period (e.g. ``"cnp/0.3"``). Even if the specification and implementation use a patch version, it is not provided in the message header and must have no impact on protocol compatibility. CNP implementations within the same major version should be generally compatible with other minor versions for basic requests, excluding some noncritical feature discrepancy; unknown parameters are ignored. An exception to this are all versions in the major version ``0``; since it's the development version, the protocol may be completely rewritten between two minor versions. + + The intent is a string that defines the type of the message. Valid values depend on whether the message is request or response. + + Parameters are key-value pairs separated by an equals sign (``0x3d``). 
There may be any number of parameters in a message. The key and value can be arbitrary bytestrings, including empty strings. The order of parameters can be arbitrary and implementations must not depend on it nor alter their behavior based on the order; sorting the parameters or shuffling their order produces an identical request. Each parameter key may appear at most once in a message; duplicate parameter key is an error. The equals sign must always be present for a parameter, even if the key or value is blank. A missing parameter is considered to have the blank value, so there is no difference between a parameter provided with a blank value and one not present in the header. + + Header ends with a line feed byte (``0x0a``). Note that if there is a carriage return before it, it will be considered to be a part of the last token (parameter value or intent). + + The message body is a blob of arbitrary binary data. There is no body delimiter or terminator; if the body ends with a newline, it is considered a part of the body data. + + + section Escaping + text fmt + The NUL (``0x00``), line feed (``0x0a``), space (``0x20``), equals sign (``0x3d``) and backslash (``0x5c``) bytes must be escaped by using a backslash and a specific character (see table below) in all parts of the message header, except when used to delimit header fields or parameter key and value. Using any of these characters when not otherwise specified by the syntax (such as space between intent parameters, equals sign between parameter key and value and line feed at the end of the header) is a syntax error. Other characters, such as tab and carriage return, do not have to be escaped and stand for themselves. + + Each raw value can be escaped into exactly one escaped value and each escaped value maps 1:1 to raw values. Because of this, escaped values do not have to be unescaped or normalized before being handled, as long as they are compared with escaped versions of expected values. 
+ + The message body contains no escaping and is plain binary data (unless specified otherwise by a parameter). + + raw + (NUL) -> \0 + (LF) -> \n + (space) -> \_ + = -> \- + \ -> \\ + + + section Request + text + A CNP request is sent by a client that wants to request a resource from a server (or send some data to it). + + + section Intent + text fmt + In a request, the intent part of the header contains the hostname of the server concatenated with the path of the requested resource. + + The hostname may be a domain name or an IP address, optionally with a port number, and it may not contain any slash (``0x2f``) bytes. In general, it should be the address that the client sends the request to. The format of the hostname is the same as in URLs. + + The path is a Unix-style absolute filepath with an optional trailing slash. It may (and probably should) be cleaned up by CNP implementations before being processed by collapsing multiple consecutive slashes into single ones, removing path entries that consist of a single dot and resolving double-dot parent directories, while leaving a trailing slash if one was present (for example, the path ``"/../../.\/\/\/foo/bar/.."`` becomes ``"/foo"`` and ``"\/\/foo/bar/../"`` becomes ``"/foo/"``). If possible, implementations should avoid sending filepaths that need to be cleaned. The minimal path is ``"/"``; blank path is an error. + + The hostname is case-insensitive, while the path is case-sensitive. Hostname may be normalized by clients and servers, but a specific path (after cleanup) represents one specific resource. + + To separate the hostname from the filepath, split the intent before the first slash. + + + section Parameters + text + The following request parameters are defined (default value is assumed if the parameter value is blank; any absent parameter must be treated as if it had the default value): + + + section length + raw + length={number} + + text fmt + The length of the request body data in bytes. 
+ + This field is required if any data will be sent to the server. + + Default value: ``"0"`` (empty body) + + + section name + raw + name={identifier} + + text fmt + The name of the content being sent in the request body. Must not contain slashes (``0x2f``) or NUL bytes (``0x00``), including escaped NUL bytes (``\0``). + + Can be used to provide filename metadata. If the original filename was a Unicode filename instead of a bytestring, UTF-8 should be used to encode it. + + Default value: ``""`` (no name provided) + + + section type + raw + type={identifier}/{identifier} + + text fmt + MIME type of the content being sent in the request body. + + Default value: ``"application/octet-stream"`` + + + section if_modified + raw + if_modified={timestamp} + + text fmt + Only send the resource in the response if it has been modified since the RFC 3339 UTC timestamp provided, according to the server's time. Otherwise, reply with a ``"not_modified"`` response and no body data. + + The RFC 3339 timestamp should use the following format, given as a strftime format string: ``%Y-%m-%dT%H:%M:%SZ``, for example: ``1970-12-31T23:59:30Z``. + + The timestamp should usually be either the server's ``"modified"`` parameter value from the time the cached resource was requested or the ``"time"`` parameter value from that request if the former was not provided. If neither was present in the previous response for the current request's intent, the client should not use an ``if_modified`` parameter. + + Default value: ``""`` (do not perform this check, just send the resource) + + + section Response + text + The server answers each CNP request with a CNP response. Most of the time, the response will probably contain the requested resource. + + + section Intent + text + The intent part of the response header is one of the defined response types. It signifies whether the response contains the resource requested, a redirect, report of an error or other results. 
+ + + section ok + text + The request was successfully resolved and the response body contains the requested resource. + + + section not_modified + text fmt + The resource has not been modified since the time in the request's ``"if_modified"`` parameter. The response body is blank and the client should use the cached resource. + + + section redirect + text fmt + The client should make a new request to the location provided in the ``"location"`` parameter. The response body may contain a page as if it was an ``"ok"`` response. + + The new request should be blank (contain no body data, as if that path was just entered anew by the user). + + Clients are not required to follow the redirect. Servers should not assume that a client will always follow the redirect immediately or at all. Interactive user agents may prompt the user for confirmation before opening the new page. Otherwise, following redirects up to a certain count is generally the expected behavior. + + User agents do not have to display the provided page and may directly perform the redirect. If the redirect is not performed and a page was provided, it should be displayed instead. + + + section error + text fmt + There was an error answering the request. The server must also provide the ``"reason"`` response parameter. The body data may contain a page as if it was an ``"ok"`` response. + + User agents do not have to display the provided page and may just inform the user of the error reason. If a page is provided, it should be used to inform the user of the details of the error. + + + section Parameters + text + The following response parameters are defined (default value is assumed if the parameter value is blank; any absent parameter must be treated as if it had the default value): + + + section length + raw + length={number} + + text fmt + The length of the response body data in bytes. 
+ + If this parameter is not provided or is blank, the client should read all data until the connection is closed, end of file is reached or equivalent. Despite that, the length parameter should be sent whenever possible. + + Valid in all response types. + + Default value: ``""`` (read until EOF) + + + section name + raw + name={identifier} + + text fmt + The name of the content being sent in the response body. Must not contain slashes (``0x2f``) or NUL bytes (``0x00``), including escaped NUL bytes (``\0``). + + Can be used to provide filename metadata. If the original filename was a Unicode filename instead of a bytestring, UTF-8 should be used to encode it. + + Valid in response types where the body data can be a file (``ok``, ``redirect``, ``error``). + + Default value: ``""`` (no name provided) + + + section type + raw + type={identifier}/{identifier} + + text fmt + MIME type of the content being sent in the response body. + + Valid in response types where the body data can be a file (``ok``, ``redirect``, ``error``). + + Default value: ``"application/octet-stream"`` + + + section time + raw + time={timestamp} + + text fmt + The current time on the server as an RFC 3339 UTC timestamp. + + May be sent with any response type, but most useful in the ``"ok"`` response, where it may be used by the client in an ``"if_modified"`` request parameter later. + + Default value: ``""`` (no timestamp provided) + + + section modified + raw + modified={timestamp} + + text fmt + RFC 3339 UTC timestamp representing the time the requested file was last modified. + + Valid in ``"ok"`` and ``"not_modified"`` responses, where it may be used by the client in an ``"if_modified"`` request parameter later. + + Default value: ``""`` (no timestamp provided) + + + section location + raw + location={identifier}/{identifier} + + text fmt + The location to redirect to in the format of a CNP request intent. + + If the host part of the intent in the value is empty, the current host should be reused. 
If the host is ``"."``, the current host should be reused and the path should be appended to the current path, excluding the last filename after the final slash in the current path (if any). Otherwise, the new request should be sent to the provided host, which has to be resolved to a server address (instead of just sending a request with the new host to the current server). + + For example, if the request was sent to ``"example.com/foo/bar"`` and the redirect location was ``"/baz"``, the new request's intent is ``"example.com/baz"``, but if the location parameter was ``"./baz"``, the new intent is ``"example.com/foo/baz"``. + + Valid in ``"redirect"`` response type. + + Default value: ``""`` (not providing this parameter in a redirect response is an error, no redirect happens) + + + section reason + raw + reason={identifier} + + text + Describes the reason for the error. + + Defined values: + + list + text fmt + ``syntax``: the request was not a valid CNP message + + text fmt + ``version``: the request CNP version is not supported by the server + + text fmt + ``invalid``: the received CNP message is not a valid CNP request (invalid intent format, invalid value of a defined parameter) + + text fmt + ``not_supported``: a requested feature is not supported by the server, so the request cannot be answered + + text fmt + ``too_large``: the server does not want to accept so much data (either header or body) + + text fmt + ``not_found``: the requested path was not found on the server + + text fmt + ``denied``: server does not allow access to this content (might require authentication first) + + text fmt + ``rejected``: the request did not match the server's requirements for that path, but was a valid CNP request (e.g. missing parameters the application requires or an API call provided incorrect type of data) + + text fmt + ``server_error``: internal server error + + text fmt + Valid in ``"error"`` response type. 
+ + Default value: ``"server_error"`` (servers should provide this parameter in the ``"error"`` response if possible) + + + section EBNF + raw ebnf + raw_character = ? any byte except "\0", "\n", " ", "=", "\\" ? ; + escaped_character = "\\", ( "0" | "n" | "_" | "-" | "\\" ) ; + ident_character = raw_character | escaped_character ; + identifier = ident_character, { ident_character } ; + number = "0" | ( "1" … "9" ), { "0" … "9" } ; + + version = "cnp/", number, ".", number ; + intent = identifier ; + parameter = [ identifier ], "=", [ identifier ] ; + + header = version, " ", intent, { " ", parameter }, "\n" ; + body = { ? any byte ? } ; + + message = header, body ; -- cgit
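The escaping table and the EBNF header grammar in the specification above can be illustrated with a small Python sketch. This is not part of the commit or of any reference implementation; the function names (`escape`, `unescape`, `parse_header`) are hypothetical, and the sketch assumes the caller has already read exactly one header line, including its terminating line feed, from the connection. It only checks syntax and does not interpret the intent or any defined parameter.

```python
import re

# Hypothetical helper names; the mapping itself follows the spec's escape table.
ESCAPES = {0x00: b"\\0", 0x0A: b"\\n", 0x20: b"\\_", 0x3D: b"\\-", 0x5C: b"\\\\"}
UNESCAPES = {seq[1]: raw for raw, seq in ESCAPES.items()}
VERSION_RE = re.compile(rb"cnp/(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


def escape(raw: bytes) -> bytes:
    """Escape a raw bytestring for use in a CNP header field."""
    return b"".join(ESCAPES.get(b, bytes([b])) for b in raw)


def unescape(escaped: bytes) -> bytes:
    """Reverse escape(); reject reserved bytes and unknown escape sequences."""
    out = bytearray()
    it = iter(escaped)
    for b in it:
        if b in (0x00, 0x0A, 0x20, 0x3D):
            raise ValueError("unescaped reserved byte in header field")
        if b == 0x5C:
            nxt = next(it, None)  # None means the backslash was the last byte
            if nxt not in UNESCAPES:
                raise ValueError("invalid escape sequence")
            out.append(UNESCAPES[nxt])
        else:
            out.append(b)
    return bytes(out)


def parse_header(line: bytes):
    """Split one header line into (version, intent, params), all as bytes."""
    if not line.endswith(b"\n") or b"\n" in line[:-1]:
        raise ValueError("header must end with exactly one line feed")
    fields = line[:-1].split(b" ")  # safe: spaces never appear unescaped
    if len(fields) < 2:
        raise ValueError("header needs a version and an intent")
    version, intent, raw_params = fields[0], fields[1], fields[2:]
    if not VERSION_RE.fullmatch(version):
        raise ValueError("bad version field")
    if not intent:
        raise ValueError("empty intent (or two successive spaces)")
    params = {}
    for field in raw_params:
        key, sep, value = field.partition(b"=")
        if not sep or b"=" in value:
            raise ValueError("parameter needs exactly one equals sign")
        key, value = unescape(key), unescape(value)
        if key in params:
            raise ValueError("duplicate parameter key")
        params[key] = value
    return version, unescape(intent), params
```

For example, `parse_header(b"cnp/0.3 example.com/ length=3\n")` yields the version `b"cnp/0.3"`, the intent `b"example.com/"` and the parameter dictionary `{b"length": b"3"}`. The sketch unescapes values for readability; as the Escaping section notes, an implementation could equally compare escaped values directly.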