title
	ContNet Protocol specification, version 0.4 (2017-09-04)

content
	section Overview
		text
			CNP is a request-response application protocol meant to facilitate hypertext content requests and delivery over the Internet. It is designed to be an alternative to a part of HTTP; specifically, the subset dealing with static content (text, images, videos and similar). The protocol is designed to be simple, easy to implement and unambiguous, as well as strict in order to avoid having to support multiple incorrect implementations.

			It is a binary protocol. While all of the defined keywords are also valid ASCII text (lowercase letters, numbers, symbols and whitespace), CNP does not use any character encoding. Rather, these keywords should be treated as specific bytestrings. Every mention of CNP strings in the specification refers to bytestrings whose content is the provided ASCII-encoded text. This is done to avoid problems with encoding and case insensitivity and simplify parsing, since each keyword has exactly one valid bytestring representation.

			For now, the protocol implements only a limited number of simple optimizations. While including more could make it slightly faster, that would also increase its complexity and the likelihood of implementations supporting only a very small or even incorrect subset of the specification. Additional simple optimizations may be added in future versions of CNP.


	section CNP message
		text
			CNP messages are composed of a header and an optional body. The header is terminated by a newline. Requests and responses share the same syntax; they differ only in the semantic meaning of the values.


		section Header
			text fmt
				The header consists of the CNP version, message intent and an arbitrary number of parameters, using a space as the delimiter between fields.

				The header field delimiter is a single space byte (``0x20``). Multiple successive spaces in the header are a syntax error. A space cannot appear as itself in any header value; it must be escaped into a specific byte sequence. This makes splitting a valid header by spaces safe and correct.

				The CNP version part of the header is the string ``cnp/`` followed by a tuple of the major and minor version numbers without leading zeroes on non-zero versions, separated by a period (e.g. ``cnp/0.4``). Even if the specification and implementation use a patch version, it is not provided in the message header and must have no impact on protocol compatibility. CNP implementations within the same major version should generally be compatible with other minor versions for basic requests, apart from noncritical feature discrepancies; unknown parameters are ignored. The exception is major version ``0``: since it is the development version, the protocol may be completely rewritten between two minor versions.

				The intent is a string that defines the type of the message. Valid values depend on whether the message is a request or a response.

				Parameters are key-value pairs separated by an equals sign (``0x3d``). There may be any number of parameters in a message. The key and value can be arbitrary bytestrings, including empty strings. The order of parameters is arbitrary and implementations must neither depend on it nor alter their behavior based on it; sorting the parameters or shuffling their order produces an identical request. Each parameter key may appear at most once in a message; a duplicate parameter key is an error. The equals sign must always be present for a parameter, even if the key or value is blank. A missing parameter is considered to have the blank value, so there is no difference between a parameter provided with a blank value and one not present in the header.

				The header ends with a line feed byte (``0x0a``). Note that a carriage return directly before it is considered to be a part of the last token (parameter value or intent).

				The message body is a blob of arbitrary binary data. There is no body delimiter or terminator; if the body ends with a newline, it is considered a part of the body data.

				Example: ``cnp/0.4 msg_intent param1=value1 param2=value2``
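
				The splitting rules above can be sketched in Python as follows. This is a minimal illustration rather than a reference implementation: the name ``parse_header`` and its return shape are assumptions, it expects the header line without the terminating line feed, and it keeps all values in their escaped form without validating escape sequences.

```python
import re

# Version numbers: "0" or a number without leading zeroes.
VERSION_RE = re.compile(rb"cnp/(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


def parse_header(line: bytes):
    """Split a CNP header line (without the trailing LF) into its fields.

    Returns (major, minor, intent, params); values stay escaped.
    """
    fields = line.split(b" ")
    if len(fields) < 2 or b"" in fields:
        # Too few fields, or an empty field caused by successive spaces.
        raise ValueError("syntax error: bad field layout")

    match = VERSION_RE.fullmatch(fields[0])
    if match is None:
        raise ValueError("syntax error: bad version")
    major, minor = int(match.group(1)), int(match.group(2))

    intent = fields[1]
    if b"=" in intent:
        raise ValueError("syntax error: unescaped equals sign in intent")

    params = {}
    for field in fields[2:]:
        key, sep, value = field.partition(b"=")
        if not sep or b"=" in value:
            raise ValueError("syntax error: parameter needs exactly one '='")
        if key in params:
            raise ValueError("duplicate parameter key")
        params[key] = value
    return major, minor, intent, params
```

				Because spaces and equals signs never appear unescaped inside values, plain ``split`` and ``partition`` calls are sufficient here; no quoting state has to be tracked.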


		section Escaping
			text fmt
				The NUL (``0x00``), line feed (``0x0a``), space (``0x20``), equals sign (``0x3d``) and backslash (``0x5c``) bytes must be escaped by using a backslash and a specific character (see table below) in all parts of the message header, except when used to delimit header fields or parameter key and value. Using any of these characters when not otherwise specified by the syntax (such as the space between header fields, the equals sign between a parameter key and value and the line feed at the end of the header) is a syntax error. Other characters, such as tab and carriage return, do not have to be escaped and stand for themselves.

				Each raw value can be escaped into exactly one escaped value and each escaped value maps 1:1 to raw values. Because of this, escaped values do not have to be unescaped or normalized before being handled, as long as they are compared with escaped versions of expected values.

				The message body contains no escaping and is plain binary data (unless specified otherwise by a parameter).

			raw
				(NUL)   ->  \0
				(LF)    ->  \n
				(space) ->  \_
				=       ->  \-
				\       ->  \\
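
			text fmt
				As an illustration of the table above, a pair of escaping helpers might look like the following Python sketch. The names ``escape`` and ``unescape`` are assumptions made for this example and are not defined by CNP.

```python
# Raw byte -> escape sequence, taken from the table above.
ESCAPES = {
    0x00: b"\\0",
    0x0A: b"\\n",
    0x20: b"\\_",
    0x3D: b"\\-",
    0x5C: b"\\\\",
}
# Character following the backslash -> raw byte.
UNESCAPES = {seq[1]: raw for raw, seq in ESCAPES.items()}


def escape(raw: bytes) -> bytes:
    """Escape a raw value for use in a CNP header."""
    return b"".join(ESCAPES.get(byte, bytes([byte])) for byte in raw)


def unescape(escaped: bytes) -> bytes:
    """Reverse escape(); reject unescaped special bytes and bad sequences."""
    out = bytearray()
    it = iter(escaped)
    for byte in it:
        if byte == 0x5C:
            code = next(it, None)  # None when the value ends after a backslash
            if code not in UNESCAPES:
                raise ValueError("invalid escape sequence")
            out.append(UNESCAPES[code])
        elif byte in ESCAPES:
            raise ValueError("special byte must be escaped")
        else:
            out.append(byte)
    return bytes(out)
```

				Since the mapping is one-to-one, ``unescape(escape(value))`` returns the original value for any bytestring.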


		section EBNF
			raw ebnf
				raw_character = ? any byte except "\0", "\n", " ", "=", "\\" ? ;
				escaped_character = "\\", ( "0" | "n" | "_" | "-" | "\\" ) ;
				ident_character = raw_character | escaped_character ;
				identifier = ident_character, { ident_character } ;
				number = "0" | ( "1" … "9" ), { "0" … "9" } ;

				version = "cnp/", number, ".", number ;
				intent = identifier ;
				parameter = [ identifier ], "=", [ identifier ] ;

				header = version, " ", intent, { " ", parameter }, "\n" ;
				body = { ? any byte ? } ;

				message = header, body ;


	section Request
		text
			A CNP request is sent by a client that wants to request a resource from a server (or send some data to it).


		section Intent
			text fmt
				In a request, the intent part of the header contains the hostname of the server concatenated with the path of the requested resource.

				The hostname may be a domain name or an IP address, optionally with a port number, and it may not contain any slash (``0x2f``) bytes. In general, it should be the address that the client sends the request to. The format of the hostname is the same as in URLs.

				The path is a Unix-style absolute filepath with an optional trailing slash. It may (and probably should) be cleaned up by CNP implementations before being processed by collapsing multiple consecutive slashes into single ones, removing path entries that consist of a single dot and resolving double-dot parent directories, while leaving a trailing slash if one was present (for example, the path ``/../.././//foo/bar/..`` becomes ``/foo`` and ``//foo/bar/../`` becomes ``/foo/``). If possible, implementations should avoid sending filepaths that need to be cleaned. The minimal path is ``/``; a blank path is an error.

				The hostname is case-insensitive, while the path is case-sensitive. The hostname may be normalized by clients and servers, but a specific path (after cleanup) represents one specific resource.

				To separate the hostname from the filepath, split the intent before the first slash.
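
				The intent handling described in this section could be sketched as below. The function names ``split_intent`` and ``clean_path`` are hypothetical, and the cleanup is only one possible reading of the rules above, chosen so that it reproduces the two documented examples.

```python
def split_intent(intent: bytes):
    """Split a request intent into (hostname, path) before the first slash."""
    slash = intent.find(b"/")
    if slash == -1:
        raise ValueError("invalid intent: blank path")
    return intent[:slash], intent[slash:]


def clean_path(path: bytes) -> bytes:
    """Collapse slashes, drop '.' entries and resolve '..' entries."""
    if not path.startswith(b"/"):
        raise ValueError("path must be absolute")
    keep_slash = path.endswith(b"/")
    stack = []
    for part in path.split(b"/"):
        if part in (b"", b"."):
            continue  # empty entry (repeated slash) or current directory
        if part == b"..":
            if stack:
                stack.pop()  # ".." at the root stays at the root
            continue
        stack.append(part)
    cleaned = b"/" + b"/".join(stack)
    if keep_slash and stack:
        cleaned += b"/"  # keep a trailing slash that was present
    return cleaned
```

				Slashes and dots are never escaped, so both helpers can operate directly on the escaped intent.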


		section Parameters
			text
				The following request parameters are defined (default value is assumed if the parameter value is blank; any absent parameter must be treated as if it had the default value):


			section length
				raw
					length={number}

				text fmt
					The length of the request body data in bytes.

					This field is required if any data will be sent to the server. If it's absent or zero, the request must not contain any body data. Otherwise, the byte size of the request body data must be provided in this parameter.

					Default value: ``0`` (empty body)
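
					A request serializer that follows the ``length`` rule might look like this sketch. The function ``build_request`` is an assumption made for illustration; it expects the intent, keys and values to be escaped already.

```python
def build_request(intent: bytes, params: dict, body: bytes = b"",
                  version=(0, 4)) -> bytes:
    """Serialize a CNP request; intent, keys and values must be pre-escaped."""
    fields = dict(params)
    if body:
        # A non-empty body requires its exact byte size in `length`.
        fields[b"length"] = str(len(body)).encode("ascii")
    else:
        # Absent `length` means the default of 0: no body data.
        fields.pop(b"length", None)
    header = [b"cnp/%d.%d" % version, intent]
    header += [key + b"=" + value for key, value in fields.items()]
    return b" ".join(header) + b"\n" + body
```

					Deriving ``length`` from the body, rather than accepting it from the caller, keeps the header and the data that follows it from disagreeing.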


			section name
				raw
					name={identifier}

				text fmt
					The name of the content being sent in the request body. Must not contain slashes (``0x2f``) or NUL bytes (``0x00``), including escaped NUL bytes (``\0``).

					Can be used to provide filename metadata. If the original filename was a Unicode filename instead of a bytestring, UTF-8 should be used to encode it.

					Default value: empty (no name provided)


			section type
				raw
					type={identifier}/{identifier}

				text fmt
					MIME type of the content being sent in the request body.

					Default value: ``application/octet-stream``


			section if_modified
				raw
					if_modified={timestamp}

				text fmt
					Only send the resource in the response if it has been modified since the RFC 3339 UTC timestamp provided, according to the server's time. Otherwise, reply with a ``not_modified`` response and no body data.

					The RFC 3339 timestamp should use the following format, given as a strftime format string: ``%Y-%m-%dT%H:%M:%SZ``, for example: ``1970-12-31T23:59:30Z``.

					The timestamp should usually be either the server's ``modified`` parameter value from the time the cached resource was requested or the ``time`` parameter value from that request if the former was not provided. If neither was present in the previous response for the current request's intent, the client should not use an ``if_modified`` parameter.

					An ``if_modified`` parameter may only be set on a request for a cached page when all of the host, path and selector are equivalent to the previously cached copy.

					Default value: empty (do not perform this check, just send the resource)
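
					The timestamp handling could be sketched as follows. The helper names are hypothetical, datetimes are assumed to be timezone-aware, and treating "modified since" as a strict comparison is an assumption of this sketch.

```python
from datetime import datetime, timezone

# strftime format given in the text above.
CNP_TIME_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def format_timestamp(moment: datetime) -> bytes:
    """Format a timezone-aware datetime as a CNP timestamp in UTC."""
    return moment.astimezone(timezone.utc).strftime(CNP_TIME_FORMAT).encode("ascii")


def parse_timestamp(value: bytes) -> datetime:
    """Parse a CNP timestamp into a timezone-aware UTC datetime."""
    parsed = datetime.strptime(value.decode("ascii"), CNP_TIME_FORMAT)
    return parsed.replace(tzinfo=timezone.utc)


def should_send(modified: datetime, if_modified: bytes) -> bool:
    """Decide whether the server sends the resource or `not_modified`."""
    if if_modified == b"":
        return True  # default: skip the check and send the resource
    return modified > parse_timestamp(if_modified)
```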


			section select
				raw
					select={identifier}:{identifier}
					select={identifier}:

				text fmt
					Request the server to apply a filter on the contents of the requested path before sending it.

					The parameter value consists of a selector name (that may not contain a colon), a colon byte (``0x3a``) and an optional selector query. The functionality of the selector depends on which selector is requested.

					The following selectors are defined in this specification:

				section byte
					raw
						select=byte:{number}-{number}
						select=byte:{number}-
						select=byte:-{number}
						select=byte:-

					text fmt
						The ``byte`` selector selects a subset of the content bytes. The first number (``{from}``) is the index of the first included byte; it defaults to ``0``, the first byte in the contents, when absent. The second number (``{to}``) is the index of the last included byte; it defaults to the last byte in the contents when absent. An end byte index less than the start byte index is invalid in a request. If the start byte index is greater than the content length, the response contents should be empty (``length=0``).

						If possible, implementations should support this selector.
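
						A server-side evaluation of the ``byte`` selector query might look like this sketch. The function ``select_bytes`` is hypothetical; it takes only the query part after ``byte:`` and signals an invalid query with ``ValueError``, which a server would map to ``reason=invalid``.

```python
import re

# Optional start index, a dash, optional end index; no leading zeroes.
BYTE_QUERY_RE = re.compile(rb"(0|[1-9][0-9]*)?-(0|[1-9][0-9]*)?")


def select_bytes(content: bytes, query: bytes) -> bytes:
    """Apply a `byte` selector query such as b"2-4" to the content."""
    match = BYTE_QUERY_RE.fullmatch(query)
    if match is None:
        raise ValueError("invalid selector query")
    first, last = match.group(1), match.group(2)
    start = int(first) if first else 0
    end = int(last) if last else len(content) - 1
    if last is not None and end < start:
        raise ValueError("end index before start index")
    # The end index is inclusive; slicing past the content yields b"".
    return content[start:end + 1]
```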

				section info
					raw
						select=info:

					text fmt
						The ``info`` selector selects the response header of a request to this path. It takes no query string; if one is provided, the selector is invalid.

						The response body (**not** header) should contain the CNP header line that would have been used to answer this request if it didn't contain the ``info`` selector. It should contain all parameters that an actual response header would, especially a potential ``length`` parameter.

						If possible, implementations should support this selector.

				section cnm
					raw
						select=cnm:{identifier}

					text fmt
						The ``cnm`` selector selects content based on a CNM content selector. See the CNM specification for more information.

				text fmt
					Implementations may choose to support custom selectors. More selectors may be defined in the future.

					If an unknown selector is requested, it should be ignored and the request responded to as if there was no selector.

					If a known selector is requested for an unsupported path, the response should be an ``error`` with ``reason=not_supported``.

					If the selector query was invalid, the response should be an ``error`` with ``reason=invalid``.


	section Response
		text
			The server answers a CNP request with a CNP response. Most of the time, the response will contain the requested resource.


		section Intent
			text
				The intent part of the response header is one of the defined response types. It signifies whether the response contains the requested resource, a redirect, an error report or another result.


			section ok
				text
					The request was successfully resolved and the response body contains the requested resource.


			section not_modified
				text fmt
					The resource has not been modified since the time in the request's ``if_modified`` parameter. The response body is blank and the client should use the cached resource.


			section redirect
				text fmt
					The client should make a new request to the location provided in the ``location`` parameter. The response body may contain a page as if it was an ``ok`` response.

					The new request should be blank (contain no body data, as if that path was just entered anew by the user).

					Clients are not required to follow the redirect. Servers should not assume that a client will always follow the redirect immediately or at all. Interactive user agents may prompt the user for confirmation before opening the new page. Otherwise, following redirects up to a certain count is generally the expected behavior.

					User agents do not have to display the provided page and may directly perform the redirect. If the redirect is not performed and a page was provided, it should be displayed instead.


			section error
				text fmt
					There was an error answering the request. The server must also provide the ``reason`` response parameter. The body data may contain a page as if it was an ``ok`` response.

					User agents do not have to display the provided page and may just inform the user of the error reason. If a page is provided, it should be used to inform the user of the details of the error.


		section Parameters
			text
				The following response parameters are defined (default value is assumed if the parameter value is blank; any absent parameter must be treated as if it had the default value):


			section length
				raw
					length={number}

				text fmt
					The length of the response body data in bytes.

					If present, it must contain the length of the response body in bytes. The client should use that to delimit reading the response.

					If this parameter is not provided or is blank, the client should read all data until the connection is closed, end of file is reached or equivalent. Despite that, the length parameter should be sent whenever possible. If the transport does not support an equivalent to an end-of-message signal, then the ``length`` parameter is required for responses.

					A zero response ``length``, which is not the default, means that the response has no body data.

					Valid in all response types.

					Default value: empty (read until EOF)
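
					On the client side, the two reading modes could be sketched as follows. The function ``read_body`` is an assumption for illustration; ``stream`` is any binary file-like object positioned just after the header line (for a socket, one obtained from ``socket.makefile("rb")``), and ``params`` maps parameter keys to values.

```python
def read_body(stream, params: dict) -> bytes:
    """Read a response body according to the `length` parameter."""
    length = params.get(b"length", b"")
    if length == b"":
        # Absent or blank: read until the connection is closed (EOF).
        return stream.read()
    if not length.isdigit():
        raise ValueError("invalid length parameter")
    expected = int(length)
    body = stream.read(expected)
    if len(body) != expected:
        raise EOFError("response body shorter than announced length")
    return body
```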


			section name
				raw
					name={identifier}

				text fmt
					The name of the content being sent in the response body. Must not contain slashes (``0x2f``) or NUL bytes (``0x00``), including escaped NUL bytes (``\0``).

					Can be used to provide filename metadata. If the original filename was a Unicode filename instead of a bytestring, UTF-8 should be used to encode it.

					Valid in response types where the body data can be a file (``ok``, ``redirect``, ``error``).

					Default value: empty (no name provided)


			section type
				raw
					type={identifier}/{identifier}

				text fmt
					MIME type of the content being sent in the response body.

					Valid in response types where the body data can be a file (``ok``, ``redirect``, ``error``).

					Default value: ``application/octet-stream``


			section time
				raw
					time={timestamp}

				text fmt
					The current time on the server as an RFC 3339 UTC timestamp.

					May be sent with any response type, but most useful in the ``ok`` response, where it may be used by the client in an ``if_modified`` request parameter later.

					Default value: empty (no timestamp provided)


			section modified
				raw
					modified={timestamp}

				text fmt
					RFC 3339 UTC timestamp representing the time the requested file was last modified.

					Valid in ``ok`` and ``not_modified`` responses, where it may be used by the client in an ``if_modified`` request parameter later.

					Default value: empty (no timestamp provided)


			section location
				raw
					location={identifier}/{identifier}

				text fmt
					The location to redirect to in the format of a CNP request intent.

					If the host part of the intent in the value is empty, the current host should be reused. If the host is ``.``, the current host should be reused and the path should be appended to the current path, excluding the last filename after the final slash in the current path (if any). Otherwise, the new request should be sent to the provided host, which has to be resolved to a server address (instead of just sending a request with the new host to the current server).

					For example, if the request was sent to ``example.com/foo/bar`` and the redirect location was ``/baz``, the new request's intent is ``example.com/baz``, but if the location parameter was ``./baz``, the new intent is ``example.com/foo/baz``.

					Valid in ``redirect`` response type.

					Default value: empty (not providing this parameter in a redirect response is an error, no redirect happens)
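
					The three cases above can be sketched as a small resolver. The function ``resolve_location`` is a hypothetical helper; it takes the intent of the original request and the ``location`` value and returns the intent of the new request.

```python
def resolve_location(current_intent: bytes, location: bytes) -> bytes:
    """Compute the new request intent for a redirect `location` value."""
    if b"/" not in location:
        raise ValueError("location must have the format of a request intent")
    cur_host, _, cur_rest = current_intent.partition(b"/")
    cur_path = b"/" + cur_rest
    host, _, rest = location.partition(b"/")
    if host == b"":
        # Empty host: reuse the current host with the given path.
        return cur_host + b"/" + rest
    if host == b".":
        # Relative: drop the last filename of the current path, then append.
        base = cur_path[: cur_path.rfind(b"/") + 1]
        return cur_host + base + rest
    # Different host: it must be resolved to a server address separately.
    return host + b"/" + rest
```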


			section reason
				raw
					reason={identifier}

				text
					Describes the reason for the error.

					Defined values:

				list
					text fmt
						``syntax``: the request was not a valid CNP message

					text fmt
						``version``: the request CNP version is not supported by the server

					text fmt
						``invalid``: the received CNP message is not a valid CNP request (invalid intent format, invalid value of a defined parameter)

					text fmt
						``not_supported``: a requested feature is not supported by the server, so the request cannot be answered

					text fmt
						``too_large``: the server does not want to accept so much data (either header or body)

					text fmt
						``not_found``: the requested path was not found on the server

					text fmt
						``denied``: the server does not allow access to this content (might require authentication first)

					text fmt
						``rejected``: the request did not match the server's requirements for that path, but was a valid CNP request (e.g. missing parameters the application requires or an API call that provided an incorrect type of data)

					text fmt
						``server_error``: internal server error

				text fmt
					Valid in ``error`` response type.

					Default value: ``server_error`` (servers should provide this parameter in the ``error`` response if possible)


			section select
				raw
					select={identifier}:{identifier}
					select={identifier}:

				text fmt
					May be present only on responses to requests containing ``select`` parameters. If a selector was used, this parameter must be present and contain a selector that was executed on the contents. This may be the same selector that was provided in the request or one equivalent to it (e.g. a ``byte`` selector with a missing last index replaced with the index of the last byte in the contents).


	section CNP on Internet
		text fmt
			The default transport for CNP 0.4 is a plain TCP connection with the server listening on port ``25454`` by default (though an alternative port may be provided in the URL or request intent).

			The default file type for pages requested over CNP should usually be CNM documents with ``type=text/cnm``. However, any file types may be transferred over CNP.

			Clients should write exactly as much data as specified by their request ``length`` parameter; excess data may result in an ``error`` response or a blocking write, while insufficient data will likely result in the server waiting for more input instead of responding. Servers using ``length`` parameters in responses should send exactly as much data as they specified, since clients might otherwise wait for more response data or read too much data when ignoring the ``length`` and reading until the end of connection instead.

			Note that this may change in future versions. Likely changes are the requirement of TLS for connections and a different default port.
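
			A minimal client for the plain TCP transport described in this section might look like the following sketch. The function ``fetch`` and its return shape are assumptions; it sends a request without parameters or body, assumes that host and path contain no bytes that need escaping, and does not validate the response beyond splitting it.

```python
import socket

DEFAULT_PORT = 25454  # default CNP port named in this section


def fetch(host: str, path: str = "/", port: int = DEFAULT_PORT,
          timeout: float = 10.0):
    """Send a bodiless CNP 0.4 request; return (intent, params, body)."""
    # Include the port in the intent only when it is not the default.
    target = host if port == DEFAULT_PORT else f"{host}:{port}"
    request = b"cnp/0.4 " + target.encode("ascii") + path.encode("ascii") + b"\n"
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        with sock.makefile("rb") as stream:
            header = stream.readline().rstrip(b"\n")
            fields = header.split(b" ")
            if len(fields) < 2:
                raise ValueError("malformed response header")
            # Each parameter is key=value; values remain escaped.
            params = dict(field.partition(b"=")[::2] for field in fields[2:])
            length = params.get(b"length", b"")
            # With `length`, read exactly that much; otherwise read to EOF.
            body = stream.read(int(length)) if length else stream.read()
    return fields[1], params, body
```

			A call such as ``fetch("example.com", "/")`` would then return the response intent (for instance ``ok``), the response parameters and the body bytes.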