( BaT | 2015. 02. 15., Sun – 00:33 )

RFC 3986

2.2.  Reserved Characters

   URIs include components and subcomponents that are delimited by
   characters in the "reserved" set.  These characters are called
   "reserved" because they may (or may not) be defined as delimiters by
   the generic syntax, by each scheme-specific syntax, or by the
   implementation-specific syntax of a URI's dereferencing algorithm.
   If data for a URI component would conflict with a reserved
   character's purpose as a delimiter, then the conflicting data must be
   percent-encoded before the URI is formed.

      reserved    = gen-delims / sub-delims

      gen-delims  = ":" / "/" / "?" / "#" / "[" / "]" / "@"

      sub-delims  = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="

In other words, we are looking for a URI component that may contain sub-delims and that can syntactically stand where the parentheses stand in the Wikipedia URI.
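To see which reserved characters are actually at stake, here is a minimal Python sketch that sorts the characters of the Wikipedia URI from this post into the two classes quoted above. The function name reserved_in and the constant names are made up for illustration; only the character sets come from the RFC.

```python
# Character sets transcribed from the RFC 3986 section 2.2 ABNF quoted above.
GEN_DELIMS = set(":/?#[]@")
SUB_DELIMS = set("!$&'()*+,;=")

# The Wikipedia URI this post is about.
URI = "http://hu.wikipedia.org/wiki/M%C3%A1trix_(matematika)"


def reserved_in(text):
    """Map each reserved character found in `text` to its ABNF class."""
    found = {}
    for ch in text:
        if ch in GEN_DELIMS:
            found[ch] = "gen-delims"
        elif ch in SUB_DELIMS:
            found[ch] = "sub-delims"
    return found


print(reserved_in(URI))
```

Only ":" and "/" show up as gen-delims, so the parentheses are the only sub-delims in question.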


Appendix A.  Collected ABNF for URI

   URI           = scheme ":" hier-part [ "?" query ] [ "#" fragment ]

   hier-part     = "//" authority path-abempty
                 / path-absolute
                 / path-rootless
                 / path-empty

   authority     = [ userinfo "@" ] host [ ":" port ]
   userinfo      = *( unreserved / pct-encoded / sub-delims / ":" )

   path-abempty  = *( "/" segment )
   path-absolute = "/" [ segment-nz *( "/" segment ) ]
   path-rootless = segment-nz *( "/" segment )
   path-empty    = 0<pchar>
   segment       = *pchar
   segment-nz    = 1*pchar

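The scheme, query and fragment parts are not involved, which leaves the hier-part. The authority may contain sub-delims (in userinfo, for example), but syntactically it has to stand before the first "/" of the path, so it is not what we are looking for. What remains are the path segments, and those are built from pchar, which the excerpt above leaves out: pchar = unreserved / pct-encoded / sub-delims / ":" / "@". The generic syntax thus allows sub-delims literally in a segment and gives them no delimiter role there. So the URI http://hu.wikipedia.org/wiki/M%C3%A1trix_(matematika) is valid even without percent-encoding the parentheses.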
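The same check can be done mechanically. Below is a rough Python sketch that turns the path rules into a regular expression and tests the path of the Wikipedia URI against it. It assumes the RFC 3986 definitions unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" and pchar = unreserved / pct-encoded / sub-delims / ":" / "@"; the function name path_is_valid is hypothetical, and only the path is checked, not the whole URI.

```python
import re
from urllib.parse import urlsplit

# Character classes transcribed from RFC 3986, Appendix A.
UNRESERVED = r"A-Za-z0-9\-._~"
SUB_DELIMS = r"!$&'()*+,;="
PCT_ENCODED = r"%[0-9A-Fa-f]{2}"

# pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
PCHAR = rf"(?:[{UNRESERVED}{SUB_DELIMS}:@]|{PCT_ENCODED})"

# path-abempty = *( "/" segment ), segment = *pchar
PATH_ABEMPTY = re.compile(rf"(?:/{PCHAR}*)*")


def path_is_valid(uri: str) -> bool:
    """Return True if the path component of `uri` matches path-abempty."""
    path = urlsplit(uri).path  # urlsplit leaves percent-escapes untouched
    return PATH_ABEMPTY.fullmatch(path) is not None


# Literal parentheses are accepted, because "(" and ")" are sub-delims.
print(path_is_valid("http://hu.wikipedia.org/wiki/M%C3%A1trix_(matematika)"))
# A raw non-ASCII letter is rejected; it has to be percent-encoded.
print(path_is_valid("http://hu.wikipedia.org/wiki/Mátrix_(matematika)"))
```

The encoded form with %28 and %29 passes as well, so both spellings of the parentheses are syntactically fine.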

RFC 1738


2.2. URL Character Encoding Issues

...

   Thus, only alphanumerics, the special characters "$-_.+!*'(),", and
   reserved characters used for their reserved purposes may be used
   unencoded within a URL.
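For anyone building such URLs in code, here is a small Python sketch of what this means in practice. It uses urllib.parse.quote from the standard library, which encodes parentheses by default; passing the RFC 1738 special characters as the safe argument keeps them literal. The variable names are made up, and the title is the decoded page name from the Wikipedia URI above.

```python
from urllib.parse import quote, unquote

# Special characters that RFC 1738 section 2.2 allows unencoded, as quoted above.
RFC1738_SPECIAL = "$-_.+!*'(),"

title = "Mátrix_(matematika)"

# Default behaviour: parentheses get percent-encoded too.
encoded_all = quote(title)
# With the special characters marked safe, only the non-ASCII letter is encoded.
encoded_min = quote(title, safe=RFC1738_SPECIAL)

print(encoded_all)  # M%C3%A1trix_%28matematika%29
print(encoded_min)  # M%C3%A1trix_(matematika)

# Both spellings decode to the same page title.
print(unquote(encoded_all) == unquote(encoded_min))
```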