summaryrefslogtreecommitdiff
path: root/scanner/readme.md
diff options
context:
space:
mode:
Diffstat (limited to 'scanner/readme.md')
-rw-r--r--scanner/readme.md42
1 files changed, 42 insertions, 0 deletions
diff --git a/scanner/readme.md b/scanner/readme.md
new file mode 100644
index 0000000..b8b31fb
--- /dev/null
+++ b/scanner/readme.md
@@ -0,0 +1,42 @@
+# Scanner
+
+A scanner takes a string in input and returns an array of tokens.
+
+Tokens can be of the following kinds:
+- identifier
+- number
+- operator
+- separator
+- string
+- block
+
+Resolving nested blocks in the scanner is making the parser simple
+and generic, without having to resort to parse tables.
+
+The lexical rules are provided by a language specification at language
+level which includes the following:
+
+- a set of composable properties (1 per bit, on an integer) for each
+ character in the ASCII range (where all separator, operators and
+ reserved keywords must be defined).
+- for each block or string, the specification of starting and ending
+ delimiter.
+
+## Development status
+
+A successful test must be provided to check the status.
+
+- [x] numbers starting with a digit
+- [ ] numbers starting otherwise
+- [x] unescaped strings (including multiline)
+- [x] escaped string (including multiline)
+- [x] separators (in UTF-8 range)
+- [ ] single line string (\n not allowed)
+- [x] identifiers (in UTF-8 range)
+- [x] operators, concatenated or not
+- [x] single character block/string delimiters
+- [x] arbitrarly nested blocks and strings
+- [ ] multiple characters block/string delimiters
+- [ ] blocks delimited by identifiers/operators/separators
+- [ ] blocks with delimiter inclusion/exclusion rules
+- [ ] blocks delimited by indentation level