summaryrefslogtreecommitdiff
path: root/scanner/README.md
diff options
context:
space:
mode:
Diffstat (limited to 'scanner/README.md')
-rw-r--r--scanner/README.md43
1 files changed, 43 insertions, 0 deletions
diff --git a/scanner/README.md b/scanner/README.md
new file mode 100644
index 0000000..c131a9f
--- /dev/null
+++ b/scanner/README.md
@@ -0,0 +1,43 @@
+# Scanner
+
+A scanner takes a string in input and returns an array of tokens.
+
+Tokens can be of the following kinds:
+- identifier
+- number
+- operator
+- separator
+- string
+- block
+
+Resolving nested blocks in the scanner is making the parser simple
+and generic, without having to resort to parse tables.
+
+The lexical rules are provided by a language specification at language
+level which includes the following:
+
+- a set of composable properties (1 per bit, on an integer) for each
+ character in the ASCII range (where all separator, operators and
+ reserved keywords must be defined).
+- for each block or string, the specification of starting and ending
+ delimiter.
+
+## Development status
+
+A successful test must be provided to check the status.
+
+- [x] numbers starting with a digit
+- [ ] numbers starting otherwise
+- [x] unescaped strings (including multiline)
+- [x] escaped string (including multiline)
+- [x] separators (in UTF-8 range)
+- [x] single line string (\n not allowed)
+- [x] identifiers (in UTF-8 range)
+- [x] operators, concatenated or not
+- [x] single character block/string delimiters
+- [x] arbitrarly nested blocks and strings
+- [x] multiple characters block/string delimiters
+- [x] blocks delimited by operator characters
+- [ ] blocks delimited by identifiers
+- [x] blocks with delimiter inclusion/exclusion rules
+- [ ] blocks delimited by indentation level (python, yaml, ...)