Open
Commits
154 commits
4385ade
simplify the applescript lexer
Mar 26, 2026
cf0b0f8
raise if both a callback and args are passed
Apr 4, 2026
3c7eb04
disable Lint/NonLocalExitFromIterator
Apr 4, 2026
ec70b31
add minitest-focus
Apr 4, 2026
7ff3892
new keywords API
Apr 4, 2026
4aab35b
move keywords api into the core rules section
Apr 3, 2026
a24378b
use the keywords api for python
Apr 11, 2026
bce8738
use the keywords api for actionscript
Apr 3, 2026
1027247
use the keywords api for ada
Apr 3, 2026
7a1f2d8
use the keywords api for apex
Apr 3, 2026
f617d51
use the keywords api for applescript
Apr 4, 2026
6c842b8
use the keywords api for armasm
Apr 4, 2026
16f4d77
use the keywords api for augeas
Apr 4, 2026
314f8a9
use the keywords api for awk
Apr 4, 2026
5433aa6
use the keywords api for batchfile
Apr 8, 2026
c01c076
allow joined regexes for BBCBasic
Apr 8, 2026
1fabb1a
use the keywords api for Bicep
Apr 8, 2026
4e90335
eliminate regex join in bpf
Apr 8, 2026
cc10aa0
use the keywords api for brightscript
Apr 8, 2026
ceeb695
use the keywords api for BSL
Apr 8, 2026
870f8d1
use the keywords api for c
Apr 8, 2026
a7e8508
use the keywords api in cfscript
Apr 8, 2026
98e8db5
use the keywords api for Clean
Apr 8, 2026
fead18d
use the keywords api for cmake
Apr 8, 2026
2a6b5f6
eliminate preproc keyword list from cmhg
Apr 8, 2026
fce5467
use the keywords api for cobol
Apr 8, 2026
e32c6fb
use the keywords api for coffeescript
Apr 8, 2026
8b31e32
use the keywords api for common lisp
Apr 8, 2026
dfa0ebb
remove extraneous set creation in cop
Apr 9, 2026
f1d3aed
add a spec that keywords don't get leaked up the chain
Apr 9, 2026
6a01350
use the keywords api for crystal
Apr 9, 2026
fa8026b
use the keywords api for csharp (and refactor preproc)
Apr 9, 2026
16f26da
use the keywords api for CSS
Apr 9, 2026
ac1cb43
remove common_lisp from rubocop_todo
Apr 9, 2026
2623df4
use the keywords API for cypher, and use Punctuation not Str::Symbol
Apr 9, 2026
c16bcae
fix sasscommon
Apr 11, 2026
a60c807
use the keywords api for cython
Apr 11, 2026
c743cec
use the keywords API for dlang
Apr 11, 2026
b0e15e8
use keywords api for dafny, and add some keywords from the spec
Apr 11, 2026
be85ef2
use the keywords api for dart
Apr 12, 2026
f7e35be
use the keywords api for datastudio
Apr 12, 2026
fc603d1
use the keywords api for digdag
Apr 12, 2026
afa625d
use the keywords api for dylan
Apr 13, 2026
c4464a6
use the keywords api for eiffel
Apr 13, 2026
be4e236
use the keywords api for elm
Apr 13, 2026
551acf8
use the keywords api for Erlang
Apr 13, 2026
2c4ec8a
use the keywords api for factor
Apr 13, 2026
5d2b558
use the keywords api for fortran
Apr 13, 2026
8d86c81
use the keywords api for freefem
Apr 14, 2026
7b18a47
use the keywords api for fsharp
Apr 14, 2026
f6b95af
use the keywords api for gdscript
Apr 14, 2026
f66ecae
use the keywords api for glsl
Apr 15, 2026
f12a6b9
remove duplicate set element from gdscript
Apr 15, 2026
1ecf01e
use keywords api for groovy/gradle
Apr 15, 2026
7f65fb8
use the keywords api for haskell
Apr 15, 2026
f5cbc89
use the keywords api for haxe
Apr 15, 2026
7d04864
rework the hlsl lexer
Apr 16, 2026
3e6409d
add another example to bbcbasic
Apr 16, 2026
7e86299
rebaseme: eager load superclass first
Apr 16, 2026
43bf9c2
use keywords api for sql
Apr 16, 2026
7cb9e88
break out hql keywords
Apr 16, 2026
00d82aa
use a static regex for http and highlight content-type
Apr 16, 2026
3528c9c
use the keywords api for hylang
Apr 16, 2026
a125674
use the keywords api for IDLang
Apr 17, 2026
0c81688
use the keywords api for idris
Apr 24, 2026
c1fe5a6
use the keywords api for iecst
Apr 24, 2026
92daecf
use the keyword api for igor pro
Apr 24, 2026
e1b181b
use the keywords api for io
Apr 24, 2026
08f5fd8
use the keywords api for isabelle
Apr 24, 2026
9886bef
use the keywords api for isbl
Apr 24, 2026
25ec853
use the keywords api for Janet
Apr 24, 2026
1554b60
raise if the state is closed and keywords is used
May 11, 2026
0a2ef9a
simplify the java lexer
May 11, 2026
9d3f747
use the keywords api for javscript
May 11, 2026
f60851b
use the keywords api for jinja
May 11, 2026
d0f1c34
use the keywords api for jsonnet
May 11, 2026
8a845e9
RegexLexer: thread the lexer class through rules for better error mes…
May 14, 2026
eede653
make jinja subclasses use sets
May 14, 2026
2a175f5
use the keywords api for julia
May 14, 2026
b21232a
use the keywords api for Kick Assembler
May 14, 2026
0413fed
rubocop nonsense
May 15, 2026
c8ef3f2
javscript: remove duplicate set element
May 15, 2026
9291ad3
stop using interpolated regexes in jsp
May 15, 2026
64bfe57
fix kick assembler dot keywords
May 15, 2026
877cbf7
remove armasm from rubocop todo
May 15, 2026
b50f8ca
use the keywords api for kotlin
May 15, 2026
9eb4009
change support for annotation use-site targets to be more generic
May 15, 2026
cd243ec
fix rubocop for dafny
May 15, 2026
28c5ab7
use the keywords api for lean
May 15, 2026
9e099de
jsp: remove unused vars
May 15, 2026
cf118a1
kotlin: remove unused vars
May 15, 2026
14d93bf
use the keywords api for lasso
May 15, 2026
1f0276b
remove idlang and janet from rubocop todo
May 15, 2026
979e5b5
use the keywords api for verilog
May 15, 2026
fd42094
use the keywords api for addmusick
May 29, 2026
7536037
use the keywords api for javascript, but more
May 29, 2026
be6ed8b
use the keywords api for livescript
May 29, 2026
8490f53
rubocop fix
May 29, 2026
b933553
use the keywords api for lustre
May 29, 2026
765b136
use the keywords api and remove overhighlighting on m68k
May 29, 2026
6d25242
use the keywords api for magik, simplify the lexer, and fix multichar…
May 29, 2026
bee5f7c
fix warnings on magik and delete rubocop exception
May 29, 2026
8c7f00c
use the keywords api for make (and refactor a bit)
Jun 9, 2026
3eb061c
use the keywords api for shell
Jun 9, 2026
455f2b4
fix brittle specs
Jun 9, 2026
61866c3
clean up the mason lexer somewhat and handle the quoting problem slig…
Jun 10, 2026
66a1811
use the keywords api for Mathematica
Jun 10, 2026
aab9ed4
use the keywords api for matlab
Jun 10, 2026
5a0d7a3
use the keywords api for meson
Jun 10, 2026
7e49244
use the keywords api for minizinc
Jun 10, 2026
9097502
use the keywords api for mosel, and adhere properly to the spec
Jun 10, 2026
aecac3e
use the keywords api for nesasm
Jun 10, 2026
715796d
use the keywords api for nial
Jun 10, 2026
a027fe1
use the keywords api for nim
Jun 10, 2026
b55d1b0
use the keywords api for nix
Jun 10, 2026
19e9b84
use the keywords api for objective-c and objective-cpp
Jun 10, 2026
0420530
use the keywords api for ocaml and rescript
Jun 10, 2026
d75b290
use the keywords api for OCL
Jun 10, 2026
325ab24
use the keywords api for openedge
Jun 10, 2026
14a5599
use the keywords api for opentype
Jun 10, 2026
469d712
use the keywords api for p4
Jun 10, 2026
cfb1c4e
use the keywords api for pascal
Jun 10, 2026
0ee0d41
use the keywords api for plsql
Jun 20, 2026
ce98847
use the keywords api for pony
Jun 20, 2026
3867a9a
use the keywords api for postscript
Jun 20, 2026
f4dec25
use the keywords api for powershell
Jun 21, 2026
dd918a2
use the keywords api for praat
Jun 21, 2026
d66b755
use the keywords api for prometheus
Jun 22, 2026
bf2466f
use the keywords api for puppet
Jun 22, 2026
5c83b68
use the keywords api for Q
Jun 22, 2026
709e16a
use the keywords api for R
Jun 22, 2026
c7e1ba3
use the keywords api for racket
Jun 22, 2026
3503dd1
use the keywords api for Reason and OCaml
Jun 22, 2026
de4cd44
use the keywords api for rego
Jun 22, 2026
498cb72
use the new OCamlCommon naming for ReScript
Jun 22, 2026
58e6b52
remove racket from the todo file
Jun 22, 2026
408dff7
use the keywords api for RML, and remove overhighlighting
Jun 22, 2026
1c964aa
use the keywords api for ruby
Jun 22, 2026
56a13f9
use the keywords api for rust
Jun 23, 2026
bf5e992
use the keywords api for SAS
Jun 23, 2026
75e1cd0
use the keywords api more for SassCommon
Jun 23, 2026
045884e
use the keywords api for scala
Jun 23, 2026
7090495
use the keywords api for scheme
Jun 23, 2026
b8ca3e0
use the keywords api for sieve
Jun 24, 2026
06b0977
use the keywords api for smarty
Jun 24, 2026
975f1d0
use the keywords api for SML
Jun 24, 2026
b48488b
use the keywords api for SQF
Jun 24, 2026
ad9f5c0
use the keywords api (mostly) for stan
Jun 24, 2026
aa992f5
use the keywords api for stata
Jun 24, 2026
52db194
use the keywords api for supercollider
Jun 24, 2026
7abf34b
use the keywords api for swift
Jun 24, 2026
9c8b112
use the keywords api for syzlang
Jun 24, 2026
3d6a13c
use the keywords api for syzprog
Jun 24, 2026
8198ef3
use the keywords api for tcl
Jun 24, 2026
3 changes: 3 additions & 0 deletions .rubocop.yml
@@ -100,6 +100,9 @@ Lint/NestedPercentLiteral:
Lint/DuplicateBranch:
Enabled: false

Lint/NonLocalExitFromIterator:
Enabled: false

Naming/BlockForwarding:
Enabled: false

80 changes: 0 additions & 80 deletions .rubocop_todo.yml
@@ -6,104 +6,24 @@
# Note that changes in the inspected code, or installation of new
# versions of RuboCop, may require this file to be generated again.

# Offense count: 1
# Configuration parameters: AllowComments, AllowEmptyLambdas.
Lint/EmptyBlock:
Exclude:
- 'lib/rouge/lexers/nix.rb'

# Offense count: 4
Naming/ConstantName:
Exclude:
- 'lib/rouge/lexers/eiffel.rb'

# Offense count: 5
# This cop supports unsafe autocorrection (--autocorrect-all).
# Configuration parameters: EnforcedStyleForLeadingUnderscores.
# SupportedStylesForLeadingUnderscores: disallowed, required, optional
Naming/MemoizedInstanceVariableName:
Exclude:
- 'lib/rouge/lexers/freefem.rb'
- 'lib/rouge/lexers/verilog.rb'

# Offense count: 16
# Configuration parameters: EnforcedStyle, AllowedPatterns, ForbiddenIdentifiers, ForbiddenPatterns.
# SupportedStyles: snake_case, camelCase
# ForbiddenIdentifiers: __id__, __send__
Naming/MethodName:
Exclude:
- 'lib/rouge/lexers/igorpro.rb'
- 'lib/rouge/lexers/xpath.rb'

# Offense count: 40
# Configuration parameters: EnforcedStyle, AllowedIdentifiers, AllowedPatterns, ForbiddenIdentifiers, ForbiddenPatterns.
# SupportedStyles: snake_case, camelCase
Naming/VariableName:
Exclude:
- 'lib/rouge/lexers/brightscript.rb'
- 'lib/rouge/lexers/dafny.rb'
- 'lib/rouge/lexers/freefem.rb'
- 'lib/rouge/lexers/igorpro.rb'
- 'lib/rouge/lexers/kotlin.rb'
- 'lib/rouge/lexers/xpath.rb'

# Offense count: 113
Rouge/NoBuildingAlternationPatternInRegexp:
Exclude:
- 'lib/rouge/lexers/apple_script.rb'
- 'lib/rouge/lexers/armasm.rb'
- 'lib/rouge/lexers/bbcbasic.rb'
- 'lib/rouge/lexers/bicep.rb'
- 'lib/rouge/lexers/cfscript.rb'
- 'lib/rouge/lexers/cmhg.rb'
- 'lib/rouge/lexers/crystal.rb'
- 'lib/rouge/lexers/csharp.rb'
- 'lib/rouge/lexers/d.rb'
- 'lib/rouge/lexers/dart.rb'
- 'lib/rouge/lexers/eiffel.rb'
- 'lib/rouge/lexers/elm.rb'
- 'lib/rouge/lexers/erlang.rb'
- 'lib/rouge/lexers/haskell.rb'
- 'lib/rouge/lexers/http.rb'
- 'lib/rouge/lexers/idris.rb'
- 'lib/rouge/lexers/isabelle.rb'
- 'lib/rouge/lexers/java.rb'
- 'lib/rouge/lexers/jsp.rb'
- 'lib/rouge/lexers/kotlin.rb'
- 'lib/rouge/lexers/lean.rb'
- 'lib/rouge/lexers/magik.rb'
- 'lib/rouge/lexers/make.rb'
- 'lib/rouge/lexers/mason.rb'
- 'lib/rouge/lexers/mosel.rb'
- 'lib/rouge/lexers/nix.rb'
- 'lib/rouge/lexers/p4.rb'
- 'lib/rouge/lexers/pascal.rb'
- 'lib/rouge/lexers/postscript.rb'
- 'lib/rouge/lexers/r.rb'
- 'lib/rouge/lexers/rust.rb'
- 'lib/rouge/lexers/scala.rb'
- 'lib/rouge/lexers/sml.rb'
- 'lib/rouge/lexers/stata.rb'
- 'lib/rouge/lexers/vala.rb'
- 'lib/rouge/lexers/wollok.rb'
- 'lib/rouge/lexers/xojo.rb'

# Offense count: 20
Rouge/NoHugeCollections:
Exclude:
- 'lib/rouge/lexers/apple_script.rb'
- 'lib/rouge/lexers/cobol.rb'
- 'lib/rouge/lexers/common_lisp.rb'
- 'lib/rouge/lexers/css.rb'
- 'lib/rouge/lexers/datastudio.rb'
- 'lib/rouge/lexers/freefem.rb'
- 'lib/rouge/lexers/hql.rb'
- 'lib/rouge/lexers/idlang.rb'
- 'lib/rouge/lexers/igorpro.rb'
- 'lib/rouge/lexers/janet.rb'
- 'lib/rouge/lexers/openedge.rb'
- 'lib/rouge/lexers/plsql.rb'
- 'lib/rouge/lexers/racket.rb'
- 'lib/rouge/lexers/sas.rb'
- 'lib/rouge/lexers/sql.rb'
- 'lib/rouge/lexers/stata.rb'
1 change: 1 addition & 0 deletions Gemfile
@@ -8,6 +8,7 @@ gem 'rake'

gem 'minitest', '>= 5.0'
gem 'minitest-power_assert'
gem 'minitest-focus'
gem 'power_assert', '~> 2.0'

# don't try to install redcarpet under jruby
138 changes: 87 additions & 51 deletions docs/LexerDevelopment.md
@@ -267,6 +267,93 @@ rules, as if the regular expression had not matched.

You can see an example of these more complex rules in [the Ruby lexer][ruby-lexer].

#### Keywords, Builtins, and other word sets

Most programming languages reserve certain words for use as identifiers that
have a special meaning in the language. To make regular expressions that search
for these words easier, many lexers will put the applicable keywords in a
set and make them available in a particular way (be it as a local variable,
an instance variable or what have you).

For small sets, you can simply use a constant:

```rb
module Rouge
module Lexers
class YetAnotherLanguage < RegexLexer
# ...
KEYWORDS = Set.new %w(key words used in this language)
# ...
end
end
end
```

If the keyword sets are very large (>75 elements or so), please put them in a constant in a separate file, which is lazily loaded:

```rb
# lib/rouge/lexers/my_lang.rb
module Rouge
module Lexers
class MyLang < RegexLexer
tag :my_lang
# ...

lazy do
require_relative 'my_lang/keywords'
end

# ...
end
end
end

# lib/rouge/lexers/my_lang/keywords.rb
module Rouge
module Lexers
class MyLang
KEYWORDS = Set.new %w(massive set goes here)
end
end
end
```

This way, users of Rouge who are not using your language will not have to load the keyword sets. These keywords can then be used with the special `#keywords` API:

```rb
state :my_cool_state do
# ...

# Use a "covering regex" here, like /\w+/, which matches the general form
# of keywords. This should be possible for almost all languages. If there is no
# appropriate regex that will cover all cases, consider special-casing some of
# the exceptions. If the language has *truly* free-form keyword sets (like Gherkin),
# please let us know and we'll consider allowing a large regex instead.
keywords %r/\w+/ do
# this method acts much like a normal #rule call - it takes a token type, and
# potentially an action block. Except instead of a regex, it takes a Set, which
# the match from above will be checked against, in the order they appear here.
rule KEYWORDS, Keyword
rule BUILTINS, Name::Builtin, :pop!

# symbols will be mapped to class methods on the lexer
rule :small_set, Name::Function

# or you can just inline small sets here
rule Set.new(%w(one two three four)), Num::Integer
# etc...

# optional: transform the match before checking set membership
transform(&:downcase)

# optional: a default rule if the match isn't contained in any of the sets.
# If this is not given, Rouge will simply fall through to the next rules after
# the keywords block.
default Name
end
end
```
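
To make the covering-regex idea concrete, here is a minimal plain-Ruby sketch of the lookup that a `keywords` block performs conceptually. It is not Rouge's implementation: `WordClassifier`, its `rule` and `classify` methods, and the symbol token names are all hypothetical, chosen only for illustration.

```rb
require 'set'

# Hypothetical stand-in for a `keywords` block: an ordered list of
# [set, token] pairs, an optional transform, and an optional default token.
class WordClassifier
  def initialize(transform: nil, default: nil)
    @rules = []
    @transform = transform
    @default = default
  end

  # Register a set and the token to emit for its members.
  def rule(set, token)
    @rules << [set, token]
    self
  end

  # Return the token of the first set containing the word, else the default.
  # A nil result stands for "no match, fall through to later rules".
  def classify(word)
    key = @transform ? @transform.call(word) : word
    _, token = @rules.find { |set, _| set.include?(key) }
    token || @default
  end
end

KEYWORDS = Set.new(%w(if else while))
BUILTINS = Set.new(%w(print len))

classifier = WordClassifier.new(transform: :downcase.to_proc, default: :Name)
  .rule(KEYWORDS, :Keyword)
  .rule(BUILTINS, :'Name::Builtin')

# One covering regex scans every word; set lookups then decide the token.
tokens = 'IF iff print x'.scan(/\w+/).map { |w| [w, classifier.classify(w)] }
```

Because `iff` is matched whole by `/\w+/` before the lookup happens, it cannot be mistaken for the keyword `if`, and each word costs one hash lookup per set rather than a scan through a long alternation.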

### Additional Features

While the properties and states are the minimum elements of a lexer that need
@@ -303,57 +390,6 @@ Filename Globs][conflict-globs] below.

[conflict-globs]: #conflicting-filename-globs

#### Special Words

Most programming languages reserve certain words for use as identifiers that
have a special meaning in the language. To make regular expressions that search
for these words easier, many lexers will put the applicable keywords in a
set and make them available in a particular way (be it as a local variable,
an instance variable or what have you).

For performance and safety, we strongly recommend lexers use a class method:

```rb
module Rouge
module Lexers
class YetAnotherLanguage < RegexLexer
...

def self.keywords
@keywords ||= Set.new %w(key words used in this language)
end

...
end
end
end
```

These keywords can then be used like so:

```rb
rule /\w+/ do |m|
if self.class.keywords.include?(m[0])
token Keyword
elsif
token Name
end
end
```

In some cases, you may want to interpolate your keywords into a regular
expression. **We strongly recommend you avoid doing this.** Having a large
number of rules that are searching for particular words is not as performant as
a rule with a generic pattern with a block that checks whether the pattern is a
member of a predefined set and assigns tokens, pushes new states, etc.

If you do need to use interpolation, be careful to use the `\b` anchor to avoid
inadvertently matching part of a longer word (eg. `if` matching `iff`)::

```rb
rule /\b(#{keywords.join('|')})\b/, Keyword
```

#### Startup

```rb
4 changes: 2 additions & 2 deletions lib/rouge/lexer.rb
@@ -237,10 +237,10 @@ def eager_load!
 return if @_loaded
 @_loaded = true

-lazy_procs.each { |b| instance_eval(&b) }
-
 superclass.eager_load! unless superclass == Lexer

+
+lazy_procs.each { |b| instance_eval(&b) }

 self
 end

24 changes: 7 additions & 17 deletions lib/rouge/lexers/actionscript.rb
@@ -98,7 +98,7 @@ def self.constants
 end

 def self.builtins
-@builtins ||= %w(
+@builtins ||= Set.new %w(
 void Function Math Class
 Object RegExp decodeURI
 decodeURIComponent encodeURI encodeURIComponent
@@ -129,22 +129,12 @@ def self.builtins

 rule %r/[{}]/, Punctuation, :statement

-rule id do |m|
-if self.class.keywords.include? m[0]
-token Keyword
-push :expr_start
-elsif self.class.declarations.include? m[0]
-token Keyword::Declaration
-push :expr_start
-elsif self.class.reserved.include? m[0]
-token Keyword::Reserved
-elsif self.class.constants.include? m[0]
-token Keyword::Constant
-elsif self.class.builtins.include? m[0]
-token Name::Builtin
-else
-token Name::Other
-end
+keywords id do
+rule :keywords, Keyword
+rule :declarations, Keyword::Declaration
+rule :reserved, Keyword::Reserved
+rule :builtins, Name::Builtin
+default Name::Other
 end

 rule %r/\-?[0-9][0-9]*\.[0-9]+([eE][0-9]+)?[fd]?/, Num::Float