The Fast Lexical Analyser Generator
Copyright © 1998–2015 by Gerwin Klein, Steve Rowe, and Régis Décamps.
JFlex User’s Manual
Version 1.6.1, March 16, 2015
Contents
1 Introduction
  1.1 Design goals
  1.2 About this manual
2 Installing and Running JFlex
  2.1 Installing JFlex
    2.1.1 Windows
    2.1.2 Unix with tar archive
  2.2 Running JFlex
3 A simple Example: How to work with JFlex
  3.1 Code to include
  3.2 Options and Macros
  3.3 Rules and Actions
  3.4 How to get it going
4 Lexical Specifications
  4.1 User code
  4.2 Options and declarations
    4.2.1 Class options and user class code
    4.2.2 Scanning method
    4.2.3 The end of file
    4.2.4 Standalone scanners
    4.2.5 CUP compatibility
    4.2.6 BYacc/J compatibility
    4.2.7 Character sets
    4.2.8 Line, character and column counting
    4.2.9 Obsolete JLex options
    4.2.10 State declarations
    4.2.11 Macro definitions
  4.3 Lexical rules
    4.3.1 Syntax
    4.3.2 Semantics
    4.3.3 How the input is matched
    4.3.4 The generated class
    4.3.5 Scanner methods and fields accessible in actions (API)
5 Encodings, Platforms, and Unicode
  5.1 The Problem
  5.2 Scanning text files
  5.3 Scanning binaries
6 Conformance with Unicode Regular Expressions UTS#18
7 A few words on performance
8 Porting Issues
  8.1 Porting from JLex
  8.2 Porting from lex/flex
    8.2.1 Basic structure
    8.2.2 Macros and Regular Expression Syntax
    8.2.3 Lexical Rules
9 Working together
  9.1 JFlex and CUP
    9.1.1 CUP2
    9.1.2 CUP version 0.10j and above
    9.1.3 Custom symbol interface
    9.1.4 Using existing JFlex/CUP specifications with CUP 0.10j
  9.2 JFlex and BYacc/J
  9.3 JFlex and Jay
10 Bugs and Deficiencies
  10.1 Deficiencies
  10.2 Bugs
11 Copying and License
1 Introduction
JFlex is a lexical analyser generator for Java¹ written in Java. It is also a rewrite of the very
useful tool JLex [3], which was developed by Elliot Berk at Princeton University. As Vern
Paxson states for his C/C++ tool flex [13], however, the two do not share any code.
1.1 Design goals
The main design goals of JFlex are:
• Full unicode support
• Fast generated scanners
• Fast scanner generation
• Convenient specification syntax
• Platform independence
• JLex compatibility
1.2 About this manual
This manual gives a brief but complete description of the tool JFlex. It assumes that you
are familiar with the issue of lexical analysis. The references [1], [2], and [15] provide a good
introduction to this topic.
The next section of this manual describes installation procedures for JFlex. If you have never
worked with JLex, or just want to compare a JLex and a JFlex scanner specification, you
should also read Working with JFlex - an example (section 3). All options and the complete
specification syntax are presented in Lexical specifications (section 4); Encodings, Platforms,
and Unicode (section 5) provides information about scanning text vs. binary files. If you are
interested in performance considerations and in comparing the speed of JLex and JFlex, A few
words on performance (section 7) might be just right for you. Those who want to use their old
JLex specifications may want to check section 8.1 Porting from JLex to avoid possible problems
with non-portable or non-standard JLex behaviour that has been fixed in JFlex. Section 8.2
talks about porting scanners from the Unix tools lex and flex. Interfacing JFlex scanners
with the LALR parser generators CUP, CUP2 and BYacc/J is explained in Working together
(section 9). Section 10 Bugs gives a list of currently known active bugs. The manual concludes
with notes about Copying and License (section 11) and references.
¹ Java is a trademark of Sun Microsystems, Inc., and refers to Sun’s Java programming language. JFlex is not
sponsored by or affiliated with Sun Microsystems, Inc.
2 Installing and Running JFlex
2.1 Installing JFlex
2.1.1 Windows
To install JFlex on Windows 95/98/NT/XP, follow these three steps:
1. Unzip the file you downloaded into the directory you want JFlex in (using something
like WinZip²). If you unzipped it to, say, C:\, the following directory structure should be
generated:
C:\jflex-1.6.1\
  +--bin\                      (start scripts)
  +--doc\                      (FAQ and manual)
  +--examples\
     +--byaccj\                (calculator example for BYacc/J)
     +--cup\                   (calculator example for cup)
     +--interpreter\           (interpreter example for cup)
     +--java\                  (Java lexer specification)
     +--simple-maven\          (example scanner built with maven)
     +--standalone-maven\      (a simple standalone scanner, built with maven)
     +--zero-reader\           (example for Readers that return 0 characters)
  +--lib\                      (precompiled classes, skeleton files)
  +--src\
     +--main\
        +--cup\                (JFlex parser spec)
        +--java\
           +--java_cup\
              +--runtime\      (source code of cup runtime classes)
           +--jflex\           (source code of JFlex)
              +--gui\          (source code of JFlex UI classes)
        +--jflex\              (JFlex scanner spec)
        +--resources\          (messages and default skeleton file)
     +--test\                  (unit tests)
2. Edit the file bin\jflex.bat (in the example it’s C:\jflex-1.6.1\bin\jflex.bat) such
that
• JAVA_HOME contains the directory where your Java JDK is installed (for instance
C:\java) and
• JFLEX_HOME contains the directory where JFlex is installed (in the example: C:\jflex-1.6.1)
3. Include the bin\ directory of JFlex in your path (the one that contains the start script,
in the example: C:\jflex-1.6.1\bin).
² http://www.winzip.com
2.1.2 Unix with tar archive
To install JFlex on a Unix system, follow these two steps:
• Decompress the archive into a directory of your choice with GNU tar, for instance to
/usr/share:
tar -C /usr/share -xvzf jflex-1.6.1.tar.gz
(The example is for site wide installation. You need to be root for that. User installation
works exactly the same way—just choose a directory where you have write permission)
• Make a symbolic link from somewhere in your binary path to bin/jflex, for instance:
ln -s /usr/share/jflex-1.6.1/bin/jflex /usr/bin/jflex
If the Java interpreter is not in your binary path, you need to supply its location in the
script bin/jflex.
You can verify the integrity of the downloaded file with the MD5 checksum available on the
JFlex download page. If you put the checksum file in the same directory as the archive, you
can run:
md5sum --check jflex-1.6.1.tar.gz.md5
It should tell you
jflex-1.6.1.tar.gz: OK
2.2 Running JFlex
You run JFlex with:
jflex <options> <inputfiles>
It is also possible to skip the start script in bin/ and include the file lib/jflex-1.6.1.jar
in your CLASSPATH environment variable instead.
Then you run JFlex with:
java jflex.Main <options> <inputfiles>
or with:
java -jar jflex-1.6.1.jar <options> <inputfiles>
The input files and options are in both cases optional. If you don’t provide a file name on the
command line, JFlex will pop up a window to ask you for one.
JFlex knows about the following options:
-d <directory>
writes the generated file to the directory <directory>
--skel <file>
uses external skeleton <file>. This is mainly for JFlex maintenance and special low
level customisations. Use only when you know what you are doing! JFlex comes with a
skeleton file in the src directory that reflects exactly the internal, pre-compiled skeleton
and can be used with the --skel option.
--nomin
skip the DFA minimisation step during scanner generation.
--jlex
tries even harder to comply with the JLex interpretation of specs.
--dot
generate graphviz dot files for the NFA, DFA and minimised DFA. This feature is still
in alpha status, and not fully implemented yet.
--dump
display transition tables of NFA, initial DFA, and minimised DFA
--legacydot
dot (.) meta character matches [^\n] instead of
[^\n\r\u000B\u000C\u0085\u2028\u2029]
--noinputstreamctor
don’t include an InputStream constructor in the generated scanner
--verbose or -v
display generation progress messages (enabled by default)
--quiet or -q
display error messages only (no chatter about what JFlex is currently doing)
--warn-unused
warn about unused macros (by default true in verbose mode and false in quiet mode)
--no-warn-unused
do not warn about unused macros (by default true in verbose mode and false in quiet
mode)
--time
display time statistics about the code generation process (not very accurate)
--version
print version number
--info
print system and JDK information (useful if you’d like to report a problem)
--unicodever <ver>
print all supported properties for Unicode version <ver>
--help or -h
print a help message explaining options and usage of JFlex.
3 A simple Example: How to work with JFlex
To demonstrate what a lexical specification with JFlex looks like, this section presents a part
of the specification for the Java language. The example does not describe the whole lexical
structure of Java programs, but only a small and simplified part of it (some keywords, some
operators, comments and only two kinds of literals). It also shows how to interface with the
LALR parser generator CUP [8] and therefore uses a class sym (generated by CUP), where
integer constants for the terminal tokens of the CUP grammar are declared. JFlex comes
with a directory examples, where you can find a small standalone scanner that doesn’t need
other tools like CUP to give you a running example. The examples directory also contains
a complete JFlex specification of the lexical structure of Java programs together with the
CUP parser specification for Java by C. Scott Ananian, obtained from the CUP [8] web site
(it was modified to interface with the JFlex scanner). Both specifications adhere to the Java
Language Specification [7].
/* JFlex example: part of Java language lexer specification */
import java_cup.runtime.*;

/**
 * This class is a simple example lexer.
 */
%%

%class Lexer
%unicode
%cup
%line
%column

%{
  StringBuffer string = new StringBuffer();

  private Symbol symbol(int type) {
    return new Symbol(type, yyline, yycolumn);
  }
  private Symbol symbol(int type, Object value) {
    return new Symbol(type, yyline, yycolumn, value);
  }
%}

LineTerminator = \r|\n|\r\n
InputCharacter = [^\r\n]
WhiteSpace     = {LineTerminator} | [ \t\f]

/* comments */
Comment = {TraditionalComment} | {EndOfLineComment} | {DocumentationComment}

TraditionalComment   = "/*" [^*] ~"*/" | "/*" "*"+ "/"
// Comment can be the last line of the file, without line terminator.
EndOfLineComment     = "//" {InputCharacter}* {LineTerminator}?
DocumentationComment = "/**" {CommentContent} "*"+ "/"
CommentContent       = ( [^*] | \*+ [^/*] )*

Identifier = [:jletter:] [:jletterdigit:]*

DecIntegerLiteral = 0 | [1-9][0-9]*

%state STRING

%%

/* keywords */
<YYINITIAL> "abstract"           { return symbol(sym.ABSTRACT); }
<YYINITIAL> "boolean"            { return symbol(sym.BOOLEAN); }
<YYINITIAL> "break"              { return symbol(sym.BREAK); }

<YYINITIAL> {
  /* identifiers */
  {Identifier}                   { return symbol(sym.IDENTIFIER); }

  /* literals */
  {DecIntegerLiteral}            { return symbol(sym.INTEGER_LITERAL); }
  \"                             { string.setLength(0); yybegin(STRING); }

  /* operators */
  "="                            { return symbol(sym.EQ); }
  "=="                           { return symbol(sym.EQEQ); }
  "+"                            { return symbol(sym.PLUS); }

  /* comments */
  {Comment}                      { /* ignore */ }

  /* whitespace */
  {WhiteSpace}                   { /* ignore */ }
}

<STRING> {
  \"                             { yybegin(YYINITIAL);
                                   return symbol(sym.STRING_LITERAL,
                                   string.toString()); }
  [^\n\r\"\\]+                   { string.append( yytext() ); }
  \\t                            { string.append('\t'); }
  \\n                            { string.append('\n'); }

  \\r                            { string.append('\r'); }
  \\\"                           { string.append('\"'); }
  \\                             { string.append('\\'); }
}

/* error fallback */
[^]                              { throw new Error("Illegal character <"+
                                                    yytext()+">"); }
From this specification JFlex generates a .java file with one class that contains code for the
scanner. The class will have a constructor taking a java.io.Reader from which the input
is read. The class will also have a function yylex() that runs the scanner and that can be
used to get the next token from the input (in this example the function actually has the name
next_token() because the specification uses the %cup switch).
As with JLex, the specification consists of three parts, divided by %%:
• user code,
• options and declarations and
• lexical rules.
3.1 Code to include
Let’s take a look at the first section, “user code”: The text up to the first line starting with %%
is copied verbatim to the top of the generated lexer class (before the actual class declaration).
Besides package and import statements there is usually not much to do here. If the code ends
with a javadoc class comment, the generated class will get this comment, if not, JFlex will
generate one automatically.
3.2 Options and Macros
The second section “options and declarations” is more interesting. It consists of a set of
options, code that is included inside the generated scanner class, lexical states and macro
declarations. Each JFlex option must begin a line of the specification and starts with a %. In
our example the following options are used:
• %class Lexer tells JFlex to give the generated class the name “Lexer” and to write the
code to a file “Lexer.java”.
• %unicode defines the set of characters the scanner will work on. For scanning text files,
%unicode should always be used. The Unicode version may be specified, e.g. %unicode
4.1. If no version is specified, the most recent supported Unicode version will be used -
in JFlex 1.6.1, this is Unicode 7.0. See also section 5 for more information on character
sets, encodings, and scanning text vs. binary files.
• %cup switches to CUP compatibility mode to interface with a CUP generated parser.
• %line switches line counting on (the current line number can be accessed via the variable
yyline)
• %column switches column counting on (current column is accessed via yycolumn)
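As a rough illustration of what line and column counting involves, the following Java sketch computes a line and a column for a character offset. It is not JFlex's implementation: the class PositionDemo and the method lineAndColumn are hypothetical names, both counters are zero-based by assumption, and only \r, \n and \r\n are treated as line terminators, as in the LineTerminator macro of the example.

```java
final class PositionDemo {
    /** Returns {line, column}, both zero-based, of the character at the given offset. */
    static int[] lineAndColumn(String input, int offset) {
        int line = 0;
        int column = 0;
        for (int i = 0; i < offset; i++) {
            char c = input.charAt(i);
            boolean crBeforeLf =
                c == '\r' && i + 1 < input.length() && input.charAt(i + 1) == '\n';
            if (crBeforeLf) {
                continue; // "\r\n" is one terminator; the '\n' that follows ends the line
            }
            if (c == '\r' || c == '\n') {
                line++;
                column = 0;
            } else {
                column++;
            }
        }
        return new int[] { line, column };
    }

    public static void main(String[] args) {
        String input = "ab\r\ncd\nef";
        int[] pos = lineAndColumn(input, input.indexOf('d'));
        System.out.println("line " + pos[0] + ", column " + pos[1]); // line 1, column 1
    }
}
```

A generated scanner maintains such counters incrementally while it reads the input; the sketch recomputes them from the start only to keep the example short.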
The code included in %{...%} is copied verbatim into the generated lexer class source. Here
you can declare member variables and functions that are used inside scanner actions. In our
example we declare a StringBuffer “string” in which we will store parts of string literals and
two helper functions “symbol” that create java_cup.runtime.Symbol objects with position
information of the current token (see section 9.1 JFlex and CUP for how to interface with the
parser generator CUP). Like JFlex options, both %{ and %} must begin a line.
The specification continues with macro declarations. Macros are abbreviations for regular
expressions, used to make lexical specifications easier to read and understand. A macro
declaration consists of a macro identifier followed by =, then followed by the regular expression
it represents. This regular expression may itself contain macro usages. Although this allows a
grammar-like specification style, macros are still just abbreviations and not non-terminals:
they cannot be recursive or mutually recursive. Cycles in macro definitions are detected and
reported at generation time by JFlex.
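The idea that macros are plain abbreviations, and that a cycle therefore cannot be expanded, can be sketched in Java. This is only an illustration under stated assumptions, not how JFlex is implemented: MacroExpansionDemo and expand are hypothetical names, macro usages are recognised purely textually as {Name}, and each expansion is wrapped in parentheses to keep operator precedence intact.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MacroExpansionDemo {
    // A macro usage is written {Name}; counted repetitions such as a{2,3} are not handled.
    private static final Pattern USAGE = Pattern.compile("\\{([A-Za-z_][A-Za-z_0-9]*)\\}");

    /** Returns the definition of the named macro with all nested usages expanded. */
    static String expand(String name, Map<String, String> macros) {
        return expand(name, macros, new ArrayDeque<>());
    }

    private static String expand(String name, Map<String, String> macros,
                                 Deque<String> inProgress) {
        String definition = macros.get(name);
        if (definition == null) {
            throw new IllegalArgumentException("undefined macro " + name);
        }
        if (inProgress.contains(name)) {
            // The macro is (directly or indirectly) defined in terms of itself.
            throw new IllegalStateException("cycle in definition of macro " + name);
        }
        inProgress.push(name);
        Matcher m = USAGE.matcher(definition);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String expanded = "(" + expand(m.group(1), macros, inProgress) + ")";
            // quoteReplacement keeps backslashes in the expansion literal
            m.appendReplacement(out, Matcher.quoteReplacement(expanded));
        }
        m.appendTail(out);
        inProgress.pop();
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> macros = new LinkedHashMap<>();
        macros.put("LineTerminator", "\\r|\\n|\\r\\n");
        macros.put("WhiteSpace", "{LineTerminator} | [ \\t\\f]");
        System.out.println(expand("WhiteSpace", macros)); // (\r|\n|\r\n) | [ \t\f]
    }
}
```

With two hypothetical macros A = {B} and B = {A}, the call expand("A", macros) throws an IllegalStateException instead of recursing forever.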
Here are some of the example macros in more detail:
• LineTerminator stands for the regular expression that matches an ASCII CR, an ASCII
LF or a CR followed by LF.
• InputCharacter stands for all characters that are not a CR or LF.
• TraditionalComment is the expression that matches the string "/*" followed by a
character that is not a *, followed by anything that does not contain, but ends in "*/".
As this would not match comments like /****/, we add "/*" followed by an arbitrary
number (at least one) of "*" followed by the closing "/". This is not the only, but one
of the simpler expressions matching non-nesting Java comments. It is tempting to just
write something like the expression "/*" .* "*/", but this would match more than we
want. It would for instance match the whole of /* */ x = 0; /* */, instead of two
comments and four real tokens. See DocumentationComment and CommentContent for
an alternative.
• CommentContent matches zero or more occurrences of any character except a *, or of any
number of * followed by a character that is neither a / nor a *.
• Identifier matches each string that starts with a character of class jletter followed
by zero or more characters of class jletterdigit. jletter and jletterdigit are
predefined character classes. jletter includes all characters for which the Java function
Character.isJavaIdentifierStart returns true and jletterdigit all characters for
which Character.isJavaIdentifierPart returns true.
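The difference between the tempting pattern and the CommentContent style can be tried out with java.util.regex. The sketch below is an approximation under stated assumptions: java.util.regex is a backtracking matcher with its own syntax, not the JFlex engine, and the class CommentPatterns with its fields and method is hypothetical. PRECISE renders "/*" {CommentContent} "*"+ "/" in Java regex syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CommentPatterns {
    // Counterpart of the tempting "/*" .* "*/": the greedy .* runs to the last "*/".
    static final Pattern GREEDY = Pattern.compile("/\\*.*\\*/");
    // Counterpart of "/*" {CommentContent} "*"+ "/" with
    // CommentContent = ( [^*] | \*+ [^/*] )*
    static final Pattern PRECISE = Pattern.compile("/\\*([^*]|\\*+[^/*])*\\*+/");

    /** Collects all non-overlapping matches of p in input, left to right. */
    static List<String> findAll(Pattern p, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = p.matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "/* */ x = 0; /* */";
        System.out.println(findAll(GREEDY, input));  // [/* */ x = 0; /* */]
        System.out.println(findAll(PRECISE, input)); // [/* */, /* */]
        // jletter corresponds to Character.isJavaIdentifierStart:
        System.out.println(Character.isJavaIdentifierStart('$')); // true
        System.out.println(Character.isJavaIdentifierStart('1')); // false
    }
}
```

The greedy pattern swallows the assignment between the two comments, while the CommentContent style stops at the first closing */ and also accepts /****/.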
The last part of the second section in our lexical specification is a lexical state declaration:
%state STRING declares a lexical state STRING that can be used in the “lexical rules” part
of the specification. A state declaration is a line starting with %state followed by a space
or comma separated list of state identifiers. There can be more than one line starting with
%state.
3.3 Rules and Actions
The “lexical rules” section of a JFlex specification contains regular expressions and actions
(Java code) that are executed when the scanner matches the associated regular expression. As
the scanner reads its input, it keeps track of all regular expressions and activates the action
of the expression that has the longest match. Our specification above, for instance, would
with input “breaker” match the regular expression for Identifier and not the keyword
“break” followed by the Identifier “er”, because rule {Identifier} matches more of this
input at once (i.e. it matches all of it) than any other rule in the specification. If two regular
expressions both have the longest match for a certain input, the scanner chooses the action of
the expression that appears first in the specification. In that way, we get for input “break”
the keyword “break” and not an Identifier “break”.
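The selection policy described here (longest match first, earlier rule on a tie) can be sketched with java.util.regex. This is an illustration only, not the DFA-based matching a generated scanner performs: LongestMatchDemo and match are hypothetical names, the identifier pattern is simplified to ASCII, and the sketch relies on each pattern being greedy without alternation, so that lookingAt() already yields that pattern's longest match.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LongestMatchDemo {
    // Rules in specification order; LinkedHashMap preserves insertion order.
    static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("BREAK", Pattern.compile("break"));
        RULES.put("IDENTIFIER", Pattern.compile("[A-Za-z_][A-Za-z_0-9]*"));
    }

    /** Returns "RULE:lexeme" for the rule that wins at position pos, or null if none matches. */
    static String match(String input, int pos) {
        String bestRule = null;
        int bestLength = 0;
        for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
            Matcher m = rule.getValue().matcher(input);
            m.region(pos, input.length());
            // lookingAt() anchors the match at pos; the strict comparison
            // keeps the earlier rule when two rules match the same length.
            if (m.lookingAt() && m.end() - pos > bestLength) {
                bestLength = m.end() - pos;
                bestRule = rule.getKey();
            }
        }
        return bestRule == null ? null : bestRule + ":" + input.substring(pos, pos + bestLength);
    }

    public static void main(String[] args) {
        System.out.println(match("breaker", 0)); // IDENTIFIER:breaker
        System.out.println(match("break", 0));   // BREAK:break
    }
}
```

For “breaker” the identifier rule wins because it matches seven characters instead of five; for “break” both rules match five characters and the keyword rule wins because it comes first.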
In addition to regular expression matches, one can use lexical states to refine a specification.
A lexical state acts like a start condition. If the scanner is in lexical state STRING, only
expressions that are preceded by the start condition <STRING> can be matched. A start
condition of a regular expression can contain more than one lexical state. It is then matched
when the lexer is in any of these lexical states. The lexical state YYINITIAL is predefined
and is also the state in which the lexer begins scanning. If a regular expression has no start
conditions it is matched in all lexical states.
Since you often have a bunch of expressions with the same start conditions, JFlex allows the
same abbreviation as the Unix tool flex:
<STRING> {
  expr1   { action1 }
  expr2   { action2 }
}
means that both expr1 and expr2 have start condition <STRING>.
The first three rules in our example demonstrate the syntax of a regular expression preceded
by the start condition <YYINITIAL>.
<YYINITIAL> "abstract"
{ return symbol(sym.ABSTRACT); }
matches the input “abstract” only if the scanner is in its start state “YYINITIAL”. When the
string “abstract” is matched, the scanner function returns the CUP symbol sym.ABSTRACT. If
an action does not return a value, the scanning process is resumed immediately after executing
the action.
The rules enclosed in
<YYINITIAL> {
...
}
demonstrate the abbreviated syntax and are also only matched in state YYINITIAL.
Of these rules, one may be of special interest:
\"
{ string.setLength(0); yybegin(STRING); }
If the scanner matches a double quote in state YYINITIAL we have recognised the start of a
string literal. Therefore we clear our StringBuffer that will hold the content of this string
literal and tell the scanner with yybegin(STRING) to switch into the lexical state STRING.
Because we do not yet return a value to the parser, our scanner proceeds immediately.
In lexical state STRING another rule demonstrates how to refer to the input that has been
matched:
[^\n\r\"\\]+ { string.append( yytext() ); }
The expression [^\n\r\"\\]+ matches all characters in the input up to the next backslash
(indicating an escape sequence such as \n), double quote (indicating the end of the string), or
line terminator (which must not occur in a string literal). The matched region of the input is
referred to with yytext() and appended to the content of the string literal parsed so far.
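To make the interplay of the STRING rules concrete, here is a hand-written Java sketch that mirrors them for a single string literal. It is an illustration under stated assumptions, not generated scanner code: StringStateDemo and scanStringLiteral are hypothetical names, characters are appended one at a time instead of in chunks via yytext(), and an IllegalArgumentException stands in for the error fallback rule.

```java
final class StringStateDemo {
    /**
     * Scans the string literal that starts with the double quote at index start
     * and returns its decoded content.
     */
    static String scanStringLiteral(String input, int start) {
        if (start >= input.length() || input.charAt(start) != '"') {
            throw new IllegalArgumentException("expected opening quote at index " + start);
        }
        StringBuilder string = new StringBuilder(); // corresponds to string.setLength(0)
        int i = start + 1;                          // corresponds to yybegin(STRING)
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '"') {
                return string.toString();           // closing quote: back to YYINITIAL
            }
            if (c == '\n' || c == '\r') {
                break;                              // line terminators are not allowed
            }
            boolean escape = c == '\\' && i + 1 < input.length()
                && "tnr\"".indexOf(input.charAt(i + 1)) >= 0;
            if (escape) {
                char e = input.charAt(i + 1);       // rules \\t, \\n, \\r and \\\"
                string.append(e == 't' ? '\t' : e == 'n' ? '\n' : e == 'r' ? '\r' : '"');
                i += 2;
            } else {
                string.append(c);                   // ordinary character or lone backslash
                i++;
            }
        }
        throw new IllegalArgumentException("unterminated string literal");
    }

    public static void main(String[] args) {
        // The input text is: "a\tb" + 1   (backslash and t are two separate characters)
        String decoded = scanStringLiteral("\"a\\tb\" + 1", 0);
        System.out.println(decoded.length()); // 3: 'a', a tab character, 'b'
    }
}
```

The loop plays the role of the scanner staying in state STRING: nothing is returned to the caller until the closing quote is seen.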
The last lexical rule in the example specification is used as an error fallback. It matches any
character in any state that has not been matched by another rule. It doesn’t conflict with any
other rule because it has the lowest priority (because it’s the last rule) and because it matches
only one character (so it can’t have longest match precedence over any other rule).
3.4 How to get it going
• Install JFlex (see section 2 Installing JFlex )
• If you have written your specification file (or chosen one from the examples directory),
save it (say under the name java-lang.flex).
• Run JFlex with
jflex java-lang.flex
• JFlex should then report some progress messages about generating the scanner and
write the generated code to the directory of your specification file.
• Compile the generated .java file and your own classes. (If you use CUP, generate your
parser classes first)
• That’s it.
4 Lexical Specifications
As shown above, a lexical specification file for JFlex consists of three parts divided by a single
line starting with %%:
UserCode
%%
Options and declarations
%%
Lexical rules
In all parts of the specification comments of the form /* comment text */ and the Java style
end of line comments starting with // are permitted. JFlex comments do nest - so the number
of /* and */ should be balanced.
4.1 User code
The first part contains user code that is copied verbatim into the beginning of the source file
of the generated lexer before the scanner class is declared. As shown in the example above,
this is the place to put package declarations and import statements. It is possible, but not
considered good Java programming style, to put your own helper classes (such as token classes)
in this section. They should get their own .java file instead.
4.2 Options and declarations
The second part of the lexical specification contains options to customise your generated lexer
(JFlex directives and Java code to include in different parts of the lexer), declarations of
lexical states and macro definitions for use in the third section “Lexical rules” of the lexical
specification file.
Each JFlex directive must be situated at the beginning of a line and starts with the % character.
Directives that have one or more parameters are described as follows:
%class "classname"
means that you start a line with %class followed by a space followed by the name of the
class for the generated scanner (the double quotes are not to be entered, see the example
specification in section 3).
4.2.1 Class options and user class code