# LLVM Tutorial #4: Recursive Descent Parser

Published on Jun 12th 2020Duration: 28:34

I am learning LLVM. In this series I am planning to walk through the introductory tutorial called Kaleidoscope at https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/. In the last episode we defined the AST Node classes that we would need. This time we'll make the recursive descent parser.

## Transcript

The following transcript was automatically generated by an algorithm.

• 00:03 : all right we're live hello this is Toby
• 00:06 : and we're gonna continue our LLVM
• 00:10 : tutorial walkthrough
• 00:13 : this is part 3 I think or something we
• 00:18 : were in the middle of chapter 2 of the
• 00:21 : tutorial we done the sort of first chunk
• 00:27 : of chapter 2 which was defining these
• 00:30 : classes to represent the ast node in
• 00:35 : this language the next part is gonna be
• 00:37 : parser basics okay and here we're gonna
• 00:42 : be making a recursive descent parser
• 00:45 : recursive descent parser consists of a
• 00:48 : series of potentially recursive
• 00:52 : functions usually they have some
• 00:55 : recursive element to it which is why the
• 00:58 : word recursive is in name and usually
• 01:01 : there will be one function for parsing
• 01:06 : each one of the syntax elements in in
• 01:09 : the program language and which I'll
• 01:13 : explain as we go but I'm gonna just
• 01:17 : follow along this tutorial and type in
• 01:20 : the code it instructs me to type at
• 01:23 : least at this point that's what I'm
• 01:25 : gonna do
• 01:28 : there's gonna be a cur toke variable
• 01:32 : which stands for current token which
• 01:38 : remembers the current token in the
• 01:41 : stream of tokens so remember that
• 01:44 : there's like let's say you have a string
• 01:50 : which I don't know it says something
• 01:53 : like maybe the string says space equals
• 02:06 : space 5 maybe that's the string
• 02:11 : I was the token
• 02:12 : you're gonna do well the tokenizer says
• 02:14 : yep tap is a token that's a that's an
• 02:19 : identifier token and and it contains
• 02:23 : this food string within it so that's one
• 02:27 : token I'm gonna skip this space that's
• 02:31 : nothing I ignored it
• 02:33 : this guy is a is a token as well but in
• 02:37 : this lecture it just represents it as
• 02:40 : the character itself it's not like a
• 02:44 : known token per se it's one of these
• 02:48 : supposedly oh it's not even yeah there's
• 02:54 : a line of common that says like unknown
• 02:58 : tokens are represented by their code so
• 03:07 : like if you type in equal sign and
• 03:09 : instead of one of these negative numbers
• 03:12 : you would you actually get a positive
• 03:14 : number you get actually get a positive
• 03:17 : number which is their ASCII code right
• 03:19 : so I get back to my diagram so so basic
• 03:26 : and then we so we have two tokens so far
• 03:29 : one token is this ID token the next
• 03:32 : token is this token that just is the
• 03:35 : equal sign and then this five actually
• 03:38 : will get recognized as a number token
• 03:44 : which is gonna have number five in it so
• 03:48 : we actually have three tokens so is
• 03:51 : actually a stream of three tokens if you
• 03:53 : can think of this as a stream with with
• 04:01 : stuff coming in from the stream so so
• 04:07 : when you read a token you're gonna call
• 04:11 : this get Tok function it'll give you the
• 04:14 : next token in the stream so you get Tok
• 04:17 : the first Tok Tok the first token you're
• 04:22 : gonna get is this ID token
• 04:25 : but we're gonna buffer it we're gonna
• 04:27 : save it in this temporary variable
• 04:29 : called car tokens which is what this get
• 04:33 : next token does it will call get tok and
• 04:35 : save this current token in car tok to
• 04:42 : make it available for the actual parser
• 04:44 : that we'll be examining it's a value as
• 04:47 : you will see okay now we'll have a cup
• 05:03 : of functions for our error reporting
• 05:08 : i'll type them out and then hopefully
• 05:12 : can explain what they are doing
• 05:18 : adequately although no guarantees
• 05:23 : f printf is a is a print function like
• 05:28 : printf except it it directs it to a
• 05:31 : specific file in this case standard
• 05:33 : error so that in JavaScript there would
• 05:36 : be like console dot error
• 05:46 : I don't know why we're returning a null
• 05:51 : pointer for this lock error function but
• 05:54 : yeah
• 05:55 : visually it prints that out with this
• 05:57 : string passed in as the mess the
• 05:59 : additional message to be printed out and
• 06:04 : we have a separate function for every
• 06:07 : reporting this looks like well we have a
• 06:18 : lock error
• 06:19 : specifically for prototypes I'm not sure
• 06:22 : about that actually
• 06:24 : now this wait for a later and when the
• 06:28 : tutorial actually uses this function to
• 06:33 : to try to explain it at that point
• 06:36 : because at this point I honestly don't
• 06:38 : know why we have a secondary reporting
• 06:42 : function okay in any case I'm gonna
• 06:47 : winder pointer should beep spelled PTR
• 06:52 : obviously duh okay
• 06:55 : how these two functions copy now we can
• 07:01 : for a recursive descent parser the
• 07:04 : pattern is gonna be whatever your symbol
• 07:07 : name is whatever your symbol name is
• 07:16 : there you slap the word parse in front
• 07:21 : of it and that's gonna be a function
• 07:23 : that you write you're gonna so in this
• 07:26 : case it's parsed number expression and
• 07:30 : the next one is gonna be parsed paren
• 07:32 : expression right let's write this parse
• 07:36 : number x per
• 07:52 : Otto is a keyword in C++ similar to var
• 07:58 : in c-sharp where it allows to develop
• 08:03 : her to not have to explicitly write out
• 08:05 : the type for that variable c plus s
• 08:09 : we'll just infer the type of the
• 08:11 : variable even though it's not made
• 08:14 : explicit make unique is a standard
• 08:19 : library function that creates a unique
• 08:23 : pointer and allocates memory to to store
• 08:29 : this particular object of this class so
• 08:35 : that basically this is actually gonna
• 08:37 : have the effect of mooing up this number
• 08:41 : X / ast except in so this is like in
• 08:48 : Java you would have written in C++ you
• 08:52 : can write this - actually you can write
• 08:55 : new number X / ast but instead of doing
• 09:00 : this new thing we're using make unique
• 09:02 : which has the effect of making use of
• 09:06 : this smart pointer approach
• 09:14 : which I will try to explain to the best
• 09:17 : of my ability although to be honest it
• 09:21 : it's kind of new to me and this standard
• 09:25 : that move will basically move the
• 09:30 : ownership of this smart pointer out of
• 09:34 : this function so that whoever is calling
• 09:36 : this parse number expert receives this
• 09:41 : smart pointer and now they owned it they
• 09:43 : are responsible for cleaning it up okay
• 09:48 : so this one is just gonna it sort of
• 09:52 : this parse number expression is very
• 09:55 : simple because it kind of is expecting a
• 09:58 : number to already have been parsed make
• 10:02 : a number of token too early have been
• 10:05 : parsed it doesn't even check whether the
• 10:07 : current thing is a number token or not
• 10:09 : because is expecting whoever calling it
• 10:12 : has already looked at the type of the
• 10:16 : current token and saw that it was a
• 10:19 : number so yeah let's do the next one
• 10:23 : parse paren expression
• 10:36 : okay this one is more complicated it's
• 10:39 : gonna actually just expects the friend
• 10:44 : open friend to be the first token here
• 10:47 : and just eats it and by the way this
• 10:51 : term eat is something that ever hurt
• 10:53 : often eat the the reason it's cooked
• 11:03 : eat okay so let's go back to this tokens
• 11:08 : we have a stream of tokens right token
• 11:11 : here token here took in here maybe this
• 11:14 : token is the open parenthesis and then
• 11:17 : maybe you have N and then closed
• 11:19 : parentheses let's just eat that means
• 11:22 : well the current token is pointing to
• 11:25 : this one right now each means I want to
• 11:28 : consume this token by moving this cursor
• 11:31 : this way
• 11:32 : so I like to move this cursor this way
• 11:36 : that's what they mean by eat so that we
• 11:42 : change the value of Curt Oak the current
• 11:46 : token to be the next token because
• 11:49 : that's what get next token does I kind
• 11:56 : of have a problem with this this idea of
• 12:05 : eating tokens and consuming tokens and
• 12:08 : tokens have to be consumed whether I'm
• 12:12 : not going to get into that too deeply
• 12:14 : now but I've been looking into
• 12:20 : functional parsing and I feel like maybe
• 12:24 : that I probably prefer that approach
• 12:29 : over this
• 12:43 : again we're gonna eat I'm gonna eat a
• 12:46 : token we have already checked that it's
• 12:52 : actually I would like to refer to flip
• 12:55 : this conditional and have the positive
• 12:57 : conditional first so if we can already
• 13:03 : see that the next token or the current
• 13:06 : token I should say is the closing
• 13:08 : parenthesis then we're gonna eat we're
• 13:11 : gonna consume it oops and also we're
• 13:21 : gonna return this V whatever the W has
• 13:27 : returned but we having a defined part
• 13:29 : expression yeah and why do we return the
• 13:33 : result of Lockett don't like that I
• 13:39 : think it would be better to return all
• 13:41 : pointer that's my opinion but you know
• 13:47 : I'm gonna I'm gonna follow the tutorial
• 13:50 : just so that I don't get confused later
• 13:52 : on because I deviated too much from the
• 13:56 : tutorial
• 13:57 : okay so if we basically this is saying
• 14:02 : like there's a parenthesis with some
• 14:07 : stuff in it and the stuff in it I don't
• 14:09 : know it could be anything it could be an
• 14:11 : expression like that but basically is
• 14:13 : first gonna something outside of this
• 14:17 : function has already seen that it has
• 14:19 : already seen this token and and it's
• 14:24 : decided we should parse a paren
• 14:26 : expression now we're going to parse an
• 14:28 : expression which should consume this
• 14:31 : part and then take that part and put it
• 14:35 : in V if we didn't get that didn't work
• 14:38 : though we're returning it out but if
• 14:40 : that did work we're gonna make sure the
• 14:43 : next token is the closing print and then
• 14:46 : we're gonna eat it we're gonna advance
• 14:48 : this cursor this cursor is getting
• 14:53 : yes after this and then we're gonna eat
• 14:56 : it yeah eat it advances it and then
• 14:59 : we're gonna return whatever that
• 15:00 : expression was that was returned by the
• 15:03 : parsed expression okay but if that were
• 15:06 : didn't work either we're gonna log on
• 15:08 : there that's what's going on moving on
• 15:13 : next part this time we're gonna parse an
• 15:16 : identifier expression I think it is
• 15:20 : actually a misnomer so I looked at it a
• 15:23 : or a function call expression so I'm
• 15:26 : gonna name it more justly you'll see let
• 15:33 : me finish typing this bit I'm gonna call
• 15:41 : this parse identifier because that's
• 15:45 : actually what it's doing okay
• 15:54 : again it's actually assuming that the
• 15:58 : whoever calls this function has already
• 16:01 : tested the fact that we just saw the
• 16:05 : current token is a identifier token so
• 16:09 : we're just using the identifier string
• 16:11 : which is supposedly only valid it should
• 16:16 : only be valid when we are currently
• 16:19 : looking at the identifier token we only
• 16:25 : only this should only be valid when Kurt
• 16:30 : oak is of type toque identifier that's
• 16:36 : the technical I'm gonna eat that
• 16:42 : identifier token and then now if now
• 16:47 : we're expecting oh not expecting but
• 16:51 : again whoever this the guy who would
• 16:56 : likes to write this the guy who wrote
• 16:58 : this tutorial likes to do the negative
• 17:00 : conditions first I kind of prefer
• 17:02 : positive conditions first so I'm gonna
• 17:05 : do this so if this is basically the
• 17:09 : expression that's like it could it could
• 17:13 : just be like variable which could just
• 17:16 : be n in which case oh yeah that's an
• 17:19 : identifier but it could be an and then
• 17:24 : it's a function call and there's
• 17:26 : actually a function and that's what this
• 17:29 : next part is parsing here it's saying oh
• 17:31 : well I just person identifier but what
• 17:33 : if the next character right after that
• 17:35 : is an open parenthesis then we're gonna
• 17:37 : parse this argument list and it's
• 17:42 : actually a call expression and that's
• 17:45 : what this this part of the is statement
• 17:47 : is gonna do but in the else Clause is
• 17:50 : actually just gonna recognize this guy
• 17:53 : as an identifier so I'll do I'll do the
• 17:58 : else Clause first that's the easier
• 18:00 : taste
• 18:01 : the case I think it handled here yeah so
• 18:10 : we're just gonna make a unique again use
• 18:12 : this make unique function to create a
• 18:14 : smart pointer to this this and this time
• 18:18 : is a variable expression AST with this
• 18:21 : ID name yeah and then we're done if we
• 18:27 : didn't get a parenthesis as the next
• 18:30 : thing but if we do get the apprentices
• 18:33 : then what we're gonna do is have a while
• 18:43 : loop he did a wild one
• 18:53 : yeah oh we're gonna make a vector to
• 18:56 : store the argument first did we eat this
• 19:01 : yeah we're gonna need to eat
• 19:04 : so this eats the open parenthesis this
• 19:10 : this one eats the identifier as he had
• 19:15 : comments expressing and this is now
• 19:19 : we're gonna make a vector which again
• 19:22 : that's like in a dynamic array it's a
• 19:32 : it's a vector of arguments it's like an
• 19:35 : array or gonna push into we're gonna
• 19:38 : basically call parse expression
• 19:51 : he kind of inlined it inside this if
• 19:53 : statement I'm not gonna do that that's
• 19:55 : just my style
• 19:58 : so if tourist expression got us
• 20:02 : something back then we're gonna that's a
• 20:05 : good argument we're gonna stick it
• 20:08 : inside this vector in in JavaScript you
• 20:13 : would do array that push and in C++ you
• 20:17 : do vector that push back it means
• 20:21 : push it to the back of the vector so
• 20:25 : pushing the art to the back of the
• 20:28 : vector but if we didn't get an
• 20:31 : expression we're just gonna return a
• 20:34 : null pointer and now if if now after
• 20:42 : consuming this expression if that worked
• 20:45 : and the current token is closing
• 20:51 : parentheses then we're finished with
• 20:54 : reading in this this argument list we're
• 20:58 : gonna break out this while loop and then
• 21:01 : we're gonna eat this token I'm actually
• 21:06 : gonna eat this token in here I'm not
• 21:09 : deviate from how he does it eat this
• 21:14 : closing front now if if however took is
• 21:21 : a comma which which means there's more
• 21:26 : arguments that that means like oh
• 21:28 : actually you're calling this function
• 21:31 : with one we just saw comma so there's
• 21:35 : some more arguments we go back to the
• 21:37 : top of the while loop and try to parse
• 21:39 : another expression right first this
• 21:42 : number two so if that's the case we
• 21:46 : actually want to consume this eat eat
• 21:52 : this comma and we want to continue
• 21:55 : actually
• 21:58 : continue ohdon't didn't have to continue
• 22:00 : we could just let it go but we want to
• 22:02 : start back at the top of this loop again
• 22:06 : but if it that was not true however so
• 22:12 : if it was neither the closing print or
• 22:16 : the comma then we actually went to log
• 22:21 : this error yeah I expect that the
• 22:25 : closing print or I guess this could be a
• 22:29 : if I feel like it's better and I'm kind
• 22:35 : of deviating I'm kind of riffing off of
• 22:38 : this code example because I feel more
• 22:41 : confident about writing a parser like
• 22:45 : this because I had some practice
• 22:46 : recently but if you want to follow
• 22:50 : closer to this code then by all means
• 22:53 : stick closer to it and I think finally
• 23:00 : once we break out of this he ate the
• 23:04 : token outside of this loop but I decided
• 23:08 : to eat it right here
• 23:11 : I prefer it this way I think and yeah
• 23:17 : after that we're gonna return this
• 23:22 : unique this
• 23:24 : yeah unique pointer again by using make
• 23:27 : unique of a call expert ast and this is
• 23:34 : a call expression so it's gonna have a
• 23:35 : function name which is the ID name Plus
• 23:38 : this arcs and we again need to use
• 23:41 : standard move to move this arcs variable
• 23:48 : to function so we grant them ownership
• 23:51 : of this vector basically so this smart
• 23:56 : pointer to a vector okay so that's
• 24:00 : parsing an identifier or a call
• 24:02 : expression let's keep going as far as a
• 24:05 : primary what the heck is a primary
• 24:08 : so a primary is I believe is a Hungary
• 24:17 : expression it's either there's a reason
• 24:22 : why they call it primary although I
• 24:25 : think it's kind of confusing it's a it's
• 24:28 : either number a primary is either a
• 24:32 : number or it's a identifier like you
• 24:35 : know a variable and/or foo or it's a
• 24:41 : expression that's within a parentheses
• 24:46 : all of these you can think of as a under
• 24:50 : e expression there's only one it can be
• 24:54 : paired with other things though by a
• 24:56 : binary expression so that's what a
• 25:01 : primary is so let's write this code to
• 25:03 : parse a primary
• 25:33 : so it's gonna do a switch on the current
• 25:37 : token it does a default first which is
• 25:41 : quite strange I haven't seen that before
• 25:44 : I'm gonna do this switch at the end it
• 25:49 : says unknown token when expecting what
• 25:54 : are you expecting an expression okay I'm
• 25:59 : gonna put this default clause at the end
• 26:01 : and then so if it's an identifier it's
• 26:06 : gonna sort of call the pars identifier I
• 26:12 : actually called it the pars identifier
• 26:14 : or Co expression if it's a tok number
• 26:21 : then it's gonna go to parse number
• 26:24 : expert how comprend is this gonna call
• 26:31 : the parse paren expert so this actually
• 26:35 : makes a lot of sense because it's
• 26:36 : basically saying a race horse busily
• 26:43 : saying oh well we got this tokens coming
• 26:46 : in the first token is like an ID token
• 26:53 : it may be the ideas
• 26:56 : well if then I know the next token is an
• 27:01 : identifier then what am I gonna try to
• 27:05 : which parts from today I'm gonna try to
• 27:07 : call if I call parse number that won't
• 27:10 : work because number is expecting a
• 27:12 : number token right so each of these
• 27:16 : three parts functions are expecting
• 27:20 : already that a specific token type is
• 27:24 : currently the what what the value of
• 27:29 : Curt Oak has so if the curt o current
• 27:34 : token is identified who can that's when
• 27:37 : we invoke this guy
• 27:39 : if this function which is already
• 27:41 : expecting this to be the case and that's
• 27:44 : what this parts primary is doing for us
• 27:47 : it's basic based on the type of the
• 27:49 : current token is dispatching the parsing
• 27:52 : to one of these three functions all
• 27:57 : right let's keep going I'm gonna binary
• 28:01 : expression parsing let me think about
• 28:05 : this should we dive into this in this
• 28:09 : video or should we wait until the next
• 28:13 : video
• 28:13 : let me see what's the length of this
• 28:15 : video I think I'm gonna defer this one
• 28:23 : to the next video so um yeah if you
• 28:28 : following along then I'll see you on the
• 28:30 : other side