# LLVM Tutorial #5: Operator Precedence Parsing

Published on Jun 12th 2020Duration: 25:18

I am learning LLVM. In this series I am planning to walk through the introductory tutorial called Kaleidoscope at https://llvm.org/docs/tutorial/MyFirstLanguageFrontend/. In the last episode we wrote code to parse identifiers, call expressions and numbers, this time we'll parse binary expressions and handle operator precedence.

## Transcript

The following transcript was automatically generated by an algorithm.

• 00:00 : they were life this is Toby and we're
• 00:03 : gonna continue walking through this LLVM
• 00:08 : tutorial the tutorials got my first
• 00:12 : language front-end and we're in the
• 00:14 : middle of chapter 2 chapter 2 is quite a
• 00:17 : lengthy chapter and right now we're in
• 00:22 : the middle of it in the section called
• 00:24 : the binary expression parsing so we're
• 00:27 : gonna write the part of the parser
• 00:31 : that's parses binary expressions where
• 00:34 : the binary expressions well there's
• 00:36 : something like that but it I mean this
• 00:39 : is one binary expression but it could be
• 00:42 : more complicated it could be like that
• 00:46 : now as the author talks about when you
• 00:52 : have something like this should the
• 00:55 : parser parse it as this operation first
• 00:59 : or for this operation first well in math
• 01:06 : as mathematicians or as we're taught in
• 01:09 : school multiply has higher precedence
• 01:16 : right so Mis on that rule then this
• 01:20 : should get higher presidents how do you
• 01:22 : implement operator presidents in parsing
• 01:25 : well there's multiple ways
• 01:29 : this tutorial is using a method a
• 01:33 : technique which is I believe it's called
• 01:36 : the shunting yard algorithm which I'll
• 01:40 : show you how that works
• 01:41 : this is based on assigning each operator
• 01:45 : type a number to represent its
• 01:49 : presidents so you can see this tutorial
• 01:53 : recommends using a map to store the
• 01:58 : president's number president like the
• 02:01 : president's value for each operator so
• 02:05 : the less than operator gets ten the plus
• 02:08 : operator gets 20 and so forth
• 02:12 : I'm actually thinking instead of doing
• 02:15 : this I'm just gonna write a switch
• 02:17 : statement I think that would be simpler
• 02:21 : although I have nothing against using a
• 02:23 : dictionary
• 02:24 : I just thought yeah I just read a switch
• 02:30 : statement to switch it up right so
• 02:32 : that's what I'm gonna do so in essence
• 02:35 : all we're doing is assigning a numerical
• 02:38 : value to represent the president's sort
• 02:42 : of level and the higher number the
• 02:44 : higher presidents that's how this
• 02:46 : algorithm works so instead of doing this
• 02:53 : part where it's initializing this map
• 02:57 : I'm not gonna do that I'm just gonna
• 02:58 : implement this this get toque
• 03:02 : president's function although if you
• 03:06 : wanted to follow along with the tutorial
• 03:12 : exactly then more than welcome to do
• 03:15 : that I'm actually not even gonna bother
• 03:18 : doing this I'm gonna say switch Kurt oak
• 03:26 : if it's the less than sign and I'm
• 03:31 : returning 10 probably do the same with
• 03:35 : the greater than sign I don't understand
• 03:37 : why if you're gonna do a less than sign
• 03:39 : you're not gonna do the greater than
• 03:40 : sign I'm doing the greater than sign if
• 03:44 : it's the plus sign down whoopsies
• 03:50 : ok this is the plus sign returning 20
• 03:57 : minus sign also returning 20 if it's the
• 04:02 : multiplication sign though we're
• 04:05 : returning 40 might as well do the
• 04:09 : division why not why would you do
• 04:13 : multiplication but not division and in
• 04:16 : the default case we're returning a
• 04:20 : negative 1 ok so negative 1 will for
• 04:24 : sure
• 04:25 : smaller than any obese and it will
• 04:29 : actually be smaller than zero which is
• 04:32 : actually significant as you will see in
• 04:34 : the next part all right now we're ready
• 04:37 : to write this parse expression function
• 04:40 : it looks like so let's write it
• 04:55 : I'm gonna part a primary and which will
• 05:06 : again a primary is either or a number
• 05:08 : literal an identifier or expression
• 05:12 : that's enclosed in parentheses and if we
• 05:20 : were able to parse it then if we were
• 05:24 : not able to parse it then we're
• 05:25 : returning null or a null pointer but if
• 05:30 : we were able to parse it then what we're
• 05:32 : gonna do
• 05:33 : honestly I don't like this way this
• 05:36 : certain negative case first thing I'm
• 05:39 : gonna reverse it it's just not my style
• 05:44 : we're gonna so yeah if we did manage to
• 05:48 : parse the left hand side we're gonna do
• 05:51 : we're gonna call parse the binary
• 05:54 : operation we're going to parse the
• 05:56 : little right hand side of it and it
• 06:03 : looks like we're gonna move this smart
• 06:05 : pointer in probably into that function
• 06:11 : that's probably does it talk about just
• 06:16 : move no it kind of doesn't talk about
• 06:18 : this C++ stuff it assumes that you
• 06:21 : already know it
• 06:25 : okay now we're gonna write this parse
• 06:28 : buying up right-hand side
• 06:43 : I'm gonna explain what this is
• 06:56 : okay so given a a president this press
• 07:03 : stands for precedence
• 07:07 : everything is so abbreviated so this is
• 07:11 : a president's number here a president's
• 07:19 : number excuse me
• 07:25 : it's a number that's zero or greater
• 07:29 : zero
• 07:32 : every operator will have a number a
• 07:35 : president's number that's greater than
• 07:37 : zero so by initializing to zero what you
• 07:40 : guarantee is that the any operate any
• 07:45 : actual operator will be recognized as
• 07:49 : having greater presidents than the
• 07:52 : initial presidents level and here we're
• 07:58 : gonna say there's a while loop
• 08:04 : we're gonna keep parsing the right hand
• 08:09 : side I think that's what it's happening
• 08:12 : maybe so this is saying if this is a
• 08:16 : binary operation find its president
• 08:33 : but yet it's presidents so we just parse
• 08:37 : the left-hand side and so now we should
• 08:39 : be staring at a token that is a operator
• 08:44 : like a plus sign or something get its
• 08:47 : presidents and if it's presidents it's
• 08:52 : less than the current presidents which
• 08:55 : is just expert presidents which you'll
• 09:03 : see why that number actually comparing
• 09:07 : with this number is actually meaningful
• 09:11 : I think later but if it's less than that
• 09:16 : we know that we were kind of ignoring it
• 09:20 : ignoring the next operator and we're
• 09:22 : just gonna return the left hand side
• 09:25 : again we'll explain that later this code
• 09:29 : gets the presidents of the current token
• 09:31 : and checks to see if is too low so if
• 09:34 : it's too low
• 09:36 : we're not gonna grab the right-hand side
• 09:38 : as in if we have and then plus two and
• 09:49 : whatever this mysterious thing
• 09:52 : it's presidents let's say was really
• 09:55 : high like 90 it has a precedence 90
• 10:04 : therefore the the presidents of a plus
• 10:07 : sign is too low I think the presidents
• 10:10 : of a plus sign is only 20 so the
• 10:15 : president of a plus sign is way too low
• 10:18 : so it was like no ignore it doesn't
• 10:22 : bother taking in this the rest of this
• 10:26 : stuff and just returns the left-hand
• 10:30 : side as it's only thing that's what this
• 10:33 : if statement says we'll see why that
• 10:36 : fits into the big picture later
• 10:38 : this check implicit knows that the para
• 10:41 : stream ends when the token stream runs
• 10:44 : out of binary operators if this check
• 10:47 : succeeds
• 10:48 : we know that the token is a binary
• 10:50 : operator as in I think that by token I
• 10:56 : think it means this left-hand side is
• 10:58 : already a binary operator because unless
• 11:01 : it was already a binary operator that
• 11:04 : had a higher precedence than the plus
• 11:06 : sign that this condition cannot possibly
• 11:09 : occur so that time means this mystery
• 11:12 : mystery thing is probably something like
• 11:15 : 2 times 3 because how else it could it
• 11:20 : have a higher precedence right okay so
• 11:24 : now okay we know this is a bit up get
• 11:31 : next token wait where does this code
• 11:33 : gonna I guess it's saying
• 11:39 : so if else if it didn't return from it
• 11:45 : now we know this is a bit up does it
• 11:56 : actually know we're gonna get next token
• 12:02 : to eat the pin up and then we're gonna
• 12:15 : parse the primary has its right-hand
• 12:18 : side
• 12:23 : and I guess if there is no right-hand
• 12:28 : side then returned a null pointer again
• 12:35 : I kind of don't like this negative
• 12:36 : testing negative case first kind of
• 12:39 : style as such this code eats and
• 12:44 : remembers the binary operator and then
• 12:46 : parses the primary expression that
• 12:48 : follows this builds up the whole pair
• 12:52 : the first of which is plus B for the
• 12:55 : running example now that we parse the
• 12:59 : left-hand side of an expression and one
• 13:02 : pair of our right hands I sequence we
• 13:05 : have to decide which way the expression
• 13:07 : associates in particular we could have a
• 13:10 : plus B Benna on parsed or a plus and
• 13:16 : then in parentheses B been up on parsed
• 13:24 : up to determines presidents and compare
• 13:27 : it to the pinups presidents which is
• 13:30 : plus in this case
• 13:41 : so okay so if I'm again I'm gonna do
• 13:47 : this positive case first okay so
• 13:52 : basically it's saying were we were able
• 13:55 : to manage to parse the right hand side D
• 13:59 : if if the next one is also a binary
• 14:05 : operator I think that's what that means
• 14:08 : so is like if we have 1 plus 2 times 3
• 14:12 : we already got that as the left hand
• 14:15 : side I got that as the right hand side
• 14:20 : but we're going to look ahead to see if
• 14:23 : this next one is also a operator which
• 14:28 : which has even higher precedence than
• 14:31 : this one which in this case it does in
• 14:36 : this particular example it does so the
• 14:40 : next president the precedence of the
• 14:43 : next token yes
• 14:51 : the tok press which is the previous to
• 14:56 : previous operators presidents if it's
• 14:59 : less than the next tokens next operators
• 15:03 : presidents then we know the parentheses
• 15:15 : associate as a + b been up
• 15:32 : loop around at the top of the wild look
• 15:34 : so let's see I think this is the case
• 15:37 : where he says if body omitted oh I guess
• 15:45 : it would be the other way I'm actually
• 15:48 : kind of confused I'm gonna copy this
• 15:51 : line I'm not sure this is in an example
• 15:58 : so in our example into first a plus B
• 16:02 : and execute the next iteration of the
• 16:05 : loop with plus as the current token okay
• 16:08 : let me see if this I me
• 16:17 : critical let me finish reading this a
• 16:19 : critical question left here is how can
• 16:21 : the if condition parse the right hand
• 16:24 : side in full in particular to built the
• 16:28 : ASD correctly for our example it needs
• 16:30 : to get all of C plus D times e times F
• 16:35 : as the right hands eye expression
• 16:37 : variable code to do this is surprisingly
• 16:40 : simple code from the above two blocks
• 16:45 : duplicated for context
• 16:57 : so this I have this line and then if the
• 17:03 : tok press is less than next press then i
• 17:08 : oh then I wasn't actually wanna part the
• 17:13 : right hand side I did not have that line
• 17:31 : if not part has a standard return the
• 17:39 : whole pointer and you're saying this guy
• 17:44 : should come out of this if statement is
• 17:53 : this right this looks kind of
• 17:56 : complicated
• 18:12 : one thing I don't like at this point
• 18:19 : i'm not instructed to test each of these
• 18:24 : functions as i write them so and i feel
• 18:28 : like this is a place where I wonder
• 18:31 : verify that the code I have is correct
• 18:36 : now I could screw to the bottom and then
• 18:40 : just get access to the code but I want
• 18:43 : to figure it out myself as well let me
• 18:50 : think
• 18:51 : so if I had an example like one plus two
• 18:56 : plus three that looks ugly so what would
• 19:04 : have happened is we first start at
• 19:08 : parsed expression and then parse primary
• 19:13 : would have consumed this one and then
• 19:21 : after one was consumed as a primary so
• 19:25 : one one went into the left-hand side
• 19:27 : basically and caught we're gonna move
• 19:34 : this con token cursor to this plus sign
• 19:42 : and called parse been up right hand side
• 19:47 : parts been up right hand side I think is
• 19:50 : gonna it's always going to be called
• 19:52 : when the next token is a is a well not
• 19:59 : necessarily but supposedly it's caught
• 20:03 : when the next token is an operator so
• 20:11 : let's suppose it is it's gonna say
• 20:13 : what's the president's of this operator
• 20:15 : is it's gonna turned out to be 20
• 20:23 : for the plus sign and then because we
• 20:27 : started this at zero well 20 is gonna
• 20:31 : not gonna be less than zero so we're
• 20:34 : gonna keep going here and the current
• 20:39 : token is gonna be the operator and we're
• 20:43 : going to consume that and then we're
• 20:45 : gonna try to parse the right-hand side
• 20:47 : which will that will get parsed as the
• 20:50 : primary left-hand side which was
• 20:57 : previously parsed but passed into this
• 21:00 : function and we also have the right-hand
• 21:02 : side and now we're gonna say well if we
• 21:05 : got a right-hand side we actually need
• 21:07 : to keep looking ahead to see whether we
• 21:11 : should go ahead and group these guys
• 21:13 : together or not or whether maybe we need
• 21:16 : to group these guys together because if
• 21:18 : this was a multiplication sign we
• 21:20 : actually need to group these guys
• 21:22 : together first before we group ourselves
• 21:24 : together like that but because in this
• 21:26 : example is a plus sign has the same
• 21:29 : president that's the previous one we
• 21:32 : should just group these guys together
• 21:33 : see if this code actually does that so
• 21:38 : we're gonna see the presidents of the
• 21:40 : next one which is also gonna be 20 is 20
• 21:43 : less than 20 no therefore is returned to
• 21:49 : here what is this assign the left-hand
• 21:56 : side oh I see yeah did this this line
• 22:04 : groups these guys together it makes a
• 22:07 : new binary expression ast with the crown
• 22:13 : left-hand side in the current right-hand
• 22:15 : side which was here and here and groups
• 22:18 : this together and says this is the new
• 22:20 : left-hand side so this is correct
• 22:23 : so if if the president's is not less or
• 22:30 : if the Knik the presidents of the NICS
• 22:33 : operator
• 22:34 : is not strictly greater than the
• 22:37 : presidents of the previous operator then
• 22:40 : we're going to group the current left
• 22:41 : hand side right hand side together and
• 22:44 : call that the neo left hand side and
• 22:46 : then go back to the top of the loop
• 22:49 : however all right let me clear this out
• 22:53 : because this is getting too messy
• 22:55 : however if we have a multiplication sign
• 23:00 : here then the next president would be 40
• 23:06 : and the toq presidents would be 20 and
• 23:12 : therefore it would actually take this
• 23:14 : path and if that happened we're actually
• 23:17 : gonna try to sort of parse parse this
• 23:21 : part with right hand side set the
• 23:25 : president's to the current 1+1
• 23:30 : interesting so like 21 looks 21 we're
• 23:37 : gonna move the current right hand side
• 23:40 : remove the 2 into its so 2 becomes to
• 23:44 : the left hand side yeah I'm gonna try to
• 23:48 : press it and have that as obesity we
• 23:51 : want this guy to be the new right hand
• 23:53 : side and that makes sense so this makes
• 23:58 : sense to me this checks out so let's go
• 24:06 : for that and if if we did first and a
• 24:11 : new right hand side then that's the one
• 24:17 : we use to combine it with the old left
• 24:21 : hand side but if we didn't get any new
• 24:23 : right-hand side return a null pointer to
• 24:26 : represent erroring out okay so that's
• 24:29 : how it works
• 24:30 : all right so how much more is there to
• 24:33 : this code I think we can probably finish
• 24:36 : this so let's go ahead and parse the
• 24:39 : rest I hope this is not too much longer
• 24:42 : I hope I gave a good explanation of
• 24:45 : operator presidents and I hope I can get
• 24:48 : it to work let me just keep an eye on
• 24:55 : the clock how long have I been 25
• 25:00 : minutes actually that's quite long yeah
• 25:05 : actually I'm gonna stop this one and
• 25:07 : leave the parts the rest for the next
• 25:12 : video I will see you on the other side