Posts about appengine

justtodolist updates and appengine stuff

I made some enhancements again to justtodolist. I came across a couple of interesting points, so I thought I would jot them down.

The stateful interactive shell sample app.
This is a must for any app engine developer! It's leaps and bounds closer to the python shell than the default interactive shell. Get it from here. It needs some tweaking to get it working alongside your app. Hint: aside from modifying your app.yaml, you will likely have to modify the app config in shell.py as well.

Again, simplicity works!
I simplified the part of my model that handles the inner workings of list ordering, and the result was less code that is both simpler and less buggy.

Data model migration.
So as I pointed out before, migration in app engine is not really well supported; essentially, you are on your own. If you want to change the type of a property (for example, I wanted to change a ListProperty(int) to a ListProperty(db.Key)), you will likely completely break your app, because the existing data will no longer validate. As far as I could tell, there is no way to migrate the data in place, because once the type has changed there's no way to access the existing data. The only workaround that I could come up with was to rename the property (well, actually, to add a new one and leave the old one, at least temporarily). After that I could write a migration script. Now that I had the interactive shell on production as well (the default shell was only available for development), I could run it easily through the shell instead of writing a handler with the sole purpose of running the migration script.
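Here is a minimal sketch of what such a migration script boils down to. It is plain Python so it can be tried anywhere: entities are stood in for by dicts, and the names item_ids, item_keys, make_key and the "TodoItem" kind are made up for illustration, not taken from the real app. In the shell the loop would run over real entities and save each one instead.

```python
# Plain-Python stand-in for datastore entities; no App Engine imports are used.

def make_key(kind, id_):
    """Hypothetical stand-in for a datastore key: a (kind, id) tuple."""
    return (kind, id_)


def migrate_entity(entity):
    """Fill the new property from the old one; return True if anything changed."""
    if "item_keys" in entity:
        return False  # already migrated, so re-running the script is safe
    # The old property is left in place, at least temporarily.
    entity["item_keys"] = [make_key("TodoItem", i) for i in entity.get("item_ids", [])]
    return True


def migrate_all(entities):
    """Run the migration over every entity and count how many were updated."""
    return sum(1 for entity in entities if migrate_entity(entity))
```

Because an entity that already has the new property is skipped, the script can be run more than once without doing damage, which is handy when you are poking at production from a shell.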

My co-worker pointed out to me that an automatic migration scheme might be the future for the appengine data store and similar object oriented datastores. The idea was very interesting. Basically, we could come up with a scheme for migrating data models using renaming mechanisms like what I did above. But, we could do it all behind the scenes inside a framework. I can see how it would work and how it could be done:
  1. You would have a global schema version number. Every time you modify the schema the version number would increment. Ideally, the framework would somehow detect the change you made and up the version automatically.
  2. you would have a sort of version control for every property of every object, and have a numbered naming scheme for them, so: name_1, name_2, for example. Every change to the property would up the property version number.
  3. Each object would have a version property. This means that the system can operate with objects that have different schema versions at the same time without breaking.
  4. Removing a property is easy, nothing more really needs to be done. Unless you want to provide a way to derive a value for the old, missing property, in which case you can provide a property descriptor on the existing object to do it.
  5. Adding a property is easy too, you can provide a function of the existing object to provide an initial value if you want, which will be invoked the first time that the property is used.
  6. Renaming a property can be done by specifying the property with an old_name directive like: new_name = db.StringProperty(old_name="old_name") then the code just has to create the new property and initialize it to the value of the old property the first time it is used.
  7. Changing the type of a property can be done by using an old_type directive much like the old_name directive for renaming, and then you may provide a conversion function of the existing object that returns the new initial value. Behind the scenes, the property version number will be incremented, so you are really adding a new property, which is aliased somehow back to the same name later.
Sounds good... There's one problem though: the directives you add to the schema in your .py file, like old_name and initial value functions, only describe the most recent migration. But since, as we've seen, migration happens on the fly (as the data is accessed), there needs to be a historical archive of all migration directives, i.e. real version control. One way to do it would be to write explicit migration definitions, as in Rails migrations. So something like:
add_property(Person, 'age', db.IntegerProperty())
instead of editing the models directly. I think this way is sufficient. I was kinda hoping I could just edit the models directly willy-nilly though. Maybe you could write a tool that inspects your model definition, diffs it with the last version and generates a migration definition?
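To make the idea a bit more concrete, here is a rough sketch of on-the-fly migration with an archive of explicit steps. It is plain Python with dicts standing in for entities, and every name in it (MIGRATIONS, migration, upgrade, the Person steps) is hypothetical. It is not an existing framework or App Engine API, just one way the pieces could fit together.

```python
MIGRATIONS = {}  # maps (kind, from_version) -> function that upgrades one step


def migration(kind, from_version):
    """Register a function that upgrades `kind` from `from_version` to the next."""
    def register(func):
        MIGRATIONS[(kind, from_version)] = func
        return func
    return register


def latest_version(kind):
    """Schema version reached after applying every registered step for `kind`."""
    versions = [v for (k, v) in MIGRATIONS if k == kind]
    return max(versions) + 1 if versions else 1


def upgrade(kind, entity):
    """Apply the pending steps in order; meant to be called when an entity is loaded."""
    while entity.get("version", 1) < latest_version(kind):
        current = entity.get("version", 1)
        MIGRATIONS[(kind, current)](entity)
        entity["version"] = current + 1
    return entity


@migration("Person", 1)
def add_age(person):
    # Adding a property: supply an initial value.
    person["age"] = None


@migration("Person", 2)
def rename_name(person):
    # Renaming a property: copy the old value under the new name.
    person["full_name"] = person.pop("name")
```

Each object carries its own version, so objects at different schema versions can coexist, and an old object simply catches up through the archived steps the first time it is touched.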

Anyway, better get some sleep. This is definitely an interesting idea and a very concrete one for a project.


Posted by airportyh 4 months ago about appengine, migration and programming (0 comments)

Maintaining Order in GAE

In a previous post I alluded to problems I encountered with using GAE's object model. This is the follow-up to that.

In my todo list app, a main feature was to make reordering your todo list a breeze. When I wrote it with MySQL, I used the standard relational database pattern to store an ordered list: create a field SEQ in the items table which holds a sequence number (integer) that represents the order of the item in the list. But this means that for every reordering I do, I have to update all active items in the list. I figured that since I am only going to have small lists, it shouldn't matter, and it worked fine with MySQL. When I did a port of this to GAE, reordering became terribly slow. A friend of mine told me that this is just a characteristic of object databases. My solution was to create a list field in the parent object (list), which remaps the sequence numbers into the correct order for the items. This way, updating the order only took saving the list object. Of course, this approach is more complicated. There are a couple of other possible approaches:
  1. As my friend mentioned: do a linked list in the database. Instead of a SEQ field you would have a NEXT field which points to the item that's next to this one. This would make small reorderings (of the type: you take one object and move it to a different place in the list) a constant time operation in terms of number of updates (3 updates total).
  2. Make the SEQ field a float, which would allow you to insert an item in between 2 other items with only one update. But because of numerical precision issues, sooner or later you would have to relabel the whole list so that the SEQ numbers are not too close together, which would be triggered much like a garbage collection operation would be.
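The float SEQ idea in item 2 can be sketched in a few lines of plain Python. The function names and the min_gap threshold below are my own picks for illustration, and dicts stand in for the items.

```python
def seq_between(before, after):
    """Return a SEQ value that sorts between two neighbours (None means list end)."""
    if before is None and after is None:
        return 1.0
    if before is None:
        return after - 1.0
    if after is None:
        return before + 1.0
    return (before + after) / 2.0


def needs_relabel(before, after, min_gap=1e-9):
    """True when two neighbours are too close to fit another value between them."""
    return after - before < min_gap


def relabel(items):
    """Rewrite every SEQ as 1.0, 2.0, ... in current order (the costly full update)."""
    for position, item in enumerate(sorted(items, key=lambda i: i["seq"]), start=1):
        item["seq"] = float(position)
```

Moving an item is then a single write: set its seq to seq_between() of its new neighbours. Only when needs_relabel() fires do you pay for updating the whole list.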
I think my current approach is about on par with these 2 in terms of complexity. I like the linked list solution because it's more scalable: you can really reorder/maintain arbitrarily large lists without missing a beat. For my app though, I only need to handle small lists, so I have no reason to switch for now.
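For comparison, the approach I am using now (the order lives in a list on the parent) looks roughly like this when stripped down to plain Python. The names move_item and items_in_order are made up for this sketch, and a dict stands in for the stored items.

```python
def move_item(order, item_id, new_index):
    """Return a new order list with item_id moved to new_index."""
    reordered = [i for i in order if i != item_id]
    reordered.insert(new_index, item_id)
    return reordered


def items_in_order(order, items_by_id):
    """Look up items in the order stored on the parent list."""
    return [items_by_id[i] for i in order if i in items_by_id]
```

A reorder only changes the order list, so only the parent object needs saving, no matter how many items moved relative to each other.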
Posted by airportyh 4 months ago about appengine, googleappengine and programming (0 comments)

Web Dev with Google App Engine

I wrote my first Google App Engine app! It's located at justtodolist.appspot.com. It's yet another todo list - I have been a tadalist user for a while and thought I could make it slightly better, and so I did. Here are some thoughts on GAE.

First up, the things I like.
  1. The number one benefit of GAE for me is definitely one-step deployment. Not that you couldn't set up one-step deployment for rails apps, but it just takes a lot of work to set this stuff up the first time. With GAE, it's a one-step deploy the first time. I would say it's easier even than php, in large part because of benefit number two.
  2. there is no database to set up. As most people know by now, you use GQL/Big Table on GAE, and it is very different from relational databases. Setting up is really minimal. You specify your model in a DSL similar to what you'd write with SQLObject or Elixir, or Data Mapper for Ruby folks. And then boom! you are running.
  3. There's no user authentication to set up either, it's basically Gmail authentication, if you are willing to go along, that is. The User API is very simple, and you can start using it right away
  4. Development server is nice, it picks up changes immediately when you save any project file
  5. The number of project files you have is very minimal, it's very non-cluttery.
So as you can see GAE is great for rapid prototyping. Now for things I am not that crazy about. Actually, most of the benefits I spoke of have some caveats:
  1. Although deployment is easy, issues sometimes arise from the fact that things work slightly differently in production vs development. For example, you need indices to build fully in production for the app to be ready to run, or, for some reason, transaction rules work differently in production vs development (I haven't dug down into this fully, but it might be a bug)
  2. Big Table is cool, it's supposed to be super scalable, but there are a couple of things that are annoying about it. I can get over the fact that it's fundamentally different from relational databases: things like you can't do aggregate queries, joins and so forth. For performance too, some things you will just have to do differently than you would normally with relational databases. I am okay with that. What pains me is that there is no proper data migration path. When you change your models (add or remove fields, and so forth), the old stuff just sticks around. To migrate the old data, you basically have to manually write a script that loops over the existing data structures and modifies them, but the script has to be triggered from an http request just like everything else, because that's the only way to run your code on GAE on the production server... ghetto! Also, I am extremely annoyed that while in development you can just clear your datastore, there is no analog in production. I realize though that this is a work in progress and that things will get better in the future.
  3. Well, the caveat for using Gmail authentication is that... your users must have a Gmail account, duh... I am sure you can use other authentication schemes if you want, I don't see anything preventing that
  4. Yes the development server is nice, but for some reason it was progressively running slower on my company Dell D820 laptop. This is especially so if you perform extensive modification of the data models.
Here are some other thoughts:
  1. The development console is not good, man! It's nowhere near the usability of the python interactive shell. I've heard of an alternative but haven't seen it yet.
  2. I haven't dug into how to do TDD with it yet, but I've read it's possible.
  3. Again, Big Table is VERY different from relational databases. I originally ported my todo list app from a SQLObject -> MySQL backend, with only 2 data models: TodoList and TodoItem. In Big Table we still have the 2, but they look kinda different. I had to change them for performance reasons. I'll put the discussion of that in a separate post.
  4. I haven't got a great handle on how transactions/entity groups work. I thought I did, until my transaction code didn't work, so I will have to look closer into it. Documentation on this is kinda sparse. Right now my code is non-transactional.

Open Source GAE Apps?

Here's another thought. Will it be possible to popularize open source GAE apps? I think the most important reason for the popularity of open source PHP apps is the ease of deployment. With the ease of deployment of GAE, I think conditions might be ripe for a movement of good open source GAE apps to emerge. Then again though, people might resist the vendor specific/non-open source nature of GAE. We will see.

Posted by airportyh 5 months ago about appengine, google, programming and python (0 comments)