Retrieve The Top N Tags in CouchDB

I am reading up on how map/reduce works in CouchDB, and I think I am getting the hang of it now, thanks to this great tool. I tried the "Retrieve the top N tags" example on this page, but it didn't work for me at all, so I wrote my own as an exercise. Here's my code:

// the map function: emit one row per tag occurrence, all under the same null key
function(doc){
    for(var i in doc.tags)
    {
        emit(null, doc.tags[i]);
    }
}

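// the reduce function: count occurrences per tag, then keep the 3 most frequent
function(key, values, rereduce){
    var hash = {}
    if (!rereduce){
        // values is a list of tags emitted by the map function
        for (var i in values){
            var tag = values[i]
            hash[tag] = (hash[tag] || 0) + 1
        }
    }else{
        // values is a list of earlier reduce results, each a list of [tag, count] pairs
        for (var i in values){
            var topN = values[i]
            // separate index variable, so the inner loop does not reuse the outer loop's i
            for (var j in topN){
                var pair = topN[j]
                var tag = pair[0]
                hash[tag] = (hash[tag] || 0) + pair[1]
            }
        }
    }
    var all = []
    // named name rather than key, so the key parameter is not shadowed
    for (var name in hash)
        all.push([name, hash[name]])
    // sort by count, highest first, and keep the top 3
    return all.sort(function(one, other){
        return other[1] - one[1]
    }).slice(0, 3)
}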

The approach I took is different from the one on the example page, but I believe it to be the more correct one: rather than emitting results keyed by tag in the map step, I emit every occurrence of every tag under a single null key. Then in the reduce step, I count the occurrences of each tag using a hash, transform the hash into an array of [tag, count] pairs, sort it by count in descending order, and keep the top 3. For the rereduce case, I merge the incoming top-3 lists by summing their counts per tag and then again pick the top 3 among them.
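To see both code paths in action without a running CouchDB, here is a minimal sketch that calls the same reduce logic by hand in plain JavaScript. The function name topTagsReduce, the repeat helper, and the two sample batches are made up for illustration, and splitting the input into exactly two batches is only an assumption about how CouchDB might partition the values.

```javascript
// Named copy of the reduce function so it can be called outside CouchDB.
// topTagsReduce, repeat, and the sample batches are hypothetical names/data.
function topTagsReduce(key, values, rereduce){
    var hash = {};
    if (!rereduce){
        // values: plain tags, as emitted by the map function
        values.forEach(function(tag){
            hash[tag] = (hash[tag] || 0) + 1;
        });
    } else {
        // values: earlier reduce results, each a list of [tag, count] pairs
        values.forEach(function(topN){
            topN.forEach(function(pair){
                hash[pair[0]] = (hash[pair[0]] || 0) + pair[1];
            });
        });
    }
    var all = [];
    for (var tag in hash){
        all.push([tag, hash[tag]]);
    }
    return all.sort(function(one, other){
        return other[1] - one[1];
    }).slice(0, 3);
}

// Helper: build an array holding `tag` repeated `n` times.
function repeat(tag, n){
    var out = [];
    for (var i = 0; i < n; i++) out.push(tag);
    return out;
}

var batchA = [].concat(repeat("couchdb", 4), repeat("mapreduce", 3),
                       repeat("javascript", 2));
var batchB = [].concat(repeat("couchdb", 4), repeat("javascript", 3),
                       repeat("nosql", 2), repeat("mapreduce", 1));

// One reduce pass over all emitted tags.
var single = topTagsReduce(null, batchA.concat(batchB), false);

// Two partial reduces followed by a rereduce over their results.
var partials = [
    topTagsReduce(null, batchA, false),
    topTagsReduce(null, batchB, false)
];
var combined = topTagsReduce(null, partials, true);

console.log(JSON.stringify(single));
// [["couchdb",8],["javascript",5],["mapreduce",4]]
console.log(JSON.stringify(combined));
// [["couchdb",8],["javascript",5],["mapreduce",3]]
```

With this sample data the ranking comes out the same either way, but the rereduce path reports mapreduce with 3 instead of 4, because the second batch had already dropped it from its top 3. So the counts (and, with less friendly data, possibly the ranking) after a rereduce are best treated as approximate.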
