Map/Reduce Example

This example shows how to use the map_reduce() method to perform map/reduce-style aggregations on your data.

Note

Map/Reduce requires server version >= 1.1.1. The PyMongo map_reduce() helper requires PyMongo version >= 1.2.

Setup

To start, we’ll insert some example data to run map/reduce queries against:

>>> from pymongo import Connection
>>> db = Connection().map_reduce_example
>>> db.things.insert({"x": 1, "tags": ["dog", "cat"]})
ObjectId('...')
>>> db.things.insert({"x": 2, "tags": ["cat"]})
ObjectId('...')
>>> db.things.insert({"x": 3, "tags": ["mouse", "cat", "dog"]})
ObjectId('...')
>>> db.things.insert({"x": 4, "tags": []})
ObjectId('...')

Basic Map/Reduce

Now we’ll define our map and reduce functions. In this case we’re performing the same operation as in the MongoDB Map/Reduce documentation: counting the number of occurrences of each tag in the tags array, across the entire collection.

Our map function just emits a single (key, 1) pair for each tag in the array:

>>> from bson.code import Code
>>> map = Code("function () {"
...            "  this.tags.forEach(function(z) {"
...            "    emit(z, 1);"
...            "  });"
...            "}")

The reduce function sums over all of the emitted values for a given key:

>>> reduce = Code("function (key, values) {"
...               "  var total = 0;"
...               "  for (var i = 0; i < values.length; i++) {"
...               "    total += values[i];"
...               "  }"
...               "  return total;"
...               "}")

Note

We can’t just return values.length, because the reduce function might be called iteratively on the results of other reduce steps. In that case values holds partial totals rather than individual 1s.
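
The effect can be seen without a server. The following is a minimal pure-Python sketch, not PyMongo or MongoDB code: reduce_sum and reduce_len are hypothetical stand-ins for the JavaScript reduce functions, and the split into two batches is only an assumption about how a server might group the emitted values.

```python
# Pure-Python stand-ins for two candidate reduce functions (hypothetical names).
def reduce_sum(key, values):
    # Mirrors the JavaScript reduce above: add up the emitted values.
    return sum(values)

def reduce_len(key, values):
    # The tempting shortcut: count the values instead of summing them.
    return len(values)

# "cat" is emitted three times, each time with the value 1.
emitted = [1, 1, 1]

# Assumption: the first two values are reduced first, and that partial
# result is then fed back into reduce together with the remaining value.
partial_sum = reduce_sum("cat", emitted[:2])
final_sum = reduce_sum("cat", [partial_sum] + emitted[2:])

partial_len = reduce_len("cat", emitted[:2])
final_len = reduce_len("cat", [partial_len] + emitted[2:])

print(final_sum)  # 3, the correct count
print(final_len)  # 2, because the partial result was counted as a single value
```

Summing gives the same answer however the values are batched, which is the property a reduce function needs.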

Finally, we call map_reduce() and iterate over the result collection:

>>> result = db.things.map_reduce(map, reduce, "myresults")
>>> for doc in result.find():
...   print doc
...
{u'_id': u'cat', u'value': 3.0}
{u'_id': u'dog', u'value': 2.0}
{u'_id': u'mouse', u'value': 1.0}
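
As a sanity check, we can compute the same counts locally. This is a sketch in plain Python using collections.Counter, assuming the four documents from the setup section; the things list is a local copy of that data, not something read from the database.

```python
from collections import Counter

# Local copy of the four documents inserted in the setup section.
things = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
    {"x": 3, "tags": ["mouse", "cat", "dog"]},
    {"x": 4, "tags": []},
]

# Equivalent of map (one count per tag occurrence) plus reduce (sum per tag).
tag_counts = Counter()
for doc in things:
    for tag in doc["tags"]:
        tag_counts[tag] += 1

for tag in sorted(tag_counts):
    print("%s %d" % (tag, tag_counts[tag]))
```

The counts agree with the server output above. The server reports them as floats (3.0, 2.0, 1.0), whereas this local version produces Python integers.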

Advanced Map/Reduce

PyMongo’s API supports all of the features of MongoDB’s map/reduce engine. One interesting feature is the ability to get more detailed results when desired, by passing full_response=True to map_reduce(). This returns the full response to the map/reduce command, rather than just the result collection:

>>> db.things.map_reduce(map, reduce, "myresults", full_response=True)
{u'counts': {u'input': 4, u'reduce': 2, u'emit': 6, u'output': 3}, u'timeMillis': ..., u'ok': ..., u'result': u'...'}
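
The counts in that response can be reproduced by hand from the sample data. The sketch below is plain Python and assumes the four documents from the setup section. The reduce figure assumes reduce runs once per key that has more than one emitted value, which matches the output above for this small data set; on larger collections the server may call reduce more often.

```python
from collections import defaultdict

# Local copy of the four documents inserted in the setup section.
things = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
    {"x": 3, "tags": ["mouse", "cat", "dog"]},
    {"x": 4, "tags": []},
]

# Group the emitted values by key, as the map step would.
emitted = defaultdict(list)
for doc in things:
    for tag in doc["tags"]:
        emitted[tag].append(1)

counts = {
    "input": len(things),                            # documents mapped over
    "emit": sum(len(v) for v in emitted.values()),   # total emit() calls
    "output": len(emitted),                          # distinct keys
    # Assumption: one reduce call per key with more than one value.
    "reduce": sum(1 for v in emitted.values() if len(v) > 1),
}
print(sorted(counts.items()))
```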

All of the optional map/reduce parameters are also supported; simply pass them as keyword arguments. In this example we use the query parameter to limit the documents that will be mapped over:

>>> result = db.things.map_reduce(map, reduce, "myresults", query={"x": {"$lt": 3}})
>>> for doc in result.find():
...   print doc
...
{u'_id': u'cat', u'value': 2.0}
{u'_id': u'dog', u'value': 1.0}
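
The query acts as a filter applied before the map step. A rough local equivalent, assuming the four documents from the setup section and using a Python condition as a stand-in for {"x": {"$lt": 3}}:

```python
# Local copy of the four documents inserted in the setup section.
things = [
    {"x": 1, "tags": ["dog", "cat"]},
    {"x": 2, "tags": ["cat"]},
    {"x": 3, "tags": ["mouse", "cat", "dog"]},
    {"x": 4, "tags": []},
]

# Stand-in for the query: keep only documents where x is less than 3.
selected = [doc for doc in things if doc["x"] < 3]

# Count tags over the selected documents only.
tag_counts = {}
for doc in selected:
    for tag in doc["tags"]:
        tag_counts[tag] = tag_counts.get(tag, 0) + 1

print(sorted(tag_counts.items()))
```

Only the documents with x equal to 1 and 2 are mapped, so mouse never appears in the result.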

With MongoDB 1.8.0 or newer, you can pass a SON document as the out argument to store the result collection in a different database:

>>> from bson.son import SON
>>> db.things.map_reduce(map, reduce, out=SON([("replace", "results"), ("db", "outdb")]), full_response=True)
{u'counts': {u'input': 4, u'reduce': 2, u'emit': 6, u'output': 3}, u'timeMillis': ..., u'ok': ..., u'result': {u'db': ..., u'collection': ...}}

See also

The full list of options for MongoDB’s map reduce engine
