collection – Collection level operations

Collection level utilities for Mongo.

pymongo.ASCENDING = 1

Ascending sort order.

pymongo.DESCENDING = -1

Descending sort order.

pymongo.GEO2D = '2d'

Index specifier for a 2-dimensional geospatial index.

New in version 1.5.1.

Note

Geo-spatial indexing requires server version >= 1.3.3.

class pymongo.collection.Collection(database, name[, create=False[, **kwargs]])

Get / create a Mongo collection.

Raises TypeError if name is not an instance of basestring (str in python 3). Raises InvalidName if name is not a valid collection name. Any additional keyword arguments will be used as options passed to the create command. See create_collection() for valid options.

If create is True or additional keyword arguments are present a create command will be sent. Otherwise, a create command will not be sent and the collection will be created implicitly on first use.

Parameters :
  • database: the database to get a collection from
  • name: the name of the collection to get
  • create (optional): if True, force collection creation even without options being set
  • **kwargs (optional): additional keyword arguments will be passed as options for the create collection command

Changed in version 2.2: Removed deprecated argument: options

New in version 2.1: uuid_subtype attribute

Changed in version 1.5: deprecating options in favor of kwargs

New in version 1.5: the create parameter
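As a sketch of the two ways a collection comes into being (the database and collection names here are hypothetical, and the commented lines assume a running mongod):

```python
# With a live server you would write something like:
#   from pymongo import Connection
#   from pymongo.collection import Collection
#   db = Connection()["test_db"]
#   coll = db["test_collection"]     # implicit: created on first use
#   log = Collection(db, "log", create=True, capped=True, size=4096)
#
# The full name of a collection is always "<database>.<collection>":
database_name = "test_db"
collection_name = "test_collection"
full_name = "%s.%s" % (database_name, collection_name)
```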

See general MongoDB documentation

collections

c[name] || c.name

Get the name sub-collection of Collection c.

Raises InvalidName if an invalid collection name is used.

full_name

The full name of this Collection.

The full name is of the form database_name.collection_name.

Changed in version 1.3: full_name is now a property rather than a method.

name

The name of this Collection.

Changed in version 1.3: name is now a property rather than a method.

database

The Database that this Collection is a part of.

Changed in version 1.3: database is now a property rather than a method.

slave_okay

DEPRECATED. Use read_preference instead.

Changed in version 2.1: Deprecated slave_okay.

New in version 2.0.

read_preference

The read preference for this instance.

See ReadPreference for available options.

New in version 2.1.

safe

Use getlasterror with every write operation?

New in version 2.0.

uuid_subtype

The BSON binary subtype for a UUID used for this collection.

get_lasterror_options()

Returns a dict of the getlasterror options set on this instance.

New in version 2.0.

set_lasterror_options(**kwargs)

Set getlasterror options for this instance.

Valid options include j=<bool>, w=<int>, wtimeout=<int>, and fsync=<bool>. Implies safe=True.

Parameters :
  • **kwargs: Options should be passed as keyword

    arguments (e.g. w=2, fsync=True)

New in version 2.0.

unset_lasterror_options(*options)

Unset getlasterror options for this instance.

If no options are passed unsets all getlasterror options. This does not set safe to False.

Parameters :
  • *options: The list of options to unset.

New in version 2.0.
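A minimal sketch of setting and unsetting these options (the collection `coll` is hypothetical; the commented calls need a live server):

```python
# getlasterror options are plain keyword arguments:
options = {"w": 2, "wtimeout": 1000, "fsync": True}
# coll.set_lasterror_options(**options)      # implies safe=True
# coll.unset_lasterror_options("wtimeout")   # drop a single option
# coll.unset_lasterror_options()             # drop all; safe stays True
```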

insert(doc_or_docs[, manipulate=True[, safe=False[, check_keys=True[, continue_on_error=False[, **kwargs]]]]])

Insert a document, or a list of documents, into this collection.

If manipulate is True, the document(s) are manipulated using any SONManipulator instances that have been added to this Database. In this case an "_id" will be added if the document(s) do not already contain one, and the "_id" (or list of "_id" values for more than one document) will be returned. If manipulate is False and the document(s) do not include an "_id", one will be added by the server. The server does not return the "_id" it created, so None is returned.

If safe is True then the insert will be checked for errors, raising OperationFailure if one occurred. Safe inserts wait for a response from the database, while normal inserts do not.

Any additional keyword arguments imply safe=True, and will be used as options for the resultant getLastError command. For example, to wait for replication to 3 nodes, pass w=3.

Parameters :
  • doc_or_docs: a document or list of documents to be inserted
  • manipulate (optional): manipulate the documents before inserting?
  • safe (optional): check that the insert succeeded?
  • check_keys (optional): check if keys start with ‘$’ or contain ‘.’, raising InvalidName in either case
  • continue_on_error (optional): If True, the database will not stop processing a bulk insert if one fails (e.g. due to duplicate IDs). This makes bulk insert behave similarly to a series of single inserts, except lastError will be set if any insert fails, not just the last one. If multiple errors occur, only the most recent will be reported by error().
  • **kwargs (optional): any additional arguments imply safe=True, and will be used as options for the getLastError command

Note

continue_on_error requires server version >= 1.9.1

New in version 2.1: Support for continue_on_error.

New in version 1.8: Support for passing getLastError options as keyword arguments.

Changed in version 1.1: Bulk insert works with any iterable
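A sketch of single and bulk inserts (the collection `coll` is hypothetical; the commented calls need a live server):

```python
# A single document; without an "_id", manipulate=True adds one client-side:
doc = {"x": "y"}
# _id = coll.insert(doc)              # returns the generated "_id"

# Bulk insert with a deliberate duplicate "_id"; continue_on_error lets the
# server keep going past the failure (server >= 1.9.1):
docs = [{"_id": 1}, {"_id": 1}, {"_id": 2}]
# ids = coll.insert(docs, safe=True, continue_on_error=True)
```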

See general MongoDB documentation

insert

save(to_save[, manipulate=True[, safe=False[, **kwargs]]])

Save a document in this collection.

If to_save already has an "_id" then an update() (upsert) operation is performed and any existing document with that "_id" is overwritten. Otherwise an insert() operation is performed. In this case if manipulate is True an "_id" will be added to to_save and this method returns the "_id" of the saved document. If manipulate is False the "_id" will be added by the server but this method will return None.

Raises TypeError if to_save is not an instance of dict. If safe is True then the save will be checked for errors, raising OperationFailure if one occurred. Safe inserts wait for a response from the database, while normal inserts do not.

Any additional keyword arguments imply safe=True, and will be used as options for the resultant getLastError command. For example, to wait for replication to 3 nodes, pass w=3.

Parameters :
  • to_save: the document to be saved
  • manipulate (optional): manipulate the document before saving it?
  • safe (optional): check that the save succeeded?
  • **kwargs (optional): any additional arguments imply safe=True, and will be used as options for the getLastError command

New in version 1.8: Support for passing getLastError options as keyword arguments.
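The insert-or-overwrite behavior can be sketched like this (the collection `coll` and document are hypothetical):

```python
profile = {"_id": "alice", "score": 1}
# coll.save(profile)      # no existing "alice" doc: behaves like insert()
profile["score"] = 2
# coll.save(profile)      # "_id" present: upsert update(), old doc overwritten
```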

See general MongoDB documentation

insert

update(spec, document[, upsert=False[, manipulate=False[, safe=False[, multi=False[, **kwargs]]]]])

Update a document (or, with multi=True, multiple documents) in this collection.

Raises TypeError if either spec or document is not an instance of dict or upsert is not an instance of bool. If safe is True then the update will be checked for errors, raising OperationFailure if one occurred. Safe updates require a response from the database, while normal updates do not - thus, setting safe to True will negatively impact performance.

There are many useful update modifiers which can be used when performing updates. For example, here we use the "$set" modifier to modify some fields in a matching document:

>>> db.test.insert({"x": "y", "a": "b"})
ObjectId('...')
>>> list(db.test.find())
[{u'a': u'b', u'x': u'y', u'_id': ObjectId('...')}]
>>> db.test.update({"x": "y"}, {"$set": {"a": "c"}})
>>> list(db.test.find())
[{u'a': u'c', u'x': u'y', u'_id': ObjectId('...')}]

If safe is True returns the response to the lastError command. Otherwise, returns None.

Any additional keyword arguments imply safe=True, and will be used as options for the resultant getLastError command. For example, to wait for replication to 3 nodes, pass w=3.

Parameters :
  • spec: a dict or SON instance specifying elements which must be present for a document to be updated
  • document: a dict or SON instance specifying the document to be used for the update or (in the case of an upsert) insert - see docs on MongoDB update modifiers
  • upsert (optional): perform an upsert if True
  • manipulate (optional): manipulate the document before updating? If True all instances of SONManipulator added to this Database will be applied to the document before performing the update.
  • safe (optional): check that the update succeeded?
  • multi (optional): update all documents that match spec, rather than just the first matching document. The default value for multi is currently False, but this might eventually change to True. It is recommended that you specify this argument explicitly for all update operations in order to prepare your code for that change.
  • **kwargs (optional): any additional arguments imply safe=True, and will be used as options for the getLastError command

New in version 1.8: Support for passing getLastError options as keyword arguments.

Changed in version 1.4: Return the response to lastError if safe is True.

New in version 1.1.1: The multi parameter.
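The multi and upsert parameters can be sketched as follows (the collection `coll` is hypothetical; the commented calls need a live server):

```python
spec = {"x": "y"}
change = {"$inc": {"hits": 1}}
# coll.update(spec, change, multi=True, safe=True)   # touch every match
# coll.update({"_id": 1}, {"$set": {"x": "z"}}, upsert=True)  # insert if absent
```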

See general MongoDB documentation

update

remove([spec_or_id=None[, safe=False[, **kwargs]]])

Remove a document, or multiple documents, from this collection.

Warning

Calls to remove() should be performed with care, as removed data cannot be restored.

If safe is True then the remove operation will be checked for errors, raising OperationFailure if one occurred. Safe removes wait for a response from the database, while normal removes do not.

If spec_or_id is None, all documents in this collection will be removed. This is not equivalent to calling drop_collection(), however, as indexes will not be removed.

If safe is True returns the response to the lastError command. Otherwise, returns None.

Any additional keyword arguments imply safe=True, and will be used as options for the resultant getLastError command. For example, to wait for replication to 3 nodes, pass w=3.

Parameters :
  • spec_or_id (optional): a dictionary specifying the documents to be removed OR any other type specifying the value of "_id" for the document to be removed
  • safe (optional): check that the remove succeeded?
  • **kwargs (optional): any additional arguments imply safe=True, and will be used as options for the getLastError command

New in version 1.8: Support for passing getLastError options as keyword arguments.

Changed in version 1.7: Accept any type other than a dict instance for removal by "_id", not just ObjectId instances.

Changed in version 1.4: Return the response to lastError if safe is True.

Changed in version 1.2: The spec_or_id parameter is now optional. If it is not specified all documents in the collection will be removed.

New in version 1.1: The safe parameter.
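The three forms of the spec_or_id argument, as a sketch (the collection `coll` is hypothetical; the commented calls need a live server):

```python
spec = {"x": "y"}
# coll.remove(spec, safe=True)   # dict: remove all matching documents
# coll.remove(some_id)           # non-dict: treated as {"_id": some_id}
# coll.remove()                  # remove everything; indexes are kept
```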

See general MongoDB documentation

remove

drop()

Alias for drop_collection().

The following two calls are equivalent:

>>> db.foo.drop()
>>> db.drop_collection("foo")

New in version 1.8.

find([spec=None[, fields=None[, skip=0[, limit=0[, timeout=True[, snapshot=False[, tailable=False[, sort=None[, max_scan=None[, as_class=None[, slave_okay=False[, await_data=False[, partial=False[, manipulate=True[, read_preference=ReadPreference.PRIMARY[, **kwargs]]]]]]]]]]]]]]]])

Query the database.

The spec argument is a prototype document that all results must match. For example:

>>> db.test.find({"hello": "world"})

only matches documents that have a key “hello” with value “world”. Matches can have other keys in addition to “hello”. The fields argument is used to specify a subset of fields that should be included in the result documents. By limiting results to a certain subset of fields you can cut down on network traffic and decoding time.

Raises TypeError if any of the arguments are of improper type. Returns an instance of Cursor corresponding to this query.

Parameters :
  • spec (optional): a SON object specifying elements which must be present for a document to be included in the result set
  • fields (optional): a list of field names that should be returned in the result set (“_id” will always be included), or a dict specifying the fields to return
  • skip (optional): the number of documents to omit (from the start of the result set) when returning the results
  • limit (optional): the maximum number of results to return
  • timeout (optional): if True, any returned cursor will be subject to the normal timeout behavior of the mongod process. Otherwise, the returned cursor will never timeout at the server. Care should be taken to ensure that cursors with timeout turned off are properly closed.
  • snapshot (optional): if True, snapshot mode will be used for this query. Snapshot mode assures no duplicates are returned, or objects missed, which were present at both the start and end of the query’s execution. For details, see the snapshot documentation.
  • tailable (optional): the result of this find call will be a tailable cursor - tailable cursors aren’t closed when the last data is retrieved but are kept open, and the cursor’s location marks the final document’s position. If more data is received, iteration of the cursor will continue from the last document received. For details, see the tailable cursor documentation.
  • sort (optional): a list of (key, direction) pairs specifying the sort order for this query. See sort() for details.
  • max_scan (optional): limit the number of documents examined when performing the query
  • as_class (optional): class to use for documents in the query result (default is document_class)
  • slave_okay (optional): if True, allows this query to be run against a replica secondary.
  • await_data (optional): if True, the server will block for some extra time before returning, waiting for more data to return. Ignored if tailable is False.
  • partial (optional): if True, mongos will return partial results if some shards are down instead of returning an error.
  • manipulate (optional): If True (the default), apply any outgoing SON manipulators before returning.
  • network_timeout (optional): specify a timeout to use for this query, which will override the Connection-level default
  • read_preference (optional): The read preference for this query.

Note

The manipulate parameter may default to False in a future release.

Note

The max_scan parameter requires server version >= 1.5.1

New in version 1.11+: The await_data, partial, and manipulate parameters.

New in version 1.8: The network_timeout parameter.

New in version 1.7: The sort, max_scan and as_class parameters.

Changed in version 1.7: The fields parameter can now be a dict or any iterable in addition to a list.

New in version 1.1: The tailable parameter.
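A sketch of a query combining several of these parameters (the collection `coll` and field names are hypothetical; the commented calls need a live server):

```python
query = {"status": "active"}
fields = {"name": 1, "status": 1}    # "_id" is always included
order = [("name", 1)]                # 1 == pymongo.ASCENDING
# cursor = coll.find(query, fields, skip=10, limit=5, sort=order)
# names = [doc["name"] for doc in cursor]
```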

See general MongoDB documentation

find

find_one([spec_or_id=None[, *args[, **kwargs]]])

Get a single document from the database.

All arguments to find() are also valid arguments for find_one(), although any limit argument will be ignored. Returns a single document, or None if no matching document is found.

Parameters :
  • spec_or_id (optional): a dictionary specifying the query to be performed OR any other type to be used as the value for a query for "_id".
  • *args (optional): any additional positional arguments are the same as the arguments to find().
  • **kwargs (optional): any additional keyword arguments are the same as the arguments to find().

Changed in version 1.7: Allow passing any of the arguments that are valid for find().

Changed in version 1.7: Accept any type other than a dict instance as an "_id" query, not just ObjectId instances.
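The two forms of spec_or_id, as a sketch (the collection `coll` is hypothetical):

```python
# coll.find_one({"name": "alice"})   # dict: used directly as the query spec
# coll.find_one(some_id)             # anything else: queried by "_id", i.e.
query = {"_id": 42}                  # what the second form amounts to
```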

count()

Get the number of documents in this collection.

To get the number of documents matching a specific query use pymongo.cursor.Cursor.count().

create_index(key_or_list, ttl=300, **kwargs)

Creates an index on this collection.

Takes either a single key or a list of (key, direction) pairs. The key(s) must be an instance of basestring (str in python 3), and the directions must be one of (ASCENDING, DESCENDING, GEO2D). Returns the name of the created index.

To create a single key index on the key 'mike' we just use a string argument:

>>> my_collection.create_index("mike")

For a compound index on 'mike' descending and 'eliot' ascending we need to use a list of tuples:

>>> my_collection.create_index([("mike", pymongo.DESCENDING),
...                             ("eliot", pymongo.ASCENDING)])

All optional index creation parameters should be passed as keyword arguments to this method. Valid options include:

  • name: custom name to use for this index - if none is given, a name will be generated
  • unique: should this index guarantee uniqueness?
  • dropDups or drop_dups: should we drop duplicates during index creation when creating a unique index?
  • bucketSize or bucket_size: size of buckets for geoHaystack indexes
  • min: minimum value for keys in a GEO2D index
  • max: maximum value for keys in a GEO2D index
Parameters :
  • key_or_list: a single key or a list of (key, direction) pairs specifying the index to create
  • ttl (optional): time window (in seconds) during which this index will be recognized by subsequent calls to ensure_index() - see documentation for ensure_index() for details
  • **kwargs (optional): any additional index creation options (see the above list) should be passed as keyword arguments

Changed in version 2.2: Removed deprecated argument: deprecated_unique

Changed in version 1.5.1: Accept kwargs to support all index creation options.

New in version 1.5: The name parameter.

See also

ensure_index()

See general MongoDB documentation

indexes

ensure_index(key_or_list, ttl=300, **kwargs)

Ensures that an index exists on this collection.

Takes either a single key or a list of (key, direction) pairs. The key(s) must be an instance of basestring (str in python 3), and the direction(s) must be one of (ASCENDING, DESCENDING, GEO2D). See create_index() for a detailed example.

Unlike create_index(), which attempts to create an index unconditionally, ensure_index() takes advantage of some caching within the driver such that it only attempts to create indexes that might not already exist. When an index is created (or ensured) by PyMongo it is “remembered” for ttl seconds. Repeated calls to ensure_index() within that time limit will be lightweight - they will not attempt to actually create the index.

Care must be taken when the database is being accessed through multiple connections at once. If an index is created using PyMongo and then deleted using another connection any call to ensure_index() within the cache window will fail to re-create the missing index.

Returns the name of the created index if an index is actually created. Returns None if the index already exists.

All optional index creation parameters should be passed as keyword arguments to this method. Valid options include:

  • name: custom name to use for this index - if none is given, a name will be generated
  • unique: should this index guarantee uniqueness?
  • dropDups or drop_dups: should we drop duplicates during index creation when creating a unique index?
  • background: if this index should be created in the background
  • min: minimum value for keys in a GEO2D index
  • max: maximum value for keys in a GEO2D index
Parameters :
  • key_or_list: a single key or a list of (key, direction) pairs specifying the index to create
  • ttl (optional): time window (in seconds) during which this index will be recognized by subsequent calls to ensure_index()
  • **kwargs (optional): any additional index creation options (see the above list) should be passed as keyword arguments

Changed in version 2.2: Removed deprecated argument: deprecated_unique

Changed in version 1.5.1: Accept kwargs to support all index creation options.

New in version 1.5: The name parameter.
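When no name option is given, the driver generates a default index name by joining each (key, direction) pair with underscores - a sketch of that convention, not a guaranteed API (the collection `coll` is hypothetical):

```python
key_list = [("mike", -1), ("eliot", 1)]   # -1/1 == DESCENDING/ASCENDING
default_name = "_".join("%s_%s" % pair for pair in key_list)
# coll.ensure_index(key_list) returns this name when it actually creates the
# index, and None on cached repeat calls within the ttl window.
```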

See also

create_index()

drop_index(index_or_name)

Drops the specified index on this collection.

Can be used on non-existent collections or collections with no indexes. Raises OperationFailure on an error. index_or_name can be either an index name (as returned by create_index), or an index specifier (as passed to create_index). An index specifier should be a list of (key, direction) pairs. Raises TypeError if index is not an instance of (str, unicode, list).

Warning

if a custom name was used on index creation (by passing the name parameter to create_index() or ensure_index()) the index must be dropped by name.

Parameters :
  • index_or_name: index (or name of index) to drop
drop_indexes()

Drops all indexes on this collection.

Can be used on non-existent collections or collections with no indexes. Raises OperationFailure on an error.

reindex()

Rebuilds all indexes on this collection.

Warning

reindex blocks all other operations (indexes are built in the foreground) and will be slow for large collections.

New in version 1.11+.

index_information()

Get information on this collection’s indexes.

Returns a dictionary where the keys are index names (as returned by create_index()) and the values are dictionaries containing information about each index. The dictionary is guaranteed to contain at least a single key, "key" which is a list of (key, direction) pairs specifying the index (as passed to create_index()). It will also contain any other information in system.indexes, except for the "ns" and "name" keys, which are cleaned. Example output might look like this:

>>> db.test.ensure_index("x", unique=True)
u'x_1'
>>> db.test.index_information()
{u'_id_': {u'key': [(u'_id', 1)]},
 u'x_1': {u'unique': True, u'key': [(u'x', 1)]}}

Changed in version 1.7: The values in the resultant dictionary are now dictionaries themselves, whose "key" item contains the list that was the value in previous versions of PyMongo.

options()

Get the options set on this collection.

Returns a dictionary of options and their values - see create_collection() for more information on the possible options. Returns an empty dictionary if the collection has not been created yet.

group(key, condition, initial, reduce, finalize=None)

Perform a query similar to an SQL group by operation.

Returns an array of grouped items.

The key parameter can be:

  • None to use the entire document as a key.
  • A list of keys (each a basestring (str in python 3)) to group by.
  • A basestring (str in python 3), or Code instance containing a JavaScript function to be applied to each document, returning the key to group by.

With ReplicaSetConnection or MasterSlaveConnection, if the read_preference attribute of this instance is not set to pymongo.ReadPreference.PRIMARY or the (deprecated) slave_okay attribute of this instance is set to True the group command will be sent to a secondary or slave.

Parameters :
  • key: fields to group by (see above description)
  • condition: specification of rows to be considered (as a find() query specification)
  • initial: initial value of the aggregation counter object
  • reduce: aggregation function as a JavaScript string
  • finalize: function to be called on each object in output list.

Changed in version 2.2: Removed deprecated argument: command

Changed in version 1.4: The key argument can now be None or a JavaScript function, in addition to a list of keys.

Changed in version 1.3: The command argument now defaults to True and is deprecated.
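A sketch of grouping by a field (the collection `coll` and the field names are hypothetical; the commented call needs a live server):

```python
key = ["category"]
condition = {"price": {"$gt": 0}}
initial = {"count": 0, "total": 0}
reduce_fn = "function (doc, out) { out.count += 1; out.total += doc.price; }"
# rows = coll.group(key, condition, initial, reduce_fn)
# each row would look like {"category": ..., "count": ..., "total": ...}
```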

rename(new_name, **kwargs)

Rename this collection.

If operating in auth mode, client must be authorized as an admin to perform this operation. Raises TypeError if new_name is not an instance of basestring (str in python 3). Raises InvalidName if new_name is not a valid collection name.

Parameters :
  • new_name: new name for this collection
  • **kwargs (optional): any additional rename options should be passed as keyword arguments (i.e. dropTarget=True)

New in version 1.7: support for accepting keyword arguments for rename options

distinct(key)

Get a list of distinct values for key among all documents in this collection.

Raises TypeError if key is not an instance of basestring (str in python 3).

To get the distinct values for a key in the result set of a query use pymongo.cursor.Cursor.distinct().

Parameters :
  • key: name of key for which we want to get the distinct values

Note

Requires server version >= 1.1.0

New in version 1.1.1.

map_reduce(map, reduce, out, full_response=False, **kwargs)

Perform a map/reduce operation on this collection.

If full_response is False (default) returns a Collection instance containing the results of the operation. Otherwise, returns the full response from the server to the map reduce command.

Parameters :
  • map: map function (as a JavaScript string)

  • reduce: reduce function (as a JavaScript string)

  • out: output collection name or out object (dict). See the map reduce command documentation for available options. Note: out options are order sensitive. SON can be used to specify multiple options. e.g. SON([('replace', <collection name>), ('db', <database name>)])

  • full_response (optional): if True, return full response to this command - otherwise just return the result collection

  • **kwargs (optional): additional arguments to the map reduce command may be passed as keyword arguments to this helper method, e.g.:

    >>> db.test.map_reduce(map, reduce, "myresults", limit=2)
    

Note

Requires server version >= 1.1.1

Changed in version 2.2: Removed deprecated arguments: merge_output and reduce_output

Changed in version 1.11+: DEPRECATED The merge_output and reduce_output parameters.

New in version 1.2.

See general MongoDB documentation

mapreduce

inline_map_reduce(map, reduce, full_response=False, **kwargs)

Perform an inline map/reduce operation on this collection.

Perform the map/reduce operation on the server in RAM. A result collection is not created. The result set is returned as a list of documents.

If full_response is False (default) returns the result documents in a list. Otherwise, returns the full response from the server to the map reduce command.

With ReplicaSetConnection or MasterSlaveConnection, if the read_preference attribute of this instance is not set to pymongo.ReadPreference.PRIMARY or the (deprecated) slave_okay attribute of this instance is set to True the inline map reduce will be run on a secondary or slave.

Parameters :
  • map: map function (as a JavaScript string)

  • reduce: reduce function (as a JavaScript string)

  • full_response (optional): if True, return full response to this command - otherwise just return the result collection

  • **kwargs (optional): additional arguments to the map reduce command may be passed as keyword arguments to this helper method, e.g.:

    >>> db.test.inline_map_reduce(map, reduce, limit=2)
    

Note

Requires server version >= 1.7.4

New in version 1.10.

find_and_modify(query={}, update=None, upsert=False, **kwargs)

Update and return an object.

This is a thin wrapper around the findAndModify command. The positional arguments are designed to match the first three arguments to update(), however most options should be passed as named parameters. Either the update or the remove argument is required; all others are optional.

Returns either the object before or after modification based on the new parameter. If no objects match the query and upsert is false, returns None. If upserting and new is false, returns {}.

Parameters :
  • query: filter for the update (default {})
  • sort: priority if multiple objects match (default {})
  • update: see second argument to update() (no default)
  • remove: remove rather than updating (default False)
  • new: return updated rather than original object (default False)
  • fields: see second argument to find() (default all)
  • upsert: insert if object doesn’t exist (default False)
  • **kwargs: any other options the findAndModify command supports can be passed here.

See general MongoDB documentation

findAndModify

Note

Requires server version >= 1.3.0

New in version 1.10.
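A common use is atomically claiming a work item; a sketch (the collection `jobs` and the field names are hypothetical; the commented call needs a live server):

```python
query = {"state": "ready"}
update = {"$set": {"state": "running"}}
order = {"priority": -1}    # highest priority first
# job = jobs.find_and_modify(query, update, sort=order, new=True)
# job is the claimed document after modification, or None if nothing matched
```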
