AnyDocArray - DocArray

AnyDocArray

docarray.array.any_array.AnyDocArray

Bases: Sequence[T_doc], Generic[T_doc], AbstractType

Source code in docarray/array/any_array.py

from_protobuf(pb_msg) abstractmethod classmethod

Create an array from a protobuf message

Source code in docarray/array/any_array.py
@classmethod
@abstractmethod
def from_protobuf(cls: Type[T], pb_msg: 'DocListProto') -> T:
    """Create an array from a protobuf message"""
    ...

summary()

Print a summary of this DocList object and a summary of the schema of its Document type.

Source code in docarray/array/any_array.py
def summary(self):
    """
    Print a summary of this [`DocList`][docarray.array.doc_list.doc_list.DocList] object and a summary of the schema of its
    Document type.
    """
    DocArraySummary(self).summary()

to_protobuf() abstractmethod

Convert DocList into a Protobuf message

Source code in docarray/array/any_array.py
@abstractmethod
def to_protobuf(self) -> 'DocListProto':
    """Convert DocList into a Protobuf message"""
    ...
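
Because from_protobuf and to_protobuf are both abstract, a concrete subclass has to supply the pair before it can be instantiated. The following is a minimal sketch of that pattern in plain Python, not DocArray code: ArraySketch, NameArray, from_message and to_message are hypothetical names, and a plain dict stands in for the protobuf message.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Type, TypeVar

T = TypeVar('T', bound='ArraySketch')


class ArraySketch(ABC):
    """Hypothetical stand-in for an abstract array base class."""

    @classmethod
    @abstractmethod
    def from_message(cls: Type[T], msg: Dict[str, List[str]]) -> T:
        """Build an array from a message (a dict stands in for protobuf)."""
        ...

    @abstractmethod
    def to_message(self) -> Dict[str, List[str]]:
        """Convert the array into a message."""
        ...


class NameArray(ArraySketch):
    """Concrete subclass: implements both abstract methods."""

    def __init__(self, names: List[str]) -> None:
        self.names = list(names)

    @classmethod
    def from_message(cls, msg: Dict[str, List[str]]) -> 'NameArray':
        return cls(msg['names'])

    def to_message(self) -> Dict[str, List[str]]:
        return {'names': list(self.names)}


original = NameArray(['a', 'b'])
restored = NameArray.from_message(original.to_message())  # round trip
```

Stacking @classmethod on top of @abstractmethod, as the source above does, is what makes the constructor-style method mandatory for subclasses.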

traverse_flat(access_path) abstractmethod

Return a list of the objects reached by applying access_path to the documents in the array. If this yields a nested list or a list of DocLists, the result is flattened by one level. The access path is a string of attribute names joined by "__"; it describes the path from the top-level document to an arbitrarily nested field, e.g. 'content__image__url'.

from docarray import BaseDoc, DocList
from docarray.documents import TextDoc


class Author(BaseDoc):
    name: str


class Book(BaseDoc):
    author: Author
    content: TextDoc


docs = DocList[Book](
    Book(author=Author(name='Jenny'), content=TextDoc(text=f'book_{i}'))
    for i in range(10)
)

contents = docs.traverse_flat(access_path='content')  # list of 10 TextDoc objs

authors = docs.traverse_flat(access_path='author__name')  # list of 10 strings

If the resulting list is a nested list, it will be flattened:

from docarray import BaseDoc, DocList


class Chapter(BaseDoc):
    content: str


class Book(BaseDoc):
    chapters: DocList[Chapter]


docs = DocList[Book](
    Book(chapters=DocList[Chapter]([Chapter(content='some_content') for _ in range(3)]))
    for _ in range(10)
)

chapters = docs.traverse_flat(access_path='chapters')  # list of 30 Chapter objects
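
The lookup-and-flatten behaviour can be approximated in plain Python. This is a minimal sketch, not DocArray's implementation: traverse_flat_sketch is a hypothetical helper, and dataclasses and plain lists stand in for BaseDoc and DocList. It assumes only what the description states: the path is split on "__", each attribute is read in turn, and list-valued results are flattened.

```python
from dataclasses import dataclass, field
from typing import Any, List


@dataclass
class Chapter:  # stand-in for a BaseDoc subclass
    content: str


@dataclass
class Book:  # stand-in for a BaseDoc with a nested array field
    chapters: List[Chapter] = field(default_factory=list)


def traverse_flat_sketch(docs: List[Any], access_path: str) -> List[Any]:
    """Hypothetical helper: follow a "__"-separated path, flattening lists."""
    current = list(docs)
    for name in access_path.split('__'):
        accessed = [getattr(doc, name) for doc in current]
        # Flatten one level whenever the attribute itself holds a list.
        current = []
        for value in accessed:
            if isinstance(value, list):
                current.extend(value)
            else:
                current.append(value)
    return current


books = [
    Book(chapters=[Chapter(content=f'c{i}') for i in range(3)])
    for _ in range(10)
]
chapters = traverse_flat_sketch(books, 'chapters')  # Chapter objects
contents = traverse_flat_sketch(books, 'chapters__content')  # strings
```

In this sketch, stopping the path at 'chapters' yields the Chapter objects themselves, while extending it to 'chapters__content' yields their string contents.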

If your DocList has been stacked (doc_vec mode) and you access a field of type AnyTensor, the stacked tensor is returned instead of a list:

import torch

from docarray import BaseDoc, DocList
from docarray.typing import TorchTensor


class Image(BaseDoc):
    tensor: TorchTensor[3, 224, 224]


batch = DocList[Image](
    [
        Image(
            tensor=torch.zeros(3, 224, 224),
        )
        for _ in range(2)
    ]
)

batch_stacked = batch.stack()
tensors = batch_stacked.traverse_flat(
    access_path='tensor'
)  # tensor of shape (2, 3, 224, 224)

Parameters:

Name | Type | Description | Default
access_path | str | a string that represents the access path ("__"-separated). | required

Returns:

Type | Description
Union[List[Any], AbstractTensor] | list of the accessed objects, flattened if nested.
