
Adding a dataset Reader to dsprofile

Adding a reader for a new dataset type requires creating a subclass of the Reader abstract base class and implementing each of its required methods so that it performs the operations appropriate to that type.

The "format" (subclass_type_key) attribute

Each class derived from Reader must provide a class attribute named format containing a string that serves as a tag for the dataset type handled by that class. For example, the MyData type might set its format attribute to "mydata", as demonstrated below:

from dsprofile.lib.reader import Reader

class MyData(Reader):
    format = "mydata"
    ...

The value provided in format is used as the subcommand name passed to dsprofile on the command line to identify this dataset type. As such, it must not contain spaces or other characters which have significance to the shell.

Although the default name of this attribute is format, it may be changed via the Reader.subclass_type_key attribute. Note that any such change affects all derived classes, including those which already exist.

Reader methods

<classmethod> build_subparser(cls, sp):

Receives an argparse subparser argument and is responsible for adding all type-specific command line arguments.

Source code in dsprofile/lib/reader.py
@classmethod
@abstractmethod
def build_subparser(cls, sp):
    """
      Receives an argparse subparser argument <sp> and is responsible
      for adding all type-specific command line arguments.
    """
    pass

In addition to any type-specific options, this method must add a subcommand key which identifies this Reader type when invoking dsprofile from the command line. For example, to create a new subcommand for the MyData dataset reader whose format tag described above is "mydata", use the following:

parser = sp.add_parser(cls.format, help="Read datasets in mydata format")

This Reader type will then be available on the command line through the "mydata" subcommand, for example:

$ dsprofile mydata /path/to/mydata.file
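
As a rough sketch of how the subcommand registration and type-specific options might fit together, the example below defines a stand-in Reader base class so that it runs without dsprofile installed. The path positional, the --max-records option, and the dest="command" setting on the top-level parser are assumptions made for illustration; they are not taken from dsprofile itself.

```python
import argparse
from abc import ABC, abstractmethod


class Reader(ABC):
    """Stand-in for dsprofile.lib.reader.Reader (assumption, for a runnable sketch)."""

    @classmethod
    @abstractmethod
    def build_subparser(cls, sp):
        pass


class MyData(Reader):
    format = "mydata"

    @classmethod
    def build_subparser(cls, sp):
        # Register the "mydata" subcommand under the shared subparsers object.
        parser = sp.add_parser(cls.format, help="Read datasets in mydata format")
        # Hypothetical type-specific arguments, for illustration only.
        parser.add_argument("path", help="Path to the mydata file")
        parser.add_argument("--max-records", type=int, default=None,
                            help="Stop after reading this many records")


# Minimal driver standing in for the dsprofile command line (assumption).
top = argparse.ArgumentParser(prog="dsprofile")
subparsers = top.add_subparsers(dest="command")
MyData.build_subparser(subparsers)

args = top.parse_args(["mydata", "/path/to/mydata.file", "--max-records", "10"])
```

Using cls.format rather than a literal string keeps the subcommand name and the format tag in one place.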

<classmethod> handle_args(cls, args) -> tuple[list, dict]:

Translates its argparse argument into the positional and keyword arguments required to create an instance of this type. The returned tuple must consist of two elements:

  1. A list (or other Sequence) of positional arguments
  2. A dict with str keys containing keyword arguments

These are subsequently passed to the type's constructor to create an instance.

Source code in dsprofile/lib/reader.py
@classmethod
@abstractmethod
def handle_args(cls, args) -> tuple[list, dict]:
    """
      Translates its argparse <args> argument into the positional
      and keyword arguments required to create an instance of this
      type.
      The returned tuple must consist of two elements:

        1. A list (or other `Sequence`) of positional arguments
        2. A dict with str keys containing keyword arguments

      These are subsequently passed to the type's constructor
      to create an instance.
    """
    pass
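
The following is a minimal sketch of a handle_args implementation, assuming a hypothetical MyData constructor that takes a path and an optional max_records keyword. The Reader base class is omitted so that the example runs on its own, and the argparse.Namespace is built by hand in place of real command-line parsing.

```python
import argparse


class MyData:
    # In dsprofile this class would derive from Reader; the base class is
    # left out here so the sketch runs without the package installed.
    format = "mydata"

    def __init__(self, path, max_records=None):
        # Hypothetical constructor signature, for illustration only.
        self.path = path
        self.max_records = max_records

    @classmethod
    def handle_args(cls, args):
        # First element: positional constructor arguments.
        positional = [args.path]
        # Second element: keyword constructor arguments with str keys.
        keyword = {"max_records": args.max_records}
        return positional, keyword


# Stand-in for the namespace produced by the argument parser.
ns = argparse.Namespace(path="/path/to/mydata.file", max_records=10)
pos, kw = MyData.handle_args(ns)

# The caller can then unpack both parts into the constructor.
reader = MyData(*pos, **kw)
```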

process(self) -> dict:

Processes the dataset and returns a type-specific dict containing the resulting metadata profile.

Source code in dsprofile/lib/reader.py
@abstractmethod
def process(self) -> dict:
    """
      Processes the dataset and returns a type-specific dict containing
      the resulting metadata profile.
    """
    pass
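
Since the contents of the returned dict are type-specific, the sketch below simply invents a small profile (line count and size in bytes) for a hypothetical line-oriented format. The keys shown are assumptions chosen for illustration, not a schema required by dsprofile, and the Reader base class is again omitted so that the example is self-contained.

```python
import os
import tempfile


class MyData:
    # Would derive from Reader in dsprofile; omitted to keep the sketch standalone.
    format = "mydata"

    def __init__(self, path):
        self.path = path

    def process(self) -> dict:
        # Hypothetical profile: count lines and report the file size.
        with open(self.path, "rb") as fh:
            n_lines = sum(1 for _ in fh)
        return {
            "format": self.format,
            "path": self.path,
            "lines": n_lines,
            "bytes": os.path.getsize(self.path),
        }


# Build a small sample file and profile it.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "sample.mydata")
    with open(sample, "wb") as fh:
        fh.write(b"a\nb\nc\n")
    profile = MyData(sample).process()
```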

Resource ownership

Reader types are responsible for managing any resources, such as file handles or remote access state, that are used in reading their datasets.

The recommended way to manage such resources is to use weakref.finalize from the standard library's weakref module to register a cleanup callback appropriate to the type.
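
One possible shape for this, assuming a reader that owns a single open file handle, is sketched below. The MyData constructor, the _fh and _finalizer attribute names, and the explicit close method are illustrative choices rather than part of the Reader interface.

```python
import os
import tempfile
import weakref


class MyData:
    # Would derive from Reader in dsprofile; omitted to keep the sketch standalone.
    format = "mydata"

    def __init__(self, path):
        self._fh = open(path, "rb")
        # Register cleanup. The callback is the file object's own close method,
        # so it holds no reference to self and does not keep the reader alive.
        self._finalizer = weakref.finalize(self, self._fh.close)

    def close(self):
        # Calling the finalizer runs the callback at most once; later calls
        # (including the automatic one at collection time) do nothing.
        self._finalizer()


# Create a scratch file so the example has something to open.
fd, path = tempfile.mkstemp()
os.close(fd)

reader = MyData(path)
fh = reader._fh
reader.close()
os.remove(path)
```

If close is never called, the finalizer still runs when the reader is garbage collected or at interpreter exit, which is the main advantage over relying on a __del__ method.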