EIYBrowse.filetypes.my5c_folder module

The my5c_folder module contains classes for working with individual my5c output files, one per chromosome, arranged in folders. Each file divides the chromosome into a number of bins, and the interaction between all pairs of bins is represented as a tab-delimited matrix.

These folders of files can be used as input to the InteractionsTrack class.
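As a rough illustration of the file layout described above, the following sketch reads a small tab-delimited matrix with pandas. The assumption that the first row and first column hold the bin labels is ours, as are the example data and the helper name read_matrix; real my5c files may carry additional header lines.

```python
import io

import pandas as pd

# Hypothetical three-bin matrix; labels follow the my5c location format.
EXAMPLE = (
    "\tHiC|mm9|chr7:0-49999\tHiC|mm9|chr7:50000-99999\tHiC|mm9|chr7:100000-149999\n"
    "HiC|mm9|chr7:0-49999\t10.0\t4.0\t1.0\n"
    "HiC|mm9|chr7:50000-99999\t4.0\t12.0\t5.0\n"
    "HiC|mm9|chr7:100000-149999\t1.0\t5.0\t9.0\n"
)


def read_matrix(handle):
    """Return the interaction matrix and the list of bin labels."""
    # Assumed layout: first column holds the row labels.
    table = pd.read_csv(handle, sep="\t", index_col=0)
    return table.to_numpy(), list(table.index)


matrix, labels = read_matrix(io.StringIO(EXAMPLE))
```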

class EIYBrowse.filetypes.my5c_folder.My5CFolder(folder_path, file_class=<class 'EIYBrowse.filetypes.my5c_folder.My5cFile'>)

Bases: object

The My5CFolder class provides an interface to a folder of my5c files.

The interactions() method is called with a pybedtools.Interval object, whose chrom attribute is used to identify the my5c file in the folder that contains the relevant interactions. The My5cFile object corresponding to that chromosome is created, and the pybedtools.Interval is passed to its My5cFile.get_interactions() method.

Create a new My5CFolder object.

Parameters:
  • folder_path (str) – Path to the folder containing the my5c files.
  • file_class (class) – Class used to open each located my5c file (defaults to My5cFile).
find_chrom_file(chrom)

Find the path to the my5c file containing the data for the given chromosome.

Parameters:

chrom (str) – Name of the chromosome to find.

Returns:

Path to the located my5c file.

Raises:
  • TooManyFilesError – If more than one file is found matching the given chromosome name.
  • NoFilesError – If no file is found matching the given chromosome name.
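The lookup and its two error cases could be sketched as below. This is not the module's implementation: the file-naming rule (chromosome name followed by a dot or underscore) is an assumption, the function is written as a free function taking the folder path, and the two exception classes are local stand-ins that only share their names with the ones listed above.

```python
import glob
import os
import tempfile


class NoFilesError(Exception):
    """Stand-in: no file matched the chromosome name."""


class TooManyFilesError(Exception):
    """Stand-in: more than one file matched the chromosome name."""


def find_chrom_file(folder_path, chrom):
    # Assumed naming rule: "<anything><chrom>." or "<anything><chrom>_",
    # so that "chr1" does not also match "chr10".
    pattern = os.path.join(glob.escape(folder_path), "*{0}[._]*".format(chrom))
    matches = sorted(glob.glob(pattern))
    if not matches:
        raise NoFilesError("no file found for {0}".format(chrom))
    if len(matches) > 1:
        raise TooManyFilesError("several files found for {0}".format(chrom))
    return matches[0]


# Small demonstration in a temporary folder.
with tempfile.TemporaryDirectory() as folder:
    for name in ("sample_chr1.txt", "sample_chr10.txt"):
        open(os.path.join(folder, name), "w").close()
    found = os.path.basename(find_chrom_file(folder, "chr1"))
```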
get_my5c_file(chrom)

Return the filetype object holding the data for the given chromosome.

We first call find_chrom_file() to determine the path to the data file, then pass this path to whichever class is defined by the file_class attribute (which defaults to My5cFile).

Parameters: chrom (str) – Chromosome to find data for.
Returns: Object of the class specified by the file_class attribute.
interactions(region)

Return the interactions inside the specified region.

First use the chrom attribute of the region to get a file object representing all the interactions for that chromosome. Then call the object’s get_interactions method to obtain a numpy array of the interactions within the specified region.

Parameters: region (pybedtools.Interval) – Genomic region to retrieve interactions for.
Returns: numpy array containing the interaction data, and a pybedtools.Interval object giving the genomic co-ordinates of the returned array.
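The delegation from folder to file, including the pluggable file_class, might look roughly like the sketch below. StubFile and FolderSketch are hypothetical stand-ins, the path lookup is a placeholder, and the region is a plain (chrom, start, stop) tuple rather than a pybedtools.Interval so that the example has no external dependencies.

```python
class StubFile:
    """Hypothetical stand-in for My5cFile that only records its path."""

    def __init__(self, file_path):
        self.file_path = file_path

    def get_interactions(self, region):
        # A real file class would return an array and an interval here.
        return (self.file_path, region)


class FolderSketch:
    """Hypothetical stand-in for My5CFolder."""

    def __init__(self, folder_path, file_class=StubFile):
        self.folder_path = folder_path
        self.file_class = file_class

    def find_chrom_file(self, chrom):
        # Placeholder lookup; the naming scheme is an assumption.
        return "{0}/{1}.txt".format(self.folder_path, chrom)

    def get_my5c_file(self, chrom):
        # Hand the located path to whichever class was configured.
        return self.file_class(self.find_chrom_file(chrom))

    def interactions(self, region):
        chrom = region[0]
        return self.get_my5c_file(chrom).get_interactions(region)


result = FolderSketch("data").interactions(("chr7", 0, 100000))
```

Passing a different file_class changes how each file is opened without touching the folder logic.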
class EIYBrowse.filetypes.my5c_folder.My5cFile(file_path)

Bases: object

The My5cFile class handles extraction of interactions from an individual my5c file.

The data is stored in the interactions attribute as a 2D numpy array, and must therefore be accessed by the index of the array cell. For example, if the data is at 50kb resolution, the region from 500kb to 550kb corresponds to the 11th cell, which has index 10, and the region from 550kb to 600kb has index 11. To obtain the interaction between a locus at 530kb and one at 590kb, you could therefore directly call my5cfile.interactions[10,11].
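The index arithmetic in this example can be checked with a few lines of Python. The sketch assumes equal-sized bins starting at position 0; the helper bin_index and the placeholder matrix are illustrative only and are not part of the module.

```python
import numpy as np


def bin_index(position_bp, resolution_bp):
    """Index of the bin containing a position, for bins starting at 0."""
    return position_bp // resolution_bp


resolution = 50_000
# Placeholder 20 x 20 matrix standing in for the interactions attribute.
interactions = np.arange(400).reshape(20, 20)

row = bin_index(530_000, resolution)  # 500kb-550kb bin
col = bin_index(590_000, resolution)  # 550kb-600kb bin
value = interactions[row, col]
```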

Of course we want to obtain the interactions between different regions specified in genomic co-ordinates (i.e. base pairs). Most of the logic in this class is related to this conversion.

To obtain the set of interactions for a region in genomic co-ordinates, the get_interactions() method is called with a pybedtools.Interval object. index_from_interval() is called to convert the interval into the correct indices for the internal numpy array, and the corresponding cells of the array are returned.

Create a new My5cFile object. Stores the interactions data as a numpy array in the interactions attribute, and stores the genomic location for each bin in the windows attribute.

Parameters: file_path (str) – Path to the my5c file containing interaction data.
get_interactions(region)

Get the interactions within a given genomic region.

Parameters: region (pybedtools.Interval) – Genomic region to retrieve interactions for.
Returns: numpy array containing the interaction data, and a pybedtools.Interval object giving the genomic co-ordinates of the returned array.
index_from_interval(region)

Convert a pybedtools.Interval object into a start and stop index for the internal numpy array.

We select all the bins overlapping the region from the windows DataFrame by searching for bins whose stop co-ordinate is larger than the start co-ordinate of the interval and whose start co-ordinate is less than the stop co-ordinate of the interval. We then return the index of the first covered window, and the last covered window + 1 (as slicing the numpy array will return up to but not including the last index).

Parameters: region (pybedtools.Interval) – Genomic region to convert to an index
Returns: Start and stop array indices as integers.
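The overlap test described above can be sketched with a small pandas DataFrame. The column names start and stop, the half-open bin convention, and the use of plain integers in place of a pybedtools.Interval are all assumptions made for this illustration.

```python
import pandas as pd


def index_from_interval(windows, region_start, region_stop):
    """Return (first, last + 1) indices of the bins overlapping a region."""
    overlapping = windows[
        (windows["stop"] > region_start) & (windows["start"] < region_stop)
    ]
    if overlapping.empty:
        raise ValueError("region does not overlap any bin")
    # The stop index is exclusive so it can be used directly in a slice.
    return int(overlapping.index[0]), int(overlapping.index[-1]) + 1


# Hypothetical 10kb bins covering 0 to 500kb.
windows = pd.DataFrame({"start": range(0, 500_000, 10_000)})
windows["stop"] = windows["start"] + 10_000

start_idx, stop_idx = index_from_interval(windows, 34_000, 456_000)
```

The returned pair can then be used to slice a square array on both axes, for example array[start_idx:stop_idx, start_idx:stop_idx].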
indices_to_interval(start, stop)

Return the genomic co-ordinates of the interactions returned.

Since this class will return a numpy array of interactions, the start and stop co-ordinates of the array may not exactly match the region requested (for example, if interactions are required for the region from 34kb to 456kb from a matrix with 10kb resolution, the returned results will span from 30kb to 460kb).

In order to adjust the size of the returned array to match the boundaries of the plotting window, the InteractionsTrack must be given the exact start and stop of the region returned from the array. This method finds these values for a given pair of indices.

Parameters:
  • start (int) – Starting index of the interaction array.
  • stop (int) – Ending index of the interaction array.
Returns:

Genomic region spanned by the given slice of the internal numpy array.
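The reverse conversion, from array indices back to genomic co-ordinates, could look like the following sketch. It reuses the 10kb example from the description; the column names, the half-open bin convention, and the tuple return value (in place of a pybedtools.Interval) are assumptions for illustration.

```python
import pandas as pd


def indices_to_interval(windows, start, stop):
    """Return (chrom, start_bp, stop_bp) spanned by windows[start:stop]."""
    first = windows.iloc[start]
    last = windows.iloc[stop - 1]  # stop is exclusive, as in a slice
    return str(first["chrom"]), int(first["start"]), int(last["stop"])


# Hypothetical 10kb bins on chr7 covering 0 to 500kb.
starts = list(range(0, 500_000, 10_000))
windows = pd.DataFrame(
    {"chrom": "chr7", "start": starts, "stop": [s + 10_000 for s in starts]}
)

# Indices 3 to 46 are the bins overlapping the 34kb-456kb request.
interval = indices_to_interval(windows, 3, 46)
```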

EIYBrowse.filetypes.my5c_folder.format_window(window)

Given a my5c style location specifier, return the name of the chromosome and the genomic start and stop.

Parameters: window (str) – Genomic location in my5c format, e.g. HiC|mm9|chr7:7000000-7999999
Returns: chromosome name, start position in bp, stop position in bp.
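A minimal way to parse a location specifier of this shape is sketched below. It assumes the location is always the last "|"-separated field and that positions should come back as integers; the module's own function may differ in both respects.

```python
def format_window(window):
    """Split 'HiC|mm9|chr7:7000000-7999999' into (chrom, start, stop)."""
    location = window.split("|")[-1]  # e.g. 'chr7:7000000-7999999'
    chrom, span = location.split(":")
    start, stop = span.split("-")
    return chrom, int(start), int(stop)


parsed = format_window("HiC|mm9|chr7:7000000-7999999")
```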
EIYBrowse.filetypes.my5c_folder.format_windows(windows)

Given an iterable of my5c style location specifiers, return a pandas.MultiIndex with the chromosome name, genomic start and genomic stop positions as the levels.

Parameters: windows (list) – List (or other iterable) of my5c style location specifiers.
Returns: pandas.MultiIndex
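Building such an index from a list of specifiers might be done as in this sketch. The level names chrom, start and stop and the helper parse_window are assumptions; only pandas.MultiIndex.from_tuples is a real library call.

```python
import pandas as pd


def parse_window(window):
    """Hypothetical parser for 'HiC|mm9|chr7:7000000-7999999'."""
    chrom, span = window.split("|")[-1].split(":")
    start, stop = span.split("-")
    return chrom, int(start), int(stop)


def format_windows(windows):
    """Return a MultiIndex with one (chrom, start, stop) entry per window."""
    parsed = [parse_window(window) for window in windows]
    return pd.MultiIndex.from_tuples(parsed, names=["chrom", "start", "stop"])


index = format_windows(
    ["HiC|mm9|chr7:7000000-7999999", "HiC|mm9|chr7:8000000-8999999"]
)
```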