Coretex
coretex.entities.dataset.sequence_dataset.sequence_dataset.SequenceDataset Class Reference
Inheritance diagram for coretex.entities.dataset.sequence_dataset.sequence_dataset.SequenceDataset:
SequenceDataset inherits from coretex.entities.dataset.network_dataset.NetworkDataset, which inherits from coretex.entities.dataset.dataset.Dataset.

Public Member Functions

Optional[Self] createSequenceDataset (cls, str name, int projectId, Union[Path, str] metadataPath, Optional[Dict[str, Any]] meta=None)
 
None download (self, bool decrypt=True, bool ignoreCache=False)
 
bool isPairedEnd (self)
 
- Public Member Functions inherited from coretex.entities.dataset.network_dataset.NetworkDataset
Path path (self)
 
Self fetchCachedDataset (cls, List[str] dependencies)
 
Self createDataset (cls, str name, int projectId, Optional[Dict[str, Any]] meta=None)
 
str generateCacheName (cls, str prefix, List[str] dependencies)
 
Self createCacheDataset (cls, str prefix, List[str] dependencies, int projectId)
 
bool finalize (self)
 
bool rename (self, str name)
 
SampleType add (self, Union[Path, str] samplePath, Optional[str] sampleName=None, Any **metadata)
 
- Public Member Functions inherited from coretex.entities.dataset.dataset.Dataset
int count (self)
 
Optional[SampleType] getSample (self, str name)
 

Detailed Description

    Dataset class for datasets whose samples contain sequence data
    (.fasta and .fastq files).

Definition at line 33 of file sequence_dataset.py.
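A typical first step is locating the sequence files that will become samples. The helper below is a hypothetical sketch, not part of coretex: it assumes the .fasta and .fastq extensions named in the description above, plus their gzip-compressed (.gz) forms, and it does not decide how files are grouped into samples.

```python
from pathlib import Path
from typing import List, Union

# Extensions from the class description plus their gzip-compressed forms.
# This list is an assumption for illustration, not an exhaustive coretex rule.
SEQUENCE_SUFFIXES = (".fasta", ".fastq", ".fasta.gz", ".fastq.gz")


def findSequenceFiles(directory: Union[Path, str]) -> List[Path]:
    """Return all sequence files under directory (recursively), sorted by path.

    Hypothetical helper: grouping the files into samples is left to the caller.
    """
    directory = Path(directory)
    return sorted(
        path for path in directory.rglob("*")
        # str.endswith accepts a tuple, so one check covers every suffix
        if path.is_file() and path.name.lower().endswith(SEQUENCE_SUFFIXES)
    )
```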

Member Function Documentation

◆ createSequenceDataset()

Optional[Self] coretex.entities.dataset.sequence_dataset.sequence_dataset.SequenceDataset.createSequenceDataset (   cls,
str  name,
int  projectId,
Union[Path, str]  metadataPath,
Optional[Dict[str, Any]]   meta = None 
)
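    Creates a new sequence dataset with the provided name and imports the
    zipped metadata file into it as a sample named "_metadata". If the created
    dataset is encrypted, the metadata is imported with _encryptedSampleImport
    using the project's key; otherwise _chunkSampleImport is used.

    Parameters
    ----------
    name : str
        dataset name
    projectId : int
        project for which the dataset will be created
    metadataPath : Union[Path, str]
        path to the zipped metadata file
    meta : Optional[Dict[str, Any]]
        optional dictionary passed through unchanged to
        NetworkDataset.createDataset (not required)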

    Returns
    -------
    The created sequence dataset object or None if creation failed

    Example
    -------
    >>> from coretex import SequenceDataset
    >>> dummyDataset = SequenceDataset.createSequenceDataset("dummyDataset", 123, pathToMetadata)
    >>> if dummyDataset is not None:
    ...     print("Dataset created successfully")

Definition at line 64 of file sequence_dataset.py.

def createSequenceDataset(
    cls,
    name: str,
    projectId: int,
    metadataPath: Union[Path, str],
    meta: Optional[Dict[str, Any]] = None
) -> Optional[Self]:

    """
    Creates a new sequence dataset with the provided name and metadata

    Parameters
    ----------
    name : str
        dataset name
    projectId : int
        project for which the dataset will be created
    metadataPath : Union[Path, str]
        path to the zipped metadata file

    Returns
    -------
    The created sequence dataset object or None if creation failed

    Example
    -------
    >>> from coretex import SequenceDataset
    >>> dummyDataset = SequenceDataset.createSequenceDataset("dummyDataset", 123, pathToMetadata)
    >>> if dummyDataset is not None:
    ...     print("Dataset created successfully")
    """

    if isinstance(metadataPath, str):
        metadataPath = Path(metadataPath)

    dataset = cls.createDataset(name, projectId, meta)

    if dataset.isEncrypted:
        dataset.metadata = _encryptedSampleImport(CustomSample, "_metadata", metadataPath, dataset.id, getProjectKey(dataset.projectId))
    else:
        dataset.metadata = _chunkSampleImport(CustomSample, "_metadata", metadataPath, dataset.id)

    return dataset
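createSequenceDataset expects metadataPath to point to an already zipped metadata file. A minimal sketch of producing that archive with the standard library follows; zipMetadataDirectory is a hypothetical helper, and since the expected contents of the metadata archive are not specified here, it simply packages a whole directory.

```python
import shutil
from pathlib import Path
from typing import Union


def zipMetadataDirectory(metadataDir: Union[Path, str], archiveStem: Union[Path, str]) -> Path:
    """Zip the contents of metadataDir into <archiveStem>.zip and return the archive path.

    Hypothetical helper: it packages everything under metadataDir and makes no
    assumption about which metadata files a sequence dataset expects.
    """
    metadataDir = Path(metadataDir)
    if not metadataDir.is_dir():
        raise NotADirectoryError(f"{metadataDir} is not a directory")

    # make_archive appends ".zip" to the base name and returns the archive path as a str;
    # archiveStem should lie outside metadataDir so the archive is not zipped into itself
    archivePath = shutil.make_archive(str(archiveStem), "zip", root_dir=str(metadataDir))
    return Path(archivePath)
```

The returned path can then be passed as metadataPath, for example SequenceDataset.createSequenceDataset("dummyDataset", 123, archivePath). Because createSequenceDataset converts a str argument to Path itself, either type works.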

◆ download()

None coretex.entities.dataset.sequence_dataset.sequence_dataset.SequenceDataset.download (   self,
bool   decrypt = True,
bool   ignoreCache = False 
)
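    Downloads the dataset from Coretex. SequenceDataset also downloads its
    metadata sample, passing it the same decrypt and ignoreCache flags.

    Parameters
    ----------
    decrypt : bool
        whether encrypted data should be decrypted during download (not required)
    ignoreCache : bool
        if True, the dataset is downloaded again even if it is
        already downloaded (not required)

    Example
    -------
    >>> from coretex import NetworkDataset
    >>> dummyDataset = NetworkDataset.fetchById(1023)
    >>> dummyDataset.download()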

Reimplemented from coretex.entities.dataset.network_dataset.NetworkDataset.

Definition at line 109 of file sequence_dataset.py.

def download(self, decrypt: bool = True, ignoreCache: bool = False) -> None:
    super().download(decrypt, ignoreCache)

    self.metadata.download(decrypt, ignoreCache)
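The override forwards both flags unchanged, first to NetworkDataset.download and then to the metadata sample. The sketch below reproduces only that call order with stand-in classes; FakeNetworkDataset, FakeMetadataSample and FakeSequenceDataset are hypothetical, are not coretex classes, and perform no network I/O.

```python
from typing import List, Tuple

# Each log entry records (what was downloaded, decrypt, ignoreCache)
CallLog = List[Tuple[str, bool, bool]]


class FakeMetadataSample:
    """Stand-in for the dataset's metadata sample (hypothetical)."""

    def __init__(self, log: CallLog) -> None:
        self.log = log

    def download(self, decrypt: bool = True, ignoreCache: bool = False) -> None:
        self.log.append(("metadata", decrypt, ignoreCache))


class FakeNetworkDataset:
    """Stand-in for NetworkDataset: records the call instead of downloading samples."""

    def __init__(self) -> None:
        self.log: CallLog = []

    def download(self, decrypt: bool = True, ignoreCache: bool = False) -> None:
        self.log.append(("samples", decrypt, ignoreCache))


class FakeSequenceDataset(FakeNetworkDataset):
    """Mirrors the structure of the SequenceDataset.download override."""

    def __init__(self) -> None:
        super().__init__()
        # The metadata stand-in writes to the same log so the call order is visible
        self.metadata = FakeMetadataSample(self.log)

    def download(self, decrypt: bool = True, ignoreCache: bool = False) -> None:
        # Regular samples first, then the metadata sample, with identical flags
        super().download(decrypt, ignoreCache)
        self.metadata.download(decrypt, ignoreCache)
```

Because the metadata download runs after super().download, an exception raised while downloading the samples means the metadata sample is never downloaded, and ignoreCache=True forces the metadata to be downloaded again as well.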

◆ isPairedEnd()

bool coretex.entities.dataset.sequence_dataset.sequence_dataset.SequenceDataset.isPairedEnd (   self)
    Returns True if the dataset holds paired-end reads and False if it
    holds single-end reads. Files for paired-end reads must contain
    "_R1_" and "_R2_" in their names, otherwise an exception will be raised.
    The check calls Sample.isPairedEnd on every sample, so if the samples
    contain gzip-compressed sequences, call Sample.unzip on them first;
    otherwise Sample.isPairedEnd will raise an exception.

    Raises
    ------
    FileNotFoundError -> if a sample has no files meeting the requirements
        for either single-end or paired-end sequencing reads
    ValueError -> if the dataset has a combination of single-end and paired-end samples

Definition at line 114 of file sequence_dataset.py.

def isPairedEnd(self) -> bool:
    """
    This function returns True if the dataset holds paired-end reads and
    False if it holds single end. Files for paired-end reads must contain
    "_R1_" and "_R2_" in their names, otherwise an exception will be raised.
    If the sample contains gzip compressed sequences, you will have to call
    Sample.unzip method first otherwise calling Sample.isPairedEnd will
    raise an exception

    Raises
    ------
    FileNotFoundError -> if no files meeting the requirements for either single-end
        or paired-end sequencing reads
    ValueError -> if dataset has a combination of single-end and paired-end samples
    """

    pairedEndSamples = [sample.isPairedEnd() for sample in self.samples]

    if all(pairedEndSamples):
        return True

    if not any(pairedEndSamples):
        return False

    raise ValueError(">> [Coretex] Dataset contains a mix of paired-end and single-end sequences. It should contain either one or the other")
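The aggregation step of isPairedEnd can be exercised without coretex by feeding it the per-sample results directly. aggregatePairedEnd is a hypothetical standalone function that applies the same all()/any() rule to a list of booleans standing in for Sample.isPairedEnd results.

```python
from typing import Iterable


def aggregatePairedEnd(samplePairedEnd: Iterable[bool]) -> bool:
    """Combine per-sample paired-end flags using the rule from SequenceDataset.isPairedEnd.

    Hypothetical sketch: takes booleans instead of calling Sample.isPairedEnd
    on real coretex samples.
    """
    # Materialize first so all() and any() both see every value
    flags = list(samplePairedEnd)

    if all(flags):
        return True

    if not any(flags):
        return False

    raise ValueError("Dataset contains a mix of paired-end and single-end sequences")
```

Two consequences follow from this rule. A dataset with no samples returns True, because all() of an empty sequence is True. And since the list comprehension in isPairedEnd evaluates Sample.isPairedEnd for every sample before aggregating, a FileNotFoundError from any single sample propagates before the mixed-dataset check is reached.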

The documentation for this class was generated from the following file: