readRawHeader() for TabularTextFile would
produce an obscure “Error in if (!isEmpty) { : argument is of length
zero” if the file is empty. Now it detects when the file is empty and
gives a more informative error message.
dsApplyInPairs() is defunct. Use
future.apply::future_mapply() instead.
extract() from the
R.rsp package.
dsApply(), which has been defunct since version
2.13.0 (April 2019). Use future.apply::future_lapply()
instead.
dsApplyInPairs() is deprecated. Use
future.apply::future_mapply() instead.
dsApply() is defunct. Use
future.apply::future_lapply() instead.
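As a rough illustration of the migration suggested above, the sketch below applies a function over a set of items with future.apply::future_lapply(). The pathnames are hypothetical, and the example works on a plain named character vector rather than a GenericDataFileSet; it falls back to base lapply() if future.apply is not installed.

```r
# Hypothetical pathnames; in practice these might come from a file set.
pathnames <- c(a = "data/a.txt", b = "data/b.txt")

# Use future_lapply() when available, otherwise plain lapply().
# Both take (X, FUN, ...) and preserve the names of X.
apply_fun <- if (requireNamespace("future.apply", quietly = TRUE)) {
  future.apply::future_lapply
} else {
  lapply
}

# Toy per-file function: upper-case the file name.
res <- apply_fun(pathnames, FUN = function(pathname) {
  toupper(basename(pathname))
})
str(res)
```

How the work is distributed is decided by the backend chosen with future::plan(); with the default sequential plan this behaves like lapply().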
Removed defunct and hidden argument colClassPatterns
of readDataFrame() for TabularTextFile. Use
argument colClasses instead.
Removed defunct and hidden argument files of
extractMatrix() for GenericTabularFileSet. Use
extractMatrix(ds[files], ...) instead.
loadToEnv() for RDataFile was not declared
an S3 method.
Package requires R (>= 3.2.0) released April 2015.
Package no longer requires Bioconductor.
dsApply() is now deprecated. Instead, use
future.apply::future_lapply().
dsApply() with
.parallel = "BiocParallel::BatchJobs" and
"BatchJobs" are now defunct. Instead, use
future.apply::future_lapply() with one of the many backends
that implements the Future API.
Now getChecksum() for ChecksumFile
defaults to not creating a checksum file (creating one is the default for
other types of files), but instead always returns the checksum of the file by
calculating it in memory only. This prevents, for instance, the
equals() test on two different checksum files from generating
another set of checksum files for themselves.
Now findByName() for GenericDataFileSet
reports on the non-existing root paths in error messages.
GenericDataFile and GenericDataFileSet
no longer report on memory (RAM) usage of objects.
dsApply(..., .parallel = "future") now uses
future_lapply() of the future package
internally. dsApply() will soon be deprecated (see
below).
Argument colClassPatterns of
readDataFrame() for TabularTextFile is now
defunct. Use colClasses instead.
Argument files of extractMatrix() for
GenericTabularFileSet is defunct.
dsApply() with either
.parallel = "BiocParallel::BatchJobs" or
"BatchJobs" is deprecated. Instead, use
future::future_lapply() with whatever choice of
future::plan() preferred.
Defunct argument aliased of
getDefaultFullName() for GenericDataFile and
defunct argument alias of GenericDataFileSet()
have been removed.
Now file sizes are reported using IEC binary prefixes, i.e. bytes, KiB, MiB, GiB, TiB, …, YiB.
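To make the IEC convention concrete, here is a minimal sketch of how a byte count could be formatted with binary prefixes. The function name format_bytes_iec() is hypothetical and is not the package's own formatter.

```r
# Hypothetical helper: format a byte count using IEC binary prefixes,
# where each step up is a factor of 1024.
format_bytes_iec <- function(bytes) {
  units <- c("bytes", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB", "ZiB", "YiB")
  value <- as.numeric(bytes)
  idx <- 1L
  # Divide by 1024 until the value fits the current unit.
  while (value >= 1024 && idx < length(units)) {
    value <- value / 1024
    idx <- idx + 1L
  }
  if (idx == 1L) {
    sprintf("%d bytes", as.integer(value))
  } else {
    sprintf("%.2f %s", value, units[idx])
  }
}

format_bytes_iec(512)
format_bytes_iec(1536)
format_bytes_iec(1048576)
```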
Added hasChecksumFile() for
GenericDataFile.
hasBeenModified() for GenericDataFile
gained argument update.
na.omit() for GenericDataFileSet;
the default one in the stats package works equally
well.
Arguments$getTags() failed to drop missing
values.
equals(df, other) for GenericDataFile
would give an error if other was not a
GenericDataFile.
dropTags() would drop name if a tag had the same
name.
getOneFile() on a GenericDataFileSet
with a single missing file would give an error, now it gives a file with
an NA pathname.
Preparing to make the default pathname for
GenericDataFile() to become NA_character_. It
is currently NULL, but the goal is to enforce
length(pathname) to be one.
extractMatrix(ds, files, ...) for
GenericTabularFileSet is deprecated. Use
extractMatrix(ds[files], ...) instead.
dsApply(..., .parallel = "future"),
which utilizes the future package.
Added support for sortBy(..., by = "mixedroman") for
GenericDataFileSet.
Now commentChar = "" and
commentChar = FALSE also disables searching for comment
characters (just as commentChar = NULL) for
TabularTextFile.
readDataFrame() for TabularTextFile with
column-names translators set, could give an error “Number of read data
columns does not match the number of column headers: …”. This was due to
an update in utils::read.table() as of R v3.2.1 svn rev
68831.
lapply(),
dsApply() returns a list with names corresponding to the full
names of the data set.
getFullNames(..., onRemapping) to
GenericDataFileSet to warn/err on full-name translations
that generate inconsistent fullname-to-index maps before and
after.
linkTo(..., skip = TRUE) would give error “No
permission to modify existing file: …” also in the case when the proper
link already exists and there is no need to create a new one.
Now getReadArguments() for
TabularTextFile lets duplicated named
colClasses entries override earlier ones,
e.g. colClasses=c("*" = NA, "*" = "NULL", a = "integer")
is effectively the same as
colClasses=c("*" = "NULL", a = "integer"). Added package
test.
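The override rule can be mimicked in base R by keeping only the last entry for each name. This is a sketch of the described behavior, not the code getReadArguments() actually uses.

```r
# Named colClasses with a duplicated "*" entry; the later one should win.
colClasses <- c("*" = NA, "*" = "NULL", a = "integer")

# Keep the last occurrence of each name, dropping earlier duplicates.
effective <- colClasses[!duplicated(names(colClasses), fromLast = TRUE)]
print(effective)
```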
nchar(..., type = "chars") is used
internally for all file and directory names (including tags).
as.character() for GenericDataFile with a
missing (NA) pathname on recent R-devel (>= 2015-04-23) related to an
update on how nchar() handles missing values.
Now [[ for GenericDataFileSet returns a
GenericDataFile not only if a numeric index is given but
also if a character string is given.
Now argument idx for getFile() for
GenericDataFileSet can also be a character string, in which
case the file returned is identified using
indexOf(..., pattern = idx, by = "exact", onMissing = "error").
Added RDataFile and RDataFileSet
classes for *.RData files.
requireNamespace() instead of
require() internally.
as.character() for ChecksumFile gave an
error when the checksum file was missing.
Added support for sortBy(..., by = "filesize") and
sortBy(..., decreasing = TRUE) for
GenericDataFileSet.
Added rep() for
GenericDataFileSet.
NOTES:
readDataFrame() would ignore argument
colClasses iff it had no names. Added package system test
for this case.
commentChar = NULL for
TabularTextFile:s failed.
readChecksums() for
ChecksumFileSet.
byPath() for GenericDataFileSet would
output verbose message enumerating files loaded to stdout instead of
stderr.
dsApply() for GenericDataFileSet would
coerce argument verbose to logical before applying the
function.
sep for readDataFrame() would
only work for , and \t; now it works for any
separator.
Now indexOf() first searches by exact names, then as
before, i.e. by regular expression and fixed pattern matching. Added
package system tests that contain particularly complicated cases for
this. This was triggered by a rare but real use case causing an error in
aroma.affymetrix. Thanks Benilton Carvalho for
reporting on this.
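The search order described for indexOf() (exact name first, then regular expression, then fixed string) can be sketched in base R as follows. The helper index_of() is hypothetical and only illustrates why the order matters; it is not the package's implementation.

```r
# Hypothetical helper: locate 'pattern' among 'names' by trying
# (1) an exact match, (2) a regular expression, (3) a fixed substring.
index_of <- function(names, pattern) {
  idx <- which(names == pattern)
  if (length(idx) == 0L) {
    # An invalid regular expression is treated as "no match".
    idx <- tryCatch(
      suppressWarnings(grep(pattern, names)),
      error = function(e) integer(0L)
    )
  }
  if (length(idx) == 0L) {
    idx <- grep(pattern, names, fixed = TRUE)
  }
  if (length(idx) == 0L) NA_integer_ else idx[1L]
}

names <- c("sample(1)", "sample", "sample+")
index_of(names, "sample")     # exact match wins over the earlier regex hit
index_of(names, "sample(1)")  # exact match despite the parentheses
index_of(names, "e(")         # invalid as a regex, found as a fixed string
```

Without the exact-match step, the name "sample" would resolve to the first element, because the regular expression also matches "sample(1)".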
Added argument by to indexOf() for
GenericDataFileSet|List.
Added SuggestsNote field to DESCRIPTION with list of
packages that are recommended for the most common use cases.
Bumped package dependencies.
ds[[idx]] instead of
getFile(ds, idx) where possible.
dsApply(..., .parallel = "none") would lower the
verbose threshold before applying the function, resulting in less verbose
output in the non-parallel case.
GenericDataFile would fail with linkTo() on
Windows systems without necessary privileges. Made the test less
conservative. Also, added an Rd section on privileges required on
Windows for linkTo() to work. Thanks to Brian Ripley for
reporting on this.
NOTES:
readColumns() for TabularTextFile
handles also header-less files.
copyTo() for GenericDataFileSet no longer
passes ... to byPath() when constructing the
return data set.
renameTo() passes ... to
R.utils::renameFile() making it possible to also overwrite
existing files.
Added is.na() for GenericDataFile and
GenericDataFileSet and na.omit() for the
latter, which already supports anyNA().
Added linkTo() for GenericDataFile,
which creates a symbolic link at a given destination pathname analogously
to how copyTo() creates a file copy at a given destination
pathname.
copyTo() for GenericDataFile passes
... to R.utils::copyFile().
copyTo() and renameTo() for
GenericDataFile had verbose output enabled by default.
digest2() is now defunct.
Added duplicated(), anyDuplicated(),
and unique() for GenericDataSet, which all
compare GenericDataFile:s using the equals()
method.
Now c() for GenericDataFileSet also
works to append GenericDataFile:s. Added package system
test for common use cases of c().
Added nbrOfColumns() for
GenericTabularFile, which, if the number of columns cannot
be inferred from the column names, will fall back to read the first row
of data and use that as the number of columns.
Now nbrOfColumns() for
ColumnNamesInterface returns NA if column names cannot be
inferred and hence not be counted.
Now readDataFrame(..., header = FALSE) works as
expected for tabular text files without headers.
Now getReadArguments() for
TabularTextFile returns a colClasses vector of
the correct length also in the case when there are no column
names.
loadRDS() available for plain files and
RdsFile:s.
RdsFile and RdsFileSet objects for
handling *.rds file sets.
GenericSummary.
ChecksumFile and
ChecksumFileSet.
extract() for GenericDataFileSet also
handles when the data set to be extracted is empty,
e.g. extract(GenericDataFileSet(), NA_integer_). Also,
added support for argument onMissing = "dropall", which
drops all files if one or more missing files were requested. Added
package system tests for these cases.
GenericDataFileSet$byPath(..., recursive = TRUE) would be
very slow setting up the individual files, especially for large data
sets. Now it’s only slow for the first file.
Added "[["(x, i) for
GenericDataFileSet, which gets a
GenericDataFile by index i in
[1,length(x)]. When i is non-numeric, the next
"[["(x, i) method in the class hierarchy is used, e.g. the
one for Object:s.
Added gzip()/gunzip() for
GenericDataFileSet.
Added anyNA() to GenericDataFileSet to
test whether any of the pathnames are NA, or not.
getChecksum() on
GenericDataFile:s and
GenericDataFileSet:s.
append() to become a
generic function does now call base::append() in the
default, instead of copying the latter. All this will eventually be
removed, when proper support for c, [,
[[, etc. has been added everywhere.
getChecksum() from
R.cache instead of creating its own. This solves the
problem of the default getChecksum() of
R.cache not being found.
readDataFrame() for
TabularTextFile subsets by row, before reparsing numerical
columns that were quoted.
autoload():s used internally.
Deprecated digest2() and deprecated -> defunct
-> dropped.
Now GenericDataFileSet() gives an error informing
that argument alias is defunct.
Now no generic functions are created for defunct methods.
R.filesets Package object is also
available when the package is only loaded (but not attached).
cat() from
R.utils.
SPEEDUP: Package no longer uses
R.utils::whichVector(), which used to be 10x faster, but
since R 2.11.0 which() is 3x faster again.
Package no longer utilizes import(), only
importFrom():s.
WORKAROUND: For now, package attaches the R.oo
package. This is needed due to what appears to be a bug in how
R.oo finalizes Object:s assuming
R.oo is/can be attached. Until that is resolved, we
make sure R.oo is attached.
Forgot to import
R.methodsS3::appendVarArgs().
[() and c() for
GenericDataFileSet.
private = FALSE to byPath()
of GenericDataFileSet.
isGzipped() ignores the case of the filename
extension when testing whether the file is gzipped or not.
rm() calls with NULL
assignments.
digest2(), which soon will
be deprecated.
\usage{} lines are at most 90
characters long.
In addition to a fixed integer, argument skip for
readDataFrame() (default and for
TabularTextFile) may also specify a regular expression
matching the first row of the data section.
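The idea of a regular-expression skip can be sketched with base R: locate the first line matching the pattern and skip everything before it. The sample lines and the pattern are made up for illustration, and this is not how readDataFrame() is implemented internally.

```r
# Made-up file content: two preamble lines, then a tab-delimited table.
lines <- c(
  "# comment",
  "generated by some tool",
  "id\tvalue",
  "a\t1",
  "b\t2"
)

# Regular expression matching the first row of the data section.
skip_pattern <- "^id\t"
first <- grep(skip_pattern, lines)[1L]

# Skip every line before the matched one, then parse as usual.
df <- read.table(
  text = lines, skip = first - 1L, header = TRUE,
  sep = "\t", stringsAsFactors = FALSE
)
print(df)
```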
Now argument skip to TabularTextFile
and readDataFrame() for that class causes the parser to
skip that many lines including commented lines, whereas before it did
not count commented lines.
Added a default readDataFrame() for reading data
from one or more tabular text files via the
TabularTextFile/TabularTextFileSet
classes.
colClassPatterns of
readDataFrame() for TabularTextFile has been
renamed to colClasses.
startupMessage() of
R.oo.
indexOf() for
GenericDataFileSet throws an exception if user tries to
pass an argument names.
Added head() and tail() for
GenericTabularFile.
Added subsetting via [() to
GenericTabularFile.
nbrOfRows() for TabularTextFile forgot
to exclude comment rows in the file header.
readColumns() for GenericTabularFile would not
preserve the order of the requested columns.
Added getOneFile() for
GenericDataFileSet, which returns the first
GenericDataFile with a non-missing pathname.
Added argument absolute = FALSE to
getPathname() for GenericDataFile.
GenericDataFile stores the absolute
pathname of the file, even if a relative pathname is given. This makes
sure that the file is found also when the working directory is
changed.
equals() for GenericDataFileSet would only
compare the first GenericDataFile in each set.
isGzipped() to GenericDataFile.
writeColumnsToFiles() to
GenericTabularFile. Used to be available only for
TabularTextFile.
getDefaultColumnNames() for
TabularTextFile did not use columnNames if it
was set when creating the TabularTextFile object.
Now getReadArguments() for
TabularTextFile drops arguments that are NULL, because they
could cause errors downstream, e.g. readDataFrame()
calling read.table(..., colClasses = NULL) =>
rep_len(NULL, x) => “Error in rep_len(colClasses, cols)
: cannot replicate NULL to a non-zero length”.
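Dropping NULL-valued arguments before forwarding them can be sketched as below. The helper drop_null_args() is hypothetical; it only illustrates the filtering step, not the package's code.

```r
# Hypothetical helper: remove list elements whose value is NULL.
drop_null_args <- function(args) {
  args[!vapply(args, is.null, logical(1L))]
}

# An argument list where colClasses was left unset (NULL).
args <- list(header = TRUE, sep = "\t", colClasses = NULL)
kept <- drop_null_args(args)

# Only the non-NULL arguments are forwarded to read.table().
df <- do.call(read.table, c(list(text = "a\tb\n1\t2"), kept))
print(df)
```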
as.list() for GenericDataSet to
return a named list of GenericDataFile:s
(previously it had no names). The names are the (translated) full names
of the GenericDataFile:s.
lapply() and sapply() for
GenericDataSet, because the corresponding functions in the
base package utilizes as.list().
Now GenericDataFile() retrieves the file time stamps
such that hasBeenModified() returns a correct value also
when first called, and not only TRUE just in case. This has the effect
that getChecksum() will detect cached results already at
the second call as long as the file has not been modified. Previously it
took two calls to getChecksum() for it to be properly
cached.
Now declaring more internal and temporary Object
fields as “cached”, which means they will be cleared if
clearCache() or gc() is called on the
corresponding object.
Added further verbose output to
TabularTextFileSet.
DOCUMENTATION: Minor corrections to help pages.
NOTES:
TabularTextFile to ignore header comment arguments when
inferring column names and classes.
clearCache() for GenericDataFileSet
relies on ditto of Object to clear all cached fields (=
with field modifier "cached").
{get,set}Label() for
GenericDataFile and {get,set}Alias() for
GenericData{File,FileSet}. Related arguments such as
alias to GenericDataFileSet and
aliased to getDefaultFullName() for
GenericDataFile are also deprecated.
seq_along(x) instead of
seq(along = x) everywhere. Similarly, seq(ds)
where ds is GenericDataFileSet is now replaced
by seq_along(ds). Likewise, seq_len(x)
replaces seq(length = x), and length(ds)
replaces nbrOfFiles(ds).
Now TabularTextFile() tries to infer whether the
data section contains column names or not. This is done by comparing to
the optional columnNames header argument. If that is not
available, it will (as before) assume there are column names.
Now readDataFrame() acknowledges header comment
arguments columnNames and columnClasses if
specified in the file.
Now getDefaultColumnNames() for
TabularTextFile falls back to header comment argument
columnNames, if there are no column names in the actual
data table.
Now readRawHeader() for TabularTextFile
also parses and returns header comment arguments.
ColumnNamesInterface which
GenericTabularFile now implements. Classes inheriting from
GenericTabularFile should rename any
getColumnNames() method to
getDefaultColumnNames().
whichVector() with which(),
because the latter is now the fastest again.
setColumnNames() for
GenericTabularFile, which utilizes
setColumnNamesTranslator().
{get,set}ColumnNameTranslator() in favor of
{get,set}ColumnNamesTranslator(); note the plural
form.
readDataFrame() for TabularTextFile no
longer returns attribute fileHeader, unless argument
debug is TRUE.
validate() to GenericDataFileSet,
which iteratively calls validate() on all the
GenericDataFile:s in the set. The default is to return NA,
indicating that no validation was done.
Arguments$getReadablePath() instead of
filePath(..., expandLinks = "any").
Arguments$getFilename() below.
... to
NextMethod(), cf. R-devel thread ‘Do not pass '...' to NextMethod() -
it’ll do it for you; missing documentation, a bug or
just me?’ on Oct 16, 2012.
Arguments$getFilename() from this package to
R.utils v1.17.0.
fromFiles() for GenericDataFileSet
is now defunct in place for byName(), which has been
recommended since January
Now readDataFrame() for TabularTextFile
defaults to read strings as characters rather than as factors. To read
strings as factors, just pass argument
stringsAsFactors = TRUE.
Added readDataFrame() for
TabularTextFileSet.
ROBUSTNESS: Now getHeader() for
TabularTextFile checks if the file has been modified before
returning cached results.
trim()
being overridden by ditto from the IRanges package, iff
loaded.
extractMatrix() for
GenericTabularFile adds column names just as ditto for
GenericTabularFileSet does.
.Internal() calls.
GenericDataFile and
GenericDataFileSet handle so called “empty” files, which
are files with NULL pathnames.
getCommentChar() to
TabularTextFile and argument commentChar to
its constructor. This allows to use custom comment characters other than
just "#".
GenericDataFileSet$byName(..., subdirs) would throw
Error in strsplit(subdirs, split = "/\\") iff
subdirs != NULL.
Improved the handling of the newly introduced depth
parameter, e.g. by making it optional/backward compatible.
GenericDataFileSet, such that one can
correctly infer fullname and subdirs from the path.
named to getTags() for
FullNameInterface. If TRUE, tags of format
"<name>=<value>" will be parsed and returned as
a named "<value>",
e.g. "foo,n=23,bar,n=42" is returned as
c("foo", "n"="23", "bar", "n"="42").
readDataFrame(..., colClasses = ..., trimQuotes = TRUE)
of TabularTextFile will read numeric columns that are
quoted. This is done by first reading them as quoted character strings,
dropping the quotes, and then rereading them as numeric values.
.fileClass to appendFiles()
for GenericDataFileSet.
ROBUSTNESS: Now appendFiles() for
GenericDataFileSet asserts that all files to be appended
are instances of the file class of this set as given by the static
getFileClass().
ROBUSTNESS: Added argument .assertSameClass to
appendFiles() for GenericDataFileSet, which if
TRUE asserts that the files to be appended inherits from the same class
as the existing files. Before this test was mandatory.
getChecksum() to GenericDataFileSet,
which calculates the checksum of the object returned by the protected
getChecksumData(). Use with care, because what objects
should be the basis of the checksum is not clear, e.g. should it be only
the file system checksum, or should things such as translated fullnames
be included as well?
equals() for GenericDataFile would
consider two files not to be equal only if their checksums were equal,
and vice versa. Also, when creating the message string explaining why
they differ an error would have been thrown.
hpaste() internally wherever applicable.
appendFullNameTranslatorBy<what>() for
<character> and <function> assert
that the translator correctly returns exactly one string. This has the
effect that setFullName() and friends are also tested.
Added = to the list of safe characters for
Arguments$getFilename().
Added fullname(), name(),
tags(), and dropTags().
findByName() for
GenericDataFileSet it would throw “<simpleError in
paths[sapply(rootPaths, FUN = isDirectory)]: invalid subscript type
‘list’>” in case no matching root path directories existed.
Added dropRootPathTags().
GENERALIZATION: Added support to findByName() for
GenericDataFileSet such that root paths also can be
specified by simple regular expression (still via argument
paths). Currently it is only the last subdirectory that can
be expanded, e.g. foo/bar/data(,.*)/.
GENERALIZATION: Now byName() for
GenericDataFileSet will try all possible data set
directories located when trying to setup a data set. Before it only
tried the first one located. This new approach is equally fast for the
first data set directory as before. The advantage is that it adds
further flexibilities, e.g. the first directory may not be what we want
but the second, which can be further tested by the byPath()
and downstream methods such as the constructor.
ROBUSTNESS: Now writeColumnsToFiles() for
TabularTextFile writes files atomically, which should
minimize the risk for generating incomplete files.
getTags() for Arguments from
aroma.core package.
fromFiles() of
GenericDataFileSet has been deprecated, if still called by
someone.
GENERALIZATION: Now append() for
GenericDataFileSet tries to also append
non-GenericDataFileSet objects by passing them down to
appendFiles() assuming they are
GenericDataFile:s.
GENERALIZATION: Now appendFiles() for
GenericDataFileSet also accepts a single item. Thus, there
is no longer a need to wrap up single items in a list.
ROBUSTNESS: Now GenericDataFileSet$byName() asserts
that arguments name and tags contain only
valid characters. This will for instance prevent passing paths or
pathnames by mistake.
Now appendFullNameTranslator(..., df) for
FullNameInterface takes either pattern or
fixed translations in data.frame.
Added sortBy() to GenericDataFileSet,
which sorts files either in a lexicographic or a mixedsort
order.
DOCUMENTATION: Added more Rd help pages.
DOCUMENTATION: Removed any duplicated \usage{}
statements from the Rd documentation.
indexOf() for GenericDataFileSet/List
would return NA if the search pattern/string contained parentheses. The
reason is that these have a special meaning in regular expressions. Now
indexOf() first searches by regular expression patterns, then
by fixed strings. Thanks Johan Staaf at Lund University and Larry(?) for
reporting on this issue.
Now
GenericDataFileSet$findByName(..., mustExist = FALSE) no
longer throws an exception even if there is no existing root
path.
Added argument firstOnly = TRUE to
findByName() for GenericDataFileSet.
Added appendFullNameTranslatorBy...() methods to the
FullNameInterface class for data frames,
TabularTextFile:s, and
TabularTextFileSet:s.
"NA" to the default na.strings
returned by getReadArguments() for
TabularTextFile.
NOTES:
.onUnknownArgs to
GenericDataFile() and GenericDataFileSet(). As
before, the default is to throw an exception if there are unknown
arguments. However, in certain cases it is useful to allow (and ignore)
“stray” arguments.
indexOf() of GenericDataFileSet and
GenericDataFileSetList did not handle names with regular
expression symbols + and *. Thanks to Randy
Gobbel for the initial error report.
GenericDataFile and GenericDataFileSet.
fromFiles() of
GenericDataSet. Use byPath() instead.
files is logical, then
extract() of GenericDataFileSet and
GenericDataFileSetList now asserts that the length of
files matches the number of available files.
exData/.
readColumns(..., column=<string>) on a
TabularTextFile would give “Error … object ‘columnNames’
not found”.
default = "\\.([^.]+)$" to
getExtensionPattern() of GenericDataFile.
Before the default value was hard coded inside this function.
setExtensionPattern(..., pattern = NULL) of
GenericDataFile works.
Added protected as.data.frame() to
GenericDataFileSetList.
Now GenericDataFile(NA, mustExist = FALSE) is a
valid object. Made all methods aware of such missing files.
Now extract(ds, c(1, 2, NA, 4), onMissing = "NA")
returns a valid GenericDataFileSet where missing files are
returned as missing GenericDataFile:s.
Added na.rm = TRUE to all getTags() so
that it returns NULL in case the file is missing.
copyTo() of GenericDataFileSet quietly
ignores missing files.
Added Rd help for indexOf() of
GenericDataFileSet.
ROBUSTNESS: Using new Arguments$getInstanceOf() where
possible.
Now all index arguments are validated correctly using the new
max argument of Arguments$getIndices(). Before
the case where max == 0 was not handled
correctly.
Changed the default to parent = 0 for
getDefaultFullName() of GenericDataFileSet to
be consistent with the documentation.
Now GenericDataFile(pathname) throws an error if
pathname is referring to a directory.
getPath() and getDefaultFullName() of
GenericDataFileSet would return a logical instead
of character value.
indexOf(ds, names) of
GenericDataFileSet would return a logical instead
of an integer vector of NA:s if none of the names
existed.
translateFullName() of
FullNameInterface and translateColumnNames()
of GenericTabularFile throw an exception if some fullnames
were translated into NA. They also assert that no names were dropped or
added in the process.
After doing append() to a
GenericDataFileSet, the total file size reported would
remain the same.
Appending empty data sets using append() of
GenericDataFileSet would give error ‘Error in
this$files[[1]] : subscript out of bounds’.
Added {get,set}ExtensionPattern() to
FullNameInterface.
Added getExtension() to
GenericDataFile.
appendFullNameTranslatorBylist() which makes it
possible to set up a sequence of fullname translators
fnt1, fnt2, fnt3 by calling
setFullNameTranslator(..., list(fnt1, fnt2, fnt3)).
Added support for having a sequence of fullname translator
functions. These can be added using
appendFullNameTranslator().
Added an example() to
FullNameInterface.
[() to
TabularTextFile.
Added the FullNameInterface, which is the interface
class that defines what fullnames, names, tags etc are.
Now setFullNamesTranslator() for
GenericDataFileSet dispatches on the by
argument. If that is not possible, it calls
setFullNameTranslator() for each file in the set (as
before).
GenericDataFile and GenericDataFileSet
implement the FullNameInterface, which means less redundant
code.
fromFiles() to byPath().
For backward compatibility the former calls the latter.
findByName() of GenericDataFileSet
follows Windows Shortcut links also for subdirectories.
Analogously to the method for a GenericDataFile, the
setFullNameTranslator() method for
GenericDataFileSet now assumes that the fullname translator
function accepts also argument set.
Added argument .fileSetClass to
GenericDataFileSet().
GenericDataFile
should accept any number of arguments. The first argument will always be
(an unnamed) argument containing the name (or names) to be translated.
If the translator is for a GenericDataFile, an additional
argument file will also be passed. This allows the
translator function to for instance read the file header and infer the
name that way.
Extracted several classes and methods from the aroma.core package.
Created package.