Data Integrator (Python API)
Class for joining two tables, neither of which needs to be sorted.
Public Member Functions
def __init__ (s)
def HaveHeader (s, flag)
    Indicate whether the files have header data.
def SetPermissiveness (s, p)
    Set the permissiveness model if invoked as a command line tool.
def SetOutputFile (s, outFFN)
    Set the output file name.
def SetJoinFile (s, ffn, jCol, tCol=[])
    Set and read the file to be joined to the base file.
def GetExitCode (s)
    Retrieve the exit code for command line invocation according to the permissiveness model.
def Join (s, inFFN, jCol, noData=CELL_NO_DATA, idColEntry="", idColName="Source", joinUnpairable=False, summary=False, headerSuffix="")
    Join data to the base file.
Join two unsorted tables, much like the classic 'join' command. In addition, a source tag column can be added that identifies the file the data was joined from, and the transfer can be restricted to a subset of the join file's columns. Unlike the 'join' command, the two files are not treated equally: there is a base file, to which data is added from a join file. Because the join file must reside in memory, we recommend choosing the smaller of the two files for this purpose.
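The base-file/join-file asymmetry described above corresponds to a plain in-memory hash join. The following is a minimal sketch of that general technique, not the class's actual implementation; the function name hash_join and the choice to drop the repeated key column are assumptions made for illustration.

```python
from collections import defaultdict


def hash_join(base_rows, join_rows, base_key, join_key):
    """Pair every base row with all join rows that share its key.

    Hypothetical helper: rows are lists of strings, keys are 0-based
    column indices.
    """
    # Index the (smaller) join table in memory by its key column.
    index = defaultdict(list)
    for row in join_rows:
        index[row[join_key]].append(row)

    # Stream the base table once; neither input needs to be sorted.
    for row in base_rows:
        for match in index.get(row[base_key], []):
            # Append the join row without repeating its key column.
            yield row + match[:join_key] + match[join_key + 1:]


base = [["b", "2"], ["a", "1"], ["c", "3"]]
extra = [["a", "x"], ["b", "y"]]
result = list(hash_join(base, extra, 0, 0))
```

Only the join table is held in memory, which is why the smaller file is the better choice for that role. Base rows without a partner (here the row with key "c") are simply skipped in this sketch.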
def cls.TableJoin.CTableJoin.__init__ (s)
def cls.TableJoin.CTableJoin.GetExitCode (s)
Retrieve the exit code for command line invocation according to the permissiveness model.
@return int
def cls.TableJoin.CTableJoin.HaveHeader (s, flag)
Indicate whether the files have header data.
@param flag @c True if there are headers, else @c False
def cls.TableJoin.CTableJoin.Join (s, inFFN, jCol, noData=CELL_NO_DATA, idColEntry="", idColName="Source", joinUnpairable=False, summary=False, headerSuffix="")
Join data to the base file.
The file to be joined must already have been set and read. This call uses its data to join with the base file given by parameter @c inFFN. A column that identifies the join data set can be added. Unpaired lines from the base file can be printed, too.
@param inFFN Full path file name for base file, '-' for stdin.
@param jCol Column index with join key (counted from 0).
@param noData Empty cell specifier. Defaults to that of the system.
@param idColEntry A @c String which specifies the origin of the join file. Helpful when joining vertically, that is, when multiple files with like data are joined in several independent steps and the final output file is the concatenation of all of them.
@param idColName Column header name for the optional data origin column.
@param joinUnpairable Prints unpaired lines from the base file.
@param summary Do not pair, but print the number of lines that would result upon pairing. Helpful when joining large files into even larger files with a lot of overlaps.
@param headerSuffix Add this string to each header column name (to make columns easier to distinguish in multiple joins).
@return @c True if all lines were output and joining was successful. @c False if an error occurred, i.e., the input file could not be opened or was empty. The exit code has been set accordingly.
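The call order implied above (read the join file first, then join against the base file) might look like the following sketch. It uses only the method names and parameters listed in this reference. The helper run_join, the tag "batch1", and calling SetOutputFile before Join are assumptions, as is reading the exit code after a failed SetJoinFile.

```python
def run_join(joiner, base_ffn, join_ffn, out_ffn, key_col=0):
    """Drive a CTableJoin-like object through one join; return its exit code.

    Hypothetical helper: `joiner` is any object offering the methods
    documented in this reference.
    """
    joiner.HaveHeader(True)          # both files carry a header line
    joiner.SetOutputFile(out_ffn)    # '-' selects stdout
    # The join file is set and read into memory first ...
    if not joiner.SetJoinFile(join_ffn, key_col):
        return joiner.GetExitCode()
    # ... then the base file is joined against it. The source tag
    # "batch1" is a made-up value for the optional origin column.
    joiner.Join(base_ffn, key_col, idColEntry="batch1", joinUnpairable=True)
    return joiner.GetExitCode()
```

With the real class this could be invoked as run_join(CTableJoin(), "base.tsv", "extra.tsv", "-"), assuming the class is importable via from cls.TableJoin import CTableJoin; the file names are placeholders.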
def cls.TableJoin.CTableJoin.SetJoinFile (s, ffn, jCol, tCol=[])
Set and read the file to be joined to the base file.
The method is very forgiving when it comes to missing columns or incorrect indexing. It implements the permissiveness model and tries to continue when encountering invalid input lines. Even the header line can be invalid and the program still tries to continue. In this case, however, the header is erased and filled with 'N/A' values.
@param ffn Full path file name of file to join.
@param jCol Column number used for joining the files. Numbering starts from 0.
@param tCol [optional] Add only these columns to the base file. Numbering starts from 0.
@return @c True if the file has been read successfully. @c False if the file could not be read.
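To illustrate what a forgiving column selection could look like, here is a minimal sketch. It is not the method's actual code: the helper pick_columns, the reading of an empty tCol as "all columns", and the reuse of 'N/A' as the fill value for missing data cells are all assumptions.

```python
def pick_columns(row, t_col, fill="N/A"):
    """Return the cells at indices t_col, tolerating out-of-range indices.

    Hypothetical helper: an empty t_col is taken to mean "keep every column".
    """
    if not t_col:
        return list(row)
    # Out-of-range indices yield the fill value instead of raising an error.
    return [row[i] if 0 <= i < len(row) else fill for i in t_col]


picked = pick_columns(["key", "a", "b"], [2, 5])
```

A strict variant would raise on index 5 instead; which behavior applies in the real class is governed by the permissiveness model.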
def cls.TableJoin.CTableJoin.SetOutputFile (s, outFFN)
Set the output file name.
@param outFFN Full path file name for output file, '-' for stdout.
def cls.TableJoin.CTableJoin.SetPermissiveness (s, p)
Set the permissiveness model if invoked as a command line tool.
@param p Permissiveness model @c string. One of @c PERMISSIVENESS_ECHO, @c PERMISSIVENESS_SKIP, @c PERMISSIVENESS_STOP or @c None to disable application of the model.