Library - developer documentation
Here you can find all that you need if you want to continue the development of the Validator library or to use it more advanced!
Don't forget to check out the user documentation to get a grasp of what this library is all about!
Generated code documentation
Firstly there is documentation generated from the source code where you can find all the classes you will be working with.
General architecture
Here you can see the general architecture design of the application:
This design was created in the beginning and tries to reflect single responsibility principle
as all of the modules should be isolated and the Controllers
orchestrates the communication between them. One exception are the components Tabular Data Validator
and Tabular Data Parser
which communicate between each other because we wanted to reduce the time consumed by the intermediary communication through the Controller
as the tabular data parsing and validation is the most time and resource consuming part of the validator.
Contents of namespaces
Here you can see the individual sub namespaces of the main namespace of the validator lin = ValidateLib:
Now we will describe individual sub-namespaces and explain the most important classes and interfaces:
- ValidateLib.Control
- ValidateLib.Encoding
- ValidateLib.ErrorsAndWarnings
- ValidateLib.IRINormalization
- ValidateLib.Metadata
- ValidateLib.ResultWriters
- ValidateLib.Results
- ValidateLib.TableCompatibility
- ValidateLib.TabularData
- ValidateLib.UtilityClasses
ValidateLib.Control
Contains the most important interface IController
which is entry-point into the validation and manages all other modules and their communication.
The interface is really documented in the detail so users can use it easily.
Check out the generated documentation.
Also there is ControllerFactory
class which is used to produce the IController
instances. We did this so we can change the implementations of IController
without the users of the library even noticing.
ValidateLib.Encoding
This contains the classes that provide some functionality related to the encoding detection. Generated documentation.
ValidateLib.ErrorsAndWarnings
This contains all the errors and warnings that can be produced during the validation.
Class that is the parent class for both Error
and Warnings
is ErrorOrWarning
and important method is:
public virtual string GetMessage(CultureInfo cultureInfo)
Which retrieves localized message for the user to see based on the culture info provided.
Also we have factory classes WarningFactory
and ErrorFactory
for creating instances of these warnings and errors.
We are not really satisfied with the design of this namespace and there might be some changes done to it in the future.
See the generated documentation
ValidateLib.IRINormalization
This namespace has generally just one job and that is iri normalization.
See the generated documentation.
ValidateLib.Metadata
This namespaces contains a lot of embedded namespaces and forms an essential part of the validator. Its main goals are providing classes for the metadata parsing and validation stage where we extract the metadata from metadata file and create internal model of this.
It contains embedded namespaces:
- Descriptors - contains classes representing individual descriptors described in the specification for example table descriptor or table group descriptor. These classes contain a lot of methods used for the normalization or for parsing individual properties.
- DialectExtraction - contains classes that extracts dialect from table descriptor-
- Embedded - contains classes that extracts embedded metadata from the tabular data file.
- ErrorHandling - contains classes that enables more detailed warnings and errors messages
- MetadataLocation - contains classes that locate the metadata.
- Parsers - contains a parser class for each of the descriptor that parses the class. Most of the parses just inherit from the
DescriptorParserBase
which provides the needed functionality as it is generic and it calls the right methods likeGetPropertyParser(JProperty property, List<Warning> warnings)
from the interfaceIParsable<T>
which all of the descriptors must implement. - ParsingAndValidation - contains class
MetadataParseValidator
which orchestrates all of the submodules in this module and provides some unified interface for parsing and validating the metadata file. - PropertyParsers - contains individual property parsers. When we want to parse some property represented as a
JProperty
the individualParser
(from Parsers namespaces) just calls aGetPropertyParser
on particular descriptor and it produces a parser which is then able to parse this property. - Unification - contains classes that helps unify the state in which we have our metadata model before starting tabular data parsing and validation. It creates file wrappers for every table file so we download it just once and then use it from the disk. Also we want to start every validation as Table group validation so we unify it before the process begins.
- Validators - Contains validators of individual descriptors. Some of the descriptors can have some advanced rules of what combination of properties they are allowed to contain like for example
ForeignKeyDescriptor
cannot contain arbitrary foreign keys. They must exist in some table schema in the table group or table descriptor.
See the generated documentation.
ValidateLib.ResultWriters
This contains classes for creating result files in different formats. Contains enumeration of currently supported result file formats ResultFileFormat. Also contains ResultWriterFactory which creates instances of the result writers based on the enumeration mentioned before.
See the generated documentation.
ValidateLib.Results
Contains files and interfaces that represent the result of validation. Most important are ITableGroupValidationDetail
,ITableValidationDetail
, IResult
which user can directly access and use them as they are return values for the Controller
See the generated documentation.
ValidateLib.TableCompatibility
Contains classes that verifies that two table descriptors are compatible as defined in specification.
See the generated documentation.
ValidateLib.TabularData
Second biggest namespace after the Metadata
.
It contains following sub-namespaces:
- AnnotatedTabularDataModel - contains classes that represent the annotated tabular data model like for example annotated table, table group, row, cell, column etc. We map the entities from
Metadata
namespace to these basically. - Datatypes - contains a lot of classes that represent individual datatypes that are allowed to be defined in the metadata vocabulary.
- Parsing - contains classes that are used to parse the tabular data files based on the dialect descriptor that was found during the metadata localization or was defined in the metadata document. Class
Flags
contains information extracted from dialect descriptor used for parsing the tabular data file. Also we created aCustomStreamReader
that buffs the characters from theStreamReader
class so we can go forward and backward in the stream. These classes create huge bottleneck of the parsing and it is a space for possible improvements on the efficiency of the whole validator. - Validation - validation is done basically by the two main validation rules
ICellValidationRule
- to which we passAnnotatedCell
and it validates different aspects of it for example if the datatype matches desired datatype of the column the cell is in.IRowValidationRule
- to which we passAnnotatedRow
and it validates different aspects of it for example that each row contains same amount of fields. Then there are some exceptions likeFKValidationRule
because the validation of foreign keys is more demanding and complex. This is ready for future extensions as we just need to create more validation rules in the future (we will probably add reflection to create such classes on demand of the user).TabularDataTableValidator
orchestrates parsing and validation of one tabular data file.TabularDataTableGroupValidator
orchestrates parsing and validation of all the tabular data files inside the table group.
See the generated documentation.
ValidateLib.UtilityClasses
Contains utility classes that helps other modules fullfil their tasks.
See the generated documentation.