Data Lineage and Impact Analysis

Compliance Documentation Requirements:

Data Traceability & Provenance
Record and visualise the origin and path of every data field deemed important to the business's risk mitigation strategy, at the desired level of abstraction, and present them to all stakeholders in suitable formats.

Validity
Examine the mechanisms necessary and sufficient to monitor and ensure the trustworthiness of data.

Uniqueness
Limit, if not completely eradicate, the presence of multiple instances of the same information in multiple formats, in order to reduce cost and improve quality.

Consistency & Timeliness
Verify the consistency of data originating from different systems and units, and the coherent, timely exchange of relevant data between them.

Completeness & Accuracy
Ensure that the data of each process and function are complete and accurate, as per the specification.

The Challenges:

  • Heterogeneous legacy systems that lack adequate documentation and hinder business development
  • Diverse technologies with fragmented development resources
  • Limited adherence to design patterns and practices, and increased complexity due to missing system documentation
  • Lack of a common language, leading to redundancy and misunderstandings
  • Need to align business specifications with process and technical requirements, and to verify the actual, final implementation
  • Increased overhead of compliance demands

The Benefits:

  • Assure business continuity
  • Improve productivity
  • Reduce internal costs
  • Increase IT efficiency
  • Identify and address risks
  • Comply with regulatory requirements
  • Standardise and facilitate the onboarding of new technical personnel
  • Build a healthy data warehouse
  • Assist DevOps in understanding the processes that give rise to processing loads
  • Cultivate and promote a knowledge-sharing culture

Our Services:

Technical Documentation
Document applications and their architecture and components, combined with analysis of data dependencies and flows.

Data Lineage
Manage the availability, usability, integrity and security of the data, based on internal data standards and policies that also control data usage. 

Impact Analysis
Identify the potential consequences of code and data changes and estimate what needs to be modified to accomplish a given change.
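
To make this concrete, impact analysis can be framed as a reachability question over an extracted dependency graph. The following is a minimal sketch in Python; the field, job and report names are invented for illustration:

```python
# Minimal sketch: impact analysis as reachability over a dependency
# graph. The edge list below is hypothetical example data; in practice
# it would be produced by the lineage extraction itself.
from collections import deque

# Each edge (source, target) means: if `source` changes,
# `target` is potentially impacted.
DEPENDS_ON = [
    ("customers.birth_date", "etl.load_customers"),
    ("etl.load_customers", "dwh.dim_customer"),
    ("dwh.dim_customer", "reports.age_distribution"),
    ("dwh.dim_customer", "reports.kyc_exceptions"),
]

def impacted_by(changed):
    """Breadth-first walk collecting every downstream artefact."""
    downstream = {}
    for src, dst in DEPENDS_ON:
        downstream.setdefault(src, []).append(dst)
    seen, queue = set(), deque([changed])
    while queue:
        for nxt in downstream.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# Everything a change to the birth_date field could affect:
print(impacted_by("customers.birth_date"))
```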

Code Inspection
Evaluate the quality of the code. Determine if the code meets quality requirements or if it needs improvement.

Non-functional Testing
Test the readiness of your software application against non-functional parameters (not addressed by functional testing), such as performance, usability, reliability and security.

Our Documentation Services:

  1. Application Architecture
  • Application components & hierarchies, documented from the conceptual design through to the physical implementation
  • Inter-Component Communication
  • Inter-Application Dependency & Communication
  • Basic Application Flows
  • Supplementary Automation and Maintenance tasks

  2. Data Modeling
  • Consistently identify each field, table and view of each database
  • Group related tables/views into functional taxonomies
  • Create ERDs even when keys are not present in the physical implementation (see the sketch after this list)
  • Assist in the organisation-wide definition of Data Dictionary terms to be used by all development teams
  • Couple the development teams' requirements with those of Data Governance
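
Where no keys exist in the physical schema, candidate relationships can still be proposed for the ERD and then confirmed with the client. A minimal sketch, assuming a hypothetical "<table>_id" naming convention (real schemas need client-specific rules):

```python
# Minimal sketch: propose ERD relationships by naming convention when
# the physical schema declares no foreign keys. The schema below and
# the "<table>_id -> <table>.id" convention are illustrative only.
COLUMNS = {
    "customer": ["id", "name", "branch_id"],
    "branch":   ["id", "region"],
    "account":  ["id", "customer_id", "branch_id", "balance"],
}

def infer_relationships(columns):
    """Return (referencing_column, referenced_column) candidate pairs."""
    rels = []
    for table, cols in columns.items():
        for col in cols:
            if col.endswith("_id"):
                target = col[:-3]  # "branch_id" -> "branch"
                if target in columns and "id" in columns[target]:
                    rels.append((f"{table}.{col}", f"{target}.id"))
    return rels

for ref, target in infer_relationships(COLUMNS):
    print(f"{ref} -> {target}")
# customer.branch_id -> branch.id
# account.customer_id -> customer.id
# account.branch_id -> branch.id
```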

  3. MIS / Business Intelligence
  • Align each field used for Reporting with the originating fields, functions and processes that may impact its value
  • OLAP & OLTP BI Documentation & Optimization
  • Data mining processes
  • Event-driven processes
  • Performance management
  • Text mining
  • Predictive analytics apps
  • Prescriptive analytics apps

  4. Security, Roles, Authorization
  • Assess, where applicable, both application level and OS level security mechanisms
  • Determine the intended and actual security model of each application
  • Extend documentation of user access to all roles of the system (end user roles, developers and system operators)
  • Behavioral Security Protocols Design & Management
  • Record and Monitor Data & Process Ownership

  5. Operational Documentation

Document features of the application such as:

  • System Reliability
  • System Robustness
  • System Documentation
  • Application Development Lifecycle
  • Management of Environment(s) and Repositories
  • Application Testing and Change Processes
  • Data Quality: Completeness, Accuracy, Consistency, Timeliness, Uniqueness, Validity & Traceability (a few of these are illustrated in the sketch below)
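
Several of these dimensions translate directly into measurable checks. A minimal sketch, with a hypothetical record layout and a deliberately simplified IBAN validity rule:

```python
# Minimal sketch: rule-based checks for three data-quality dimensions.
# The records and the (simplified) IBAN pattern are hypothetical.
import re
from datetime import date

RECORDS = [
    {"id": 1, "iban": "GB29NWBK60161331926819", "updated": date(2024, 1, 5)},
    {"id": 2, "iban": None,                     "updated": date(2023, 6, 1)},
    {"id": 2, "iban": "GB29NWBK60161331926819", "updated": date(2024, 1, 5)},
]

IBAN_RULE = r"GB\d{2}[A-Z]{4}\d{14}"  # simplified, UK-style IBANs only

def completeness(records, field):
    """Share of records where the field is populated."""
    return sum(r[field] is not None for r in records) / len(records)

def uniqueness(records, field):
    """Share of distinct values among all values of the field."""
    values = [r[field] for r in records]
    return len(set(values)) / len(values)

def validity(records, field, pattern):
    """Share of populated values matching the client-defined pattern."""
    present = [r[field] for r in records if r[field] is not None]
    return sum(bool(re.fullmatch(pattern, v)) for v in present) / len(present)

print(f"{completeness(RECORDS, 'iban'):.2f}")         # 0.67 (one missing IBAN)
print(f"{uniqueness(RECORDS, 'id'):.2f}")             # 0.67 (duplicate id 2)
print(f"{validity(RECORDS, 'iban', IBAN_RULE):.2f}")  # 1.00 (all present IBANs valid)
```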

Extending data meta-analysis to code meta-modelling

Our impressive successes in predicting electoral outcomes, as well as in minimising false positives/negatives, led us to consider analysing the code of applications, and not just the data they produce, in order to determine their lineage and provenance.

Our approach: 
  • The body of the code is analysed and transcribed into a meta-language in order to meet the client's objectives
  • Our approach is technology-agnostic and can be applied from Java and .NET to CA Plex and LANSA
  • Each line of code is parsed and tested against the rules set by the client, e.g. return all commands that run external programs and processes, and/or select data, and/or update/insert data (see the sketch after this list)
  • The functions/processes/programs are thus interconnected via the fields they read from and write to, through the parameters of interest
  • The code can be visualised, and a wealth of quality metrics can be reported, in order to align it with best patterns and practices and to assure code homogeneity
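
A minimal sketch of such rule-based scanning: three hypothetical client rules are applied to an invented, RPG-like source, emitting dependency edges. Real rule sets are dictated by each client's technology:

```python
# Minimal sketch: scan each line of source code against client-defined
# rules and emit (program, rule, target) dependency edges. The rules
# and the RPG-like sample source are invented for illustration.
import re

RULES = {
    "calls_program": re.compile(r"\bCALL\s+(\w+)", re.IGNORECASE),
    "selects_data":  re.compile(r"\bSELECT\b.*\bFROM\s+(\w+)", re.IGNORECASE),
    "writes_data":   re.compile(r"\b(?:UPDATE|INSERT\s+INTO)\s+(\w+)", re.IGNORECASE),
}

SOURCE = """\
CALL CUST001
SELECT NAME, BIRTH_DATE FROM CUSTOMERS
UPDATE ACCOUNTS SET BALANCE = 0
"""

def scan(program, source):
    """Emit one edge per line/rule match; edges feed the lineage graph."""
    edges = []
    for line in source.splitlines():
        for rule, pattern in RULES.items():
            match = pattern.search(line)
            if match:
                edges.append((program, rule, match.group(1)))
    return edges

for edge in scan("BATCH01", SOURCE):
    print(edge)
# ('BATCH01', 'calls_program', 'CUST001')
# ('BATCH01', 'selects_data', 'CUSTOMERS')
# ('BATCH01', 'writes_data', 'ACCOUNTS')
```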

Our Unique Expertise:

Oxford Metadata has pioneered the modeling of relations in many-to-many communication networks and was, circa 2004, one of the early creators of interactive graph visualisations to depict complex relations.

At the time, no Graph databases existed, and our visualisations were made possible with the use of dynamic Adobe Flash applications. 

A testament to the innovative value of this approach at the time is the fact that the RDF 1.0 metadata model specification had only just been published.

Subsequently, at the University of Oxford, we utilised the Neo4j graph database to visualise relationships by breaking them down into “semantic triples” in the behavioural access control model we invented.

Every user activity is converted into a semantic triple, e.g.:

  • User X reads story Y
  • User X likes photo Z
  • User X comments on story Y etc.

Given the behaviour of the users, we may opt to grant, deny or even revoke access to particular content, or even to the whole platform!
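
A minimal sketch of such a behaviour-based decision, with the triples held as plain Python tuples rather than in Neo4j, and a hypothetical flag-count threshold standing in for a real policy:

```python
# Minimal sketch: behaviour-based access decisions over semantic
# triples. The "flags" predicate and the thresholds are hypothetical
# stand-ins for the policies a real deployment would configure.
TRIPLES = [
    ("user_x", "reads",    "story_y"),
    ("user_x", "likes",    "photo_z"),
    ("user_x", "comments", "story_y"),
    ("user_x", "flags",    "story_y"),
    ("user_x", "flags",    "photo_z"),
    ("user_x", "flags",    "story_w"),
]

def decide_access(user, triples):
    """Grant, restrict or revoke based on the user's recorded behaviour."""
    flags = sum(1 for s, p, _ in triples if s == user and p == "flags")
    if flags >= 3:
        return "revoke"    # abusive pattern: lock out of the whole platform
    if flags >= 1:
        return "restrict"  # limit access to particular content
    return "grant"

print(decide_access("user_x", TRIPLES))  # revoke
```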

Obviously, such a model can be applied in domains that we do not usually associate with “access control”: it applies equally to intranet applications and even to social media platforms, as a means to target (or not) specific content at users with specific behaviour.

In our case, we used these techniques to predict electoral outcomes, based on what groups of users told pollsters they would vote versus their behaviour with “friends and family” on social networks.

Yet these same techniques can be applied to domains such as anti-money laundering.