Technology and Data Variation
I am currently teaching modelers how to build effective Use Cases, and many are struggling to understand Technology and Data Variations, especially in Enterprise Architecture, Solutioning, and Design. As an Enterprise and Solution Architect for over 23 years, and a Program and Operations Manager with 40 years of experience, I know that bridging the 'clarity gap' between stakeholders can be a challenge. In today's age of Data, Machine Learning, and Artificial Intelligence (AI), we must close these gaps.
We must understand what we mean by the "technologies" we use and/or implement, but more importantly we must understand their impact on our "data" and desired outcomes.
Interested in following? Visit our Website, "TOT Consulting, home of UML Operator" or go to our UML Operator Channel and subscribe. Be sure to turn on Notifications so that you can keep up with our latest videos.
Technology Variation
"Technology Variation" typically refers to the differences or variations in the technologies, tools, platforms, or software components used in the development, deployment, and operation of a system or software application. It is a concept often encountered in software engineering, especially when designing and implementing complex systems.
Here are a few key points to understand about technology variation:
Technological Choices
In software development, various technological choices need to be made, such as selecting programming languages, frameworks, libraries, databases, and other tools (including vendors). These choices can significantly impact the performance, scalability, maintainability, and overall success of projects, programs, and platforms.
Enterprise Architects must look at the big picture, whether working with large or small companies, governments, or other concerns. We will touch on risk in a moment. The architect and stakeholders must consider everything that touches the architecture and the technological choices they make.
Platform Diversity
Technology variation can also involve different platforms and environments. For example, an application might need to work on various operating systems (Windows, Linux, macOS) or on different types of devices (desktop, mobile, IoT devices).
As technologies evolve, there is no one operating system to rule them all, and all must be considered. This is a primary reason why monolithic approaches are risky and impact more than dollar and labor costs. They may impact market forces, performance, and other factors that could disrupt or destroy a business. This is further exposed in the next consideration, pertaining to integration challenges.
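To make this concrete, here is a minimal sketch of how an application might adapt its behavior to the operating system it runs on, using only Python's standard library. The application name and configuration paths are hypothetical, purely for illustration.

```python
import platform
import sys

def default_config_path() -> str:
    """Return a platform-appropriate configuration path (hypothetical paths)."""
    system = platform.system()  # 'Windows', 'Linux', or 'Darwin' (macOS)
    if system == "Windows":
        return r"C:\ProgramData\MyApp\config.ini"
    elif system == "Darwin":
        return "/Library/Application Support/MyApp/config.ini"
    else:  # Linux and other Unix-like systems
        return "/etc/myapp/config.ini"

if __name__ == "__main__":
    print(f"Running on {platform.system()} ({platform.machine()}), "
          f"Python {sys.version_info.major}.{sys.version_info.minor}")
    print(f"Config would be read from: {default_config_path()}")
```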
Integration Challenges
When integrating different technologies within a system, challenges may arise due to differences in communication protocols, data formats, security mechanisms, and more. Ensuring seamless integration is crucial to achieving a functional and coherent system.
One of the problems (challenges) I have seen over the years, especially in large companies, corporations, enterprises, and governments, pertains to "integration" with old, new, and disparate entities (sub-company units, vendors, consulting groups, and technologies).
Making everything 'play nicely' (integrate seamlessly) is and will be the biggest challenge. Budget and schedule constraints make these challenges more daunting. Testing MUST BE the top consideration, especially Regression Testing, when faced with such integration challenges. Skipping these practices will lead to greater challenges in delivery, costs, and schedule.
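As an illustration of both points, here is a minimal sketch of an adapter that translates a hypothetical pipe-delimited legacy record into a newer dictionary/JSON-style structure, with a small regression-style test pinning its behavior. The record format, field names, and test case are assumptions made for the example, not a real vendor format.

```python
import unittest

def legacy_to_json(legacy_record: str) -> dict:
    """Translate a hypothetical pipe-delimited legacy record into a dict.

    Example input: 'CUST001|Jane Doe|ACTIVE'
    """
    customer_id, name, status = legacy_record.split("|")
    return {"customerId": customer_id, "name": name, "active": status == "ACTIVE"}

class LegacyAdapterRegressionTest(unittest.TestCase):
    """A regression test that locks in the adapter's behavior during integration work."""

    def test_active_customer(self):
        record = legacy_to_json("CUST001|Jane Doe|ACTIVE")
        self.assertEqual(record["customerId"], "CUST001")
        self.assertTrue(record["active"])

if __name__ == "__main__":
    unittest.main()
```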
Vendor Dependencies
Depending on the chosen technologies, the development process could introduce dependencies on third-party vendors for software components, services, or APIs. These dependencies can impact the project's long-term sustainability.
As stated above, vendor and technology selection must be accompanied by a thorough understanding of integration and relevant dependencies. There should be no confusion about the difference between open source and open standards. With performance in mind, choosing the right implementation strategy (whether clouds or data centers) is critical, but more important is adherence to good service-oriented architecture styles (e.g., microservice implementations).
Trade-offs and Considerations
Decisions about technology variation often involve trade-offs. For instance, one technology might offer better performance but be more complex to implement, while another might provide quicker development but sacrifice scalability.
Over the years I have taught something I refer to as "The Architecture Value-Chain Proposition". It involves the considerations and trade-offs between Cost, Schedule, and Performance. Every architecture or business decision must be made relative to the sponsor's budget (cost), time-to-market (schedule), and performance requirements.
It's important to note that such decisioning is not done "on the whole", but rather for parts of the system, platform, and/or architecture (technology). When we stop for these considerations, the sponsors ultimately make a value-chain decision and rank their priorities for cost, schedule, and performance; the three are never all equal. For example, if performance is ranked #1, thresholds must be set on cost and schedule. If there is little funding, then the implications for schedule and performance must be exposed. There are many variations to consider, but with proper analysis it should be easy to arrive at a conclusion.
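Here is a minimal sketch of how such a ranking might be encoded as a simple weighted score to compare two options. The weights, option names, and per-criterion scores are hypothetical assumptions for illustration only, not part of the Value-Chain Proposition itself.

```python
# Hypothetical illustration: rank cost, schedule, and performance, then score options.
priorities = {"performance": 3, "schedule": 2, "cost": 1}  # sponsor's ranking (3 = highest)

options = {
    "Option A (high-performance platform)": {"performance": 9, "schedule": 5, "cost": 4},
    "Option B (quick-to-market platform)":  {"performance": 6, "schedule": 9, "cost": 7},
}

def weighted_score(scores: dict) -> float:
    """Combine per-criterion scores (0-10) using the sponsor's priority weights."""
    total_weight = sum(priorities.values())
    return sum(priorities[c] * scores[c] for c in priorities) / total_weight

for name, scores in options.items():
    print(f"{name}: {weighted_score(scores):.2f}")
```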
Evolution and Adaptation
Technologies evolve over time. New versions, updates, and emerging technologies can influence how a system is developed, maintained, and upgraded. Teams need to be prepared to adapt to these changes.
Understanding the technology hype curve is critical. This curve helps us understand the patterns of expectation, excitement, disillusionment, and eventual maturity that new technologies experience over time. Whether considering Moore's Law, constraints where hardware is ahead of software (or vice versa), or the cost of investing early in a hype curve, a good analysis should be performed and a position taken for both short- and longer-term solutioning.
Risk Management
Different technologies come with different levels of risk. Some technologies might be relatively new and unproven, while others are more mature and widely adopted. The risk profile of a project can be influenced by the technologies chosen.
Risks are in everything we do and decide. Such risks include, but are not limited to:
- Security and fraud risk
- Compliance risk
- Operational risk
- Financial or economic risk
- Reputational risk
There is also risk in the following seven functional areas:
- Revenue Assurance
- Revenue Accounting
- Fraud
- Privacy
- Tax
- Security
- Credit and Collections
Business Goals and Context
Technological choices should align with the business goals, user requirements, and broader organizational context. Factors like budget, time constraints, and the target audience play a role in determining technology variation.
As stated in Trade-offs and Considerations, cost, schedule, and performance are key to setting goals, context, and objectives in delivery (product realization). Thus, these considerations apply to every area in this article.
In Summary
Technology variation refers to the diverse set of technological decisions made during the software development lifecycle. These decisions can impact various aspects of the project, from architecture and design to implementation, deployment, and ongoing maintenance. Careful consideration and evaluation of technology options are essential for successful software development projects.
Data Variation
Since most know me as a "Data First" architect, this is the most important section of this article. I think that Technology is the easiest part, but today "data" is the most important consideration in architecture.
"Data Variation" typically refers to the differences, variability, or diversity in the characteristics, values, or attributes of data within a dataset or system. It is a concept commonly encountered in fields such as data analysis, statistics, machine learning, and database management. Understanding data variation is crucial for drawing meaningful insights, making informed decisions, and building accurate models.
Here are some key points to understand about data variation:
Data Diversity
Data variation highlights the range of values, patterns, and distributions present in a dataset. It encompasses differences in attributes, features, or measurements across different data points.
Data Management, especially dealing with data policies, privacy, security, regulatory compliance, and much more, is mission critical. Most companies I have seen do not have good data documentation, dictionaries, profiles, schemas, and the other artifacts essential to SMART Data Management. SMART stands for specific, measurable, achievable, relevant, and time-bound.
Understanding how data evolves, where it flows, and who or what has access to it is the most mission-critical aspect of any architecture, design, and/or implementation.
Statistical Analysis
Data variation is often assessed through statistical measures such as mean, median, standard deviation, range, and variance. These measures help quantify the spread and dispersion of data points.
Statistical analysis is a systematic process of collecting, organizing, interpreting, and drawing conclusions from data to make informed decisions or discover meaningful patterns and relationships. It involves using statistical methods, techniques, and tools to analyze data and extract valuable insights.
I talk more about this in my article, "The Premise of Analytics".
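As a quick illustration of those measures, here is a minimal sketch using Python's standard statistics module on a small, made-up set of measurements (the values are purely hypothetical).

```python
import statistics

# Hypothetical measurements (e.g., daily transaction counts), including an outlier.
values = [120, 135, 128, 150, 110, 145, 600]

print("mean:    ", round(statistics.mean(values), 2))
print("median:  ", statistics.median(values))
print("stdev:   ", round(statistics.stdev(values), 2))
print("variance:", round(statistics.variance(values), 2))
print("range:   ", max(values) - min(values))
```

Note how the single outlier inflates the mean, standard deviation, and range while leaving the median nearly untouched; this is exactly the kind of spread these measures are meant to expose.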
Pattern Recognition
Analyzing data variation can reveal underlying patterns, trends, or anomalies in the data. Identifying patterns is essential for making predictions and informed decisions.
Over 20 years ago, when many were getting into "Big Data" and "Machine Learning", people failed to understand the difference between a Data Warehouse and Big Data, which I referred to (at the time) as Data Warehouse 2.0. Thanks to the technology evolution in both hardware and software, we gained the ability to do much more with large amounts of data.
Essentially, in the next generations of data warehousing, we moved from data mining and variance analysis to data mining, data pattern recognition, and advanced analytics.
Data Quality
Variability in data can sometimes be an indicator of data quality issues, such as errors, inconsistencies, missing values, or inaccuracies. Understanding data variation can help identify and address data quality problems.
A key part of Data Management is Data Quality Management. Data quality management provides a context-specific process for improving the fitness or readiness of data that's used for analysis and decision making. The goal is to create insights into the readiness of data using various processes and technologies on large and more complex data sets. We may refer to these sets as Business Ready Data Sets (BRDs), or Structured/Unstructured Data Sets.
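Here is a minimal sketch of simple data-quality checks (missing values, duplicate keys, out-of-range values) over a small hypothetical record set; the records and the age range are illustrative assumptions.

```python
# Hypothetical customer records with deliberate quality problems.
records = [
    {"id": 1, "age": 34, "email": "a@example.com"},
    {"id": 2, "age": None, "email": "b@example.com"},   # missing value
    {"id": 2, "age": 41, "email": "b@example.com"},     # duplicate id
    {"id": 3, "age": 212, "email": "c@example.com"},    # out-of-range age
]

missing_age = [r["id"] for r in records if r["age"] is None]

seen, duplicates = set(), []
for r in records:
    if r["id"] in seen:
        duplicates.append(r["id"])
    seen.add(r["id"])

out_of_range = [r["id"] for r in records
                if r["age"] is not None and not (0 <= r["age"] <= 120)]

print("Missing age for ids: ", missing_age)
print("Duplicate ids:       ", duplicates)
print("Out-of-range age ids:", out_of_range)
```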
Feature Selection
In machine learning, understanding data variation can guide the selection of relevant features (attributes) for training models. Features with low variability may contribute less to predictive power.
In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a phenomenon being observed. Choosing informative, discriminating, and independent features is a crucial step for effective algorithms in pattern recognition, classification, and regression.
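As a simple illustration, here is a minimal sketch that drops a constant (zero-variance) feature before modeling. The feature names, values, and threshold are hypothetical; real pipelines would use a library routine, but the idea is the same.

```python
import statistics

# Hypothetical feature columns: each maps a feature name to its observed values.
features = {
    "monthly_spend":   [120.0, 340.5, 98.2, 410.0, 275.3],
    "account_region":  [1, 1, 1, 1, 1],   # constant -> zero variance
    "logins_per_week": [3, 7, 2, 9, 5],
}

VARIANCE_THRESHOLD = 0.0  # keep only features whose variance exceeds this

selected = {
    name: values
    for name, values in features.items()
    if statistics.pvariance(values) > VARIANCE_THRESHOLD
}

print("Selected features:", list(selected))  # 'account_region' is dropped
```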
Model Performance
Data variation can impact the performance of machine learning models. Models trained on diverse and representative data are often more robust and generalizable.
A machine learning model is a program that can find patterns or make decisions from a previously unseen dataset. For example, in NLP (natural language processing) or NLG (natural language generation), machine learning models can parse and correctly recognize the intent (predictive analytics) behind previously unheard sentences or combinations of words (diagnostic analytics). During data or model training, the machine learning algorithm is optimized to find certain patterns or outputs from the dataset, depending on the task. The output of this process is referred to as a machine learning model.
Thus, model performance is critical depending on the AI use case and objective. I think we can see the importance of this in self-driving vehicles.
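Here is a minimal sketch, assuming scikit-learn is available, of training a simple classifier and measuring accuracy on held-out data to gauge how well it generalizes. The dataset is synthetic and purely illustrative, not tied to any particular use case.

```python
# A minimal sketch assuming scikit-learn is installed (pip install scikit-learn).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic, illustrative dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=42)

# Hold out data the model has never seen to estimate generalization.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Held-out accuracy:", round(accuracy_score(y_test, model.predict(X_test)), 3))
```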
Data Preprocessing
Handling data variation often involves preprocessing steps such as normalization, scaling, and transformation. These steps can help ensure that variations in different attributes do not disproportionately affect analysis or modeling.
Data preprocessing is a critical and essential step in the data analysis, advanced analytics, and machine learning pipeline. It involves cleaning, transforming, and preparing raw data to make it suitable for analysis, modeling, and further processing. Proper data preprocessing enhances the quality of the data, reduces noise and errors, and improves the effectiveness of subsequent analytical or modeling tasks.
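Here is a minimal sketch of two common preprocessing steps, min-max normalization and z-score standardization, in plain Python, so that one attribute's scale does not dominate another's. The income values are hypothetical.

```python
import statistics

def min_max_normalize(values):
    """Scale values into the range [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def z_score_standardize(values):
    """Center values at 0 with unit standard deviation."""
    mean, stdev = statistics.mean(values), statistics.stdev(values)
    return [(v - mean) / stdev for v in values]

incomes = [32_000, 54_000, 47_500, 120_000, 61_000]  # hypothetical attribute
print("normalized:  ", [round(v, 3) for v in min_max_normalize(incomes)])
print("standardized:", [round(v, 3) for v in z_score_standardize(incomes)])
```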
Visualization
Data visualization, or visualizing data variation, using graphs, charts, histograms, scatter plots, or box plots can provide insights into the distribution and spread of data and support human understanding of it.
There may be a need to visualize data in real time, near real time, or at times when the data is considered stable and ready for human visualization (e.g., through business-ready data sets, or BRDs).
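Here is a minimal sketch, assuming matplotlib is available, that plots a histogram and a box plot of a small synthetic sample to show its distribution and spread; the sample itself is made up.

```python
# A minimal sketch assuming matplotlib is installed (pip install matplotlib).
import random

import matplotlib.pyplot as plt

random.seed(7)
sample = [random.gauss(100, 15) for _ in range(500)]  # hypothetical measurements

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.hist(sample, bins=30)
ax1.set_title("Histogram: distribution of values")
ax2.boxplot(sample)
ax2.set_title("Box plot: spread and outliers")
plt.tight_layout()
plt.show()
```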
Domain Understanding
Context matters when interpreting data variation. Understanding the domain and the underlying processes generating the data is important for meaningful analysis.
As stated under Visualization, understanding the domains, especially those under the control of others, becomes an important factor when considering data readiness.
Decision Making
Recognizing data variation helps stakeholders and analysts make informed decisions based on a deeper understanding of the data's characteristics.
In Data Science, Operational Research (OR), also known as Operations Research, is a field of study that uses mathematical and analytical methods to address complex decision-making and optimization problems in various domains. It is closely related to data science and often intersects with it, but it has a broader focus on optimization, modeling, and improving the efficiency of operations and processes.
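To give a flavor of the optimization side of operations research, here is a minimal sketch, assuming SciPy is available, of a tiny linear-programming problem. The products, profits, and resource limits are entirely made up for illustration.

```python
# A minimal sketch assuming SciPy is installed (pip install scipy).
from scipy.optimize import linprog

# Maximize profit 40*x1 + 30*x2 (linprog minimizes, so negate the objective)
# subject to hypothetical resource limits:
#   machine hours: 1*x1 + 2*x2 <= 40
#   labor hours:   3*x1 + 1*x2 <= 60
c = [-40, -30]
A_ub = [[1, 2], [3, 1]]
b_ub = [40, 60]

result = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)], method="highs")
x1, x2 = result.x
print(f"Optimal plan: x1={x1:.1f}, x2={x2:.1f}, profit={-result.fun:.1f}")
```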
In Summary
Data variation is a fundamental concept in data analysis that involves assessing the diversity and spread of data attributes. It informs decisions related to analysis, modeling, and interpretation, and it plays a crucial role in extracting meaningful insights from data.
Conclusion
In Computer Aided Software/System Engineering (CASE) and Unified Modeling Language (UML) considerations, especially when building Enterprise Architecture through Use Cases, understanding Technology and Data Variation concerns is extremely important. Such variations will cause eventual bifurcation in various scenarios. Failing to understand any aspect exposed in this post will jeopardize implementations of that architecture and therefore put your sponsors at risk. I will be producing videos soon on Effective Use Case Modeling. Visit us on UML Operator for more. Be sure to subscribe to the channel to keep up.