Feature (Machine Learning)
So I talk about Features, Capabilities, Requirements, and more quite a bit. In this case, the notion of "Feature" takes a different turn.
Preface: This is a mix of my own thoughts and material referenced from Wikipedia on this subject. I have read tens of thousands of pages over the years, and there are a lot of opinions (some good, some off). I have learned one thing in my years in systems and software engineering: the industry is out to sell its wares regardless of practical application.
My focus here is on "patterns" first. A pattern is something that has been delivered (in production!) more than twice. Thus, when applying data engineering to machine learning, the very first thing to consider is "pattern recognition".
In machine learning and pattern recognition, a feature is an individual measurable property or characteristic of a phenomenon being observed. Choosing informative, discriminating and independent features is a crucial step for effective algorithms in pattern recognition, classification and regression.
So why am I posting this content in this Blog?
First off, I am framing this using the Wikipedia reference on Feature(s) relevant to Machine Learning. Secondly, I spent the last 12 years at my former company helping it advance in Big Data, Advanced Analytics, Personalization, NLG/NLP, and Machine Learning. I won a CIO award for my work, but I really give the credit to the great people working on the effort that earned me that award. It was a struggle, and it continued to be a struggle for several years afterward, mainly due to the fact that others had not advanced their own knowledge and capabilities.
In my UML Operator Channel, where I focus on UML, OOD, and CASE, I will be doing many videos on topics around Machine Learning (ML) and Artificial Intelligence (AI). But let's get started...
Question: So what is a Feature in Machine Learning?
Answer: A Feature in ML is an individual measurable property of the thing being observed; a set of numeric features can be conveniently described by a feature vector. An example of reaching a two-way classification from a feature vector (related to the perceptron) consists of calculating the scalar product between the feature vector and a vector of weights, comparing the result with a threshold, and deciding the class based on the comparison.
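To make that concrete, here is a minimal sketch in Python of that perceptron-style decision rule. The feature values, weights, and threshold are all hypothetical, chosen only for illustration:

```python
import numpy as np

def classify(x, w, threshold=0.0):
    """Two-way classification from a feature vector (perceptron-style):
    compute the scalar product of features and weights, then compare
    the result against a threshold to decide the class."""
    score = np.dot(x, w)  # scalar product of feature vector and weights
    return 1 if score > threshold else 0

# Hypothetical example: three numeric features and hand-picked weights.
features = np.array([0.9, 0.2, 0.4])
weights  = np.array([1.5, -2.0, 0.7])
print(classify(features, weights))  # -> 1, since 1.5*0.9 - 2.0*0.2 + 0.7*0.4 = 1.23 > 0
```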
I will let others debate the nouns and verbs in Wikipedia, but I actually solutioned and delivered features, and I am good with this content.
For years, people made the notion of "algorithms" out to be some huge, complex, scientific subject that would take a lifetime to master, when that was the furthest thing from the truth thanks to modern-day technologies and computing power.
An algorithm is a process or set of rules to be followed in calculations or other problem-solving operations, especially by a computer. Algorithms for classification from a feature vector include nearest neighbor classification, neural networks, and statistical techniques such as Bayesian approaches.
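Of those, nearest neighbor is the easiest to show in a few lines. A minimal sketch, assuming Euclidean distance and a made-up set of labeled feature vectors:

```python
import numpy as np

def nearest_neighbor(x, examples, labels):
    """1-nearest-neighbor classification: assign the label of the
    training example whose feature vector is closest (Euclidean) to x."""
    distances = np.linalg.norm(examples - x, axis=1)  # distance to every stored example
    return labels[np.argmin(distances)]

# Hypothetical training data: four labeled feature vectors.
examples = np.array([[0.1, 0.2], [0.9, 0.8], [0.2, 0.1], [0.8, 0.9]])
labels   = np.array(["A", "B", "A", "B"])
print(nearest_neighbor(np.array([0.15, 0.15]), examples, labels))  # -> "A"
```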
Examples
In character recognition, features may include histograms counting the number of black pixels along horizontal and vertical directions, number of internal holes, stroke detection and many others.
In speech recognition, features for recognizing phonemes can include noise ratios, length of sounds, relative power, filter matches and many others.
In spam detection algorithms, features may include the presence or absence of certain email headers, the email structure, the language, the frequency of specific terms, and the grammatical correctness of the text.
In computer vision, there are a large number of possible features, such as edges and objects.
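To tie one of these examples back to code, here is a hedged sketch of the character-recognition case: counting black pixels along the horizontal and vertical directions of a binary image. The 4x4 "glyph" is made up purely for illustration:

```python
import numpy as np

def pixel_histograms(img):
    """Character-recognition-style features: count black pixels (value 1)
    along each row (horizontal) and each column (vertical) of a binary image."""
    horizontal = img.sum(axis=1)  # black-pixel count per row
    vertical   = img.sum(axis=0)  # black-pixel count per column
    return np.concatenate([horizontal, vertical])

# Hypothetical 4x4 binary glyph (1 = black pixel).
glyph = np.array([[0, 1, 1, 0],
                  [0, 1, 0, 0],
                  [0, 1, 0, 0],
                  [0, 1, 1, 0]])
print(pixel_histograms(glyph))  # -> [2 1 1 2 0 4 2 0]
```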
Extensions
In pattern recognition and machine learning, a feature vector is an n-dimensional vector of numerical features that represent some object. Many algorithms in machine learning require a numerical representation of objects, since such representations facilitate processing and statistical analysis. When representing images, the feature values might correspond to the pixels of an image, while when representing texts the features might be the frequencies of occurrence of textual terms. Feature vectors are equivalent to the vectors of explanatory variables used in statistical procedures such as linear regression. Feature vectors are often combined with weights using a dot product in order to construct a linear predictor function that is used to determine a score for making a prediction.
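As a small illustration of the text case, here is a sketch that builds a term-frequency feature vector over a fixed vocabulary. The vocabulary and sample text are hypothetical; in practice the vocabulary would be derived from the corpus:

```python
from collections import Counter

def term_frequency_vector(text, vocabulary):
    """Represent a text as an n-dimensional feature vector: one numeric
    entry per vocabulary term, holding that term's frequency in the text."""
    counts = Counter(text.lower().split())
    return [counts[term] for term in vocabulary]

# Hypothetical vocabulary, purely for illustration.
vocab = ["free", "offer", "meeting", "report"]
print(term_frequency_vector("free offer claim your free prize", vocab))  # -> [2, 1, 0, 0]
```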
The vector space associated with these vectors is often called the feature space. In order to reduce the dimensionality of the feature space, a number of dimensionality reduction techniques can be employed.
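Principal component analysis (PCA) is one of the most common of those techniques. A minimal sketch using scikit-learn, with random data standing in for real feature vectors:

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical data: 100 samples in a 10-dimensional feature space.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))

# Project onto the 2 principal components to reduce dimensionality.
X_reduced = PCA(n_components=2).fit_transform(X)
print(X_reduced.shape)  # -> (100, 2)
```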
Higher-level features can be obtained from already available features and added to the feature vector; for example, for the study of diseases the feature 'Age' is useful and is defined as Age = 'Year of death' minus 'Year of birth'. This process is referred to as feature construction. Feature construction is the application of a set of constructive operators to a set of existing features, resulting in the construction of new features. Examples of such constructive operators include checking for the equality conditions {=, ≠}, the arithmetic operators {+, −, ×, /}, the array operators {max(S), min(S), average(S)}, as well as other more sophisticated operators, for example count(S, C), which counts the number of features in the feature vector S satisfying some condition C, or distances to other recognition classes generalized by some accepting device. Feature construction has long been considered a powerful tool for increasing both accuracy and understanding of structure, particularly in high-dimensional problems. Applications include studies of disease and emotion recognition from speech.
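Here is a hedged sketch of feature construction using the operators named above, including the Age example. The patient record is made up for illustration:

```python
def construct_features(record):
    """Feature construction: apply constructive operators to existing
    features to derive new ones and append them to the feature vector."""
    s = record["measurements"]
    return {
        # arithmetic operator {-}: the Age example from the text
        "age": record["year_of_death"] - record["year_of_birth"],
        # array operators max(S), min(S), average(S)
        "max_s": max(s),
        "min_s": min(s),
        "avg_s": sum(s) / len(s),
        # count(S, C): number of features in S satisfying condition C
        "count_above_10": sum(1 for v in s if v > 10),
    }

# Hypothetical record, purely for illustration.
patient = {"year_of_birth": 1931, "year_of_death": 1994, "measurements": [4, 12, 9, 15]}
print(construct_features(patient))
# -> {'age': 63, 'max_s': 15, 'min_s': 4, 'avg_s': 10.0, 'count_above_10': 2}
```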
Selection and Extraction
The initial set of raw features can be redundant and too large to be managed. Therefore, a preliminary step in many applications of machine learning and pattern recognition consists of selecting a subset of features, or constructing a new and reduced set of features to facilitate learning, and to improve generalization and interpretability.
Extracting or selecting features is a combination of art and science; developing systems to do so is known as feature engineering. It requires experimenting with multiple possibilities and combining automated techniques with the intuition and knowledge of the domain expert. Automating this process is known as feature learning, where a machine not only uses features for learning, but learns the features itself.
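As a small illustration of feature selection, here is a sketch using scikit-learn's SelectKBest on its built-in iris data; the scoring function (ANOVA F-test) and k=2 are arbitrary choices for the example:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

# Select the 2 most informative of the 4 iris features using an ANOVA F-test.
X, y = load_iris(return_X_y=True)
X_selected = SelectKBest(f_classif, k=2).fit_transform(X, y)
print(X.shape, "->", X_selected.shape)  # (150, 4) -> (150, 2)
```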
What's next?
I want to use CASE and UML to support Requirements Engineering in projects to achieve desired results. Nothing is worse than spending million$ on a project/program and not getting anywhere past the hype.