Last month at //Build 2018, we released ML.NET 0.1, a cross-platform, open source machine learning framework. We would like to thank the community for the engagement so far in helping us shape ML.NET.
Today we are releasing ML.NET 0.2. This release focuses on adding new ML tasks like clustering, making it easier to validate models, adding a brand-new repo for ML.NET samples and addressing a variety of issues and feedback we received in the GitHub repo.
Some of the highlights of the ML.NET 0.2 release are described below.
New Machine Learning Tasks: Clustering
Clustering is an unsupervised learning task that groups sets of items based on their features. It identifies which items are more similar to each other than to other items.
This might be useful in scenarios such as organizing news articles into groups based on their topics, segmenting users based on their shopping habits, and grouping viewers based on their taste in movies.
The Iris Flower sample illustrates how you can use clustering with ML.NET 0.2.
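To make "more similar to each other" concrete, here is a minimal sketch of the assignment step that centroid-based clustering methods such as k-means repeat: each item is placed in the group whose center is nearest. It is plain C# and does not use any ML.NET API. The class name, the feature values, and the two centroids are hypothetical and chosen only for illustration.

```csharp
using System;

public static class NearestCentroidSketch
{
    // Returns the index of the centroid closest to the point,
    // measured by squared Euclidean distance over all features.
    public static int Assign(float[] point, float[][] centroids)
    {
        int best = 0;
        float bestDistance = float.MaxValue;
        for (int c = 0; c < centroids.Length; c++)
        {
            float distance = 0f;
            for (int d = 0; d < point.Length; d++)
            {
                float diff = point[d] - centroids[c][d];
                distance += diff * diff;
            }
            if (distance < bestDistance)
            {
                bestDistance = distance;
                best = c;
            }
        }
        return best;
    }

    public static void Main()
    {
        // Hypothetical (petal length, petal width) pairs and two made-up centroids.
        var centroids = new[] { new[] { 1.5f, 0.3f }, new[] { 5.0f, 1.8f } };
        var points = new[]
        {
            new[] { 1.4f, 0.2f },
            new[] { 4.7f, 1.4f },
            new[] { 1.6f, 0.4f }
        };

        foreach (var p in points)
        {
            Console.WriteLine($"({p[0]}, {p[1]}) -> cluster {Assign(p, centroids)}");
        }
    }
}
```

A full clustering algorithm would also have to learn the centroids from the data; this sketch only shows how items end up grouped once the centers are known.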
Easier model validation with cross-validation and train-test
Cross-validation is an approach to validating how well your model statistically performs. It does not require a separate test dataset. Instead, it uses your training data to test your model: it partitions the data so that different data is used for training and testing, and it repeats this multiple times. With ML.NET 0.2 you can now use cross-validation, and here is a good example.
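The partitioning idea can be sketched in a few lines of plain C#. This is not the ML.NET cross-validation API. It is a hypothetical helper (KFoldSketch.Split) that shows how row indices can be divided into k folds so that every row is used for testing exactly once. It assigns rows to folds round-robin for simplicity, whereas a practical implementation would typically shuffle the data first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class KFoldSketch
{
    // Splits row indices 0..count-1 into k folds.
    // Each fold is used once as the test set; the remaining rows form the training set.
    public static IEnumerable<(int[] Train, int[] Test)> Split(int count, int k)
    {
        if (k < 2 || k > count)
            throw new ArgumentException("k must be between 2 and count.", nameof(k));

        for (int fold = 0; fold < k; fold++)
        {
            int current = fold;
            // Round-robin assignment: row i belongs to fold (i % k).
            var test = Enumerable.Range(0, count).Where(i => i % k == current).ToArray();
            var train = Enumerable.Range(0, count).Where(i => i % k != current).ToArray();
            yield return (train, test);
        }
    }

    public static void Main()
    {
        foreach (var (train, test) in Split(count: 6, k: 3))
        {
            Console.WriteLine(
                $"train=[{string.Join(",", train)}] test=[{string.Join(",", test)}]");
        }
        // Prints:
        // train=[1,2,4,5] test=[0,3]
        // train=[0,2,3,5] test=[1,4]
        // train=[0,1,3,4] test=[2,5]
    }
}
```

In each round a model would be trained on the train indices and scored on the test indices, and the k scores would then be averaged.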
Train-test is a shortcut to testing your model on a separate dataset. See example usage here.
Train using data objects with CollectionDataSource
ML.NET 0.1 enabled loading data from a delimited text file. CollectionDataSource in ML.NET 0.2 adds the ability to use a collection of objects as the input to a LearningPipeline.
The code-snippet below shows how you can use CollectionDataSource with ML.NET 0.2.
var pipeline = new LearningPipeline();
// In-memory training data: a list of IrisData objects instead of a text file.
var data = new List<IrisData>() {
    new IrisData { SepalLength = 1f, SepalWidth = 1f,
                   PetalLength = 0.3f, PetalWidth = 5.1f, Label = 1 },
    new IrisData { SepalLength = 1f, SepalWidth = 1f,
                   PetalLength = 0.3f, PetalWidth = 5.1f, Label = 1 },
    new IrisData { SepalLength = 1.2f, SepalWidth = 0.5f,
                   PetalLength = 0.3f, PetalWidth = 5.1f, Label = 0 }
};
// Wrap the collection so it can be used as the pipeline's input.
var collection = CollectionDataSource.Create(data);
pipeline.Add(collection);
Full code snippet for CollectionDataSource can be found here.
New ML.NET samples repo
We have created a new repo, https://github.com/dotnet/machinelearning-samples, and added a few getting-started and end-to-end app samples.
- Sentiment Analysis (Binary Classification)
This sample demonstrates how ML.NET can be used to analyze sentiment for customer reviews (positive or negative). The sample uses IMDB and Yelp reviews.
- Classification of Iris Flowers (Multi-class Classification)
This sample is centered around predicting the type of an iris flower (setosa, versicolor, or virginica) based on the flower’s parameters such as petal length, petal width, etc.
- Taxi-Fare prediction (Regression)
The Taxi-Fare prediction sample demonstrates how to build an ML.NET model for predicting New York City taxi fares. This sample uses a regression model that takes into account features like number of passengers, type of credit and distance traveled.
- Cluster analysis on Iris Dataset (Clustering)
The sample demonstrates how to build a clustering model with ML.NET by performing a cluster analysis on the Iris Dataset.
- GitHub Issue Classification (Multi-class classification)
This is an E2E sample which shows how to use ML.NET to build a GitHub issue classifier.
This blog post only covers a few top announcements in the ML.NET 0.2 release. The complete release notes for ML.NET 0.2 can be found here.
Help shape ML.NET for your needs
If you haven’t already, try out ML.NET. You can get started here. We look forward to your feedback and welcome you to file issues with any suggestions or enhancements in the GitHub repo.
https://github.com/dotnet/machinelearning
This blog was co-authored by Gal Oshri and Ankit Asthana
Thanks,
ML.NET Team