Overall, I found the course helpful and insightful (4.79/5). It covered many ideas I had not considered before, so I am posting some of my notes here. More than likely, you have seen most of these ideas, so I will focus on the most interesting ones. Here is the link to the course.

- **Data Exploration Checklist**
- **Validation**
- **Target Leakage**
- **Metrics and Loss Functions**
- **Metric Optimization**
- **Mean Encoding**
- **Coding Tips**
- **Advanced Feature Engineering**
- **Ensemble Strategies**
- **StackNet**
- **Creating a Diverse Set of Models**
- **Tips on Meta-Learning and Stacking**
- **Text Based Features in XGBoost**
- **Sequence Feature Extraction (XGBoost)**
- **Semi-supervised &…**

A ranking algorithm can be used when the target variable is numerically ordered. The model will capture the shared variation between adjacent classes. Ranking can also be useful for semi-supervised classification, where proxy targets are assigned a lower rank than the actual targets.

The simplest ranking model is a regression model. The regression output is rounded to the closest integer and clipped to the minimum and maximum of the target. Multi-class linear models are another option, but it should be noted that each class is treated separately and not in…
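The round-and-clip approach described above can be sketched in a few lines. This is a minimal illustration with a toy ordinal target and an ordinary least-squares fit; the data and the use of a plain linear fit are assumptions for demonstration, not the course's exact setup.

```python
import numpy as np

# Toy ordinal data: one feature and an ordered integer target (illustrative).
X = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1, 1, 2, 3, 3])

# Fit a simple least-squares line as the regression "ranker".
A = np.vstack([X, np.ones_like(X)]).T
coef, intercept = np.linalg.lstsq(A, y, rcond=None)[0]
raw = coef * X + intercept

# Round to the nearest class and clip to the observed target range.
ranked = np.clip(np.rint(raw), y.min(), y.max()).astype(int)
print(ranked.tolist())  # → [1, 1, 2, 3, 3]
```

Because the clip uses the training target's min and max, predictions can never fall outside the known class range.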

Business-related time series are often non-continuous, or discrete, time-based processes. Operationally, it might be useful for businesses to quantify when a time series will be at a peak or trough in the future. The goal is to capture the trend and periodic patterns and to forecast the signal more than one sample into the future. The example notebook is available in the TensorFlow Formulation section of this article.

In most business-related applications, the time series have non-constant mean and variance over time; they are said to be *non-stationary*. This contrasts with stationary…
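As a minimal illustration of non-stationarity, the sketch below builds a synthetic series with a linear trend (an assumption for demonstration) and shows that first-differencing, a common transform toward stationarity, stabilizes the mean:

```python
import numpy as np

# Synthetic non-stationary series: linear trend plus noise (illustrative).
rng = np.random.default_rng(0)
t = np.arange(200)
series = 0.5 * t + rng.normal(0.0, 1.0, size=t.size)

# The raw series' mean drifts over time...
print(series[:100].mean(), series[100:].mean())

# ...but the first difference removes the trend, so its mean is roughly
# constant (near the trend slope, 0.5) in both halves.
diff = np.diff(series)
print(diff[:100].mean(), diff[99:].mean())
```

First-differencing addresses a non-constant mean; a non-constant variance typically needs a separate transform such as a log.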

The purpose, problem statement, and potential applications came from this post on datasciencecentral.com. The goal is to approximate any multi-variate distribution using a weighted sum of *kernels*. Here, a kernel refers to a parameterized distribution. This method of using a decaying weighted sum of kernels to approximate a distribution is similar to a Taylor series, where a function can be approximated, around a point, using the function’s derivatives.

- Approximate any empirical distribution
- Build a parameterized density estimator
- Outlier detection and dataset noise reduction

The solution I came up with was incorporated into a Python package, KernelML. …
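The core idea, a weighted sum of parameterized kernels acting as a density, can be sketched without the package. The means, widths, and weights below are illustrative choices, not parameters fitted by KernelML:

```python
import numpy as np

def gaussian(x, mu, sigma):
    # A parameterized kernel: the normal density.
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

# Approximate a bimodal distribution with two weighted kernels.
x = np.linspace(-6, 6, 400)
approx = 0.4 * gaussian(x, -2.0, 1.0) + 0.6 * gaussian(x, 2.0, 1.2)

# Because the weights sum to one, the mixture integrates to ~1,
# like a proper density.
area = np.sum(approx) * (x[1] - x[0])
print(area)
```

In practice the weights and kernel parameters would be fitted to the empirical distribution; a point with low density under the fitted mixture is a candidate outlier.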

This is a review of a modern genetic algorithm called the multi-offspring improved real-coded genetic algorithm (MOIRCGA). The original paper can be found here. The method used by this algorithm is formally called heuristical normal distribution and direction-based crossover (HNDDBX). The algorithm generates a direction-based candidate solution and modifies it with a parameterized random variable. The coded algorithm can be found on my GitHub.

Genetic algorithms have been used for optimization since the 1960s. With the increased computational power of modern computers, genetic algorithms have gained attention for solving complex, non-linear objectives. …
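The direction-based idea, moving from a worse parent toward a better one and perturbing the result with a normal random variable, can be sketched as below. This is a simplified single-offspring loop on a convex toy objective, not the paper's full HNDDBX/MOIRCGA procedure; all names and constants are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)

def sphere(x):
    # Simple convex objective to minimize.
    return np.sum(x ** 2)

def direction_based_offspring(better, worse, scale):
    # Step from the better parent along the (worse -> better) direction,
    # then add normally distributed noise (the "heuristic normal" part).
    direction = better - worse
    return better + rng.uniform(0.0, 1.0) * direction + rng.normal(0.0, scale, size=better.size)

# Tiny population loop (illustrative, not the full multi-offspring scheme).
pop = rng.normal(0.0, 3.0, size=(10, 2))
for _ in range(200):
    fitness = np.array([sphere(p) for p in pop])
    order = np.argsort(fitness)
    child = direction_based_offspring(pop[order[0]], pop[order[-1]], scale=0.1)
    # Replace the worst member only when the offspring improves on it.
    if sphere(child) < fitness[order[-1]]:
        pop[order[-1]] = child

best = pop[np.argmin([sphere(p) for p in pop])]
print(sphere(best))
```

The improvement-only replacement keeps the population contracting toward good regions while the noise term preserves exploration.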

This notebook shows the optimization of a multi-class, linear support vector machine using a simulation-based optimizer. Any simulation-based optimizer could be used with the CUDA kernel in this notebook. I used KernelML, my custom optimizer, in this example. The runtime for this script should be set to use the GPU: Runtime->Change runtime type.

The original SVM formulation can be found in Vapnik 1992. There have been advances in the robustness of the algorithm since then; see Robust Classifier 2019, section 6.1. The robust formulation looks tedious to implement. If you are interested in implementing it, please consider emailing…
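A minimal CPU sketch of the simulation-based idea is below: minimize an SVM hinge loss by sampling random parameter perturbations and keeping improvements. This is a binary toy version on synthetic data, not the notebook's multi-class GPU/KernelML implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy linearly separable binary data; labels in {-1, +1}.
X = np.vstack([rng.normal(-2, 0.5, size=(30, 2)), rng.normal(2, 0.5, size=(30, 2))])
y = np.array([-1] * 30 + [1] * 30)

def hinge_loss(w, b, C=1.0):
    # Soft-margin SVM objective: regularizer plus mean hinge loss.
    margins = 1 - y * (X @ w + b)
    return 0.5 * w @ w + C * np.mean(np.maximum(0.0, margins))

# Simulation-based search: propose random perturbations, keep improvements.
w, b = rng.normal(0, 1, size=2), 0.0
best_loss = hinge_loss(w, b)
for _ in range(2000):
    w_c = w + rng.normal(0, 0.1, size=2)
    b_c = b + rng.normal(0, 0.1)
    loss = hinge_loss(w_c, b_c)
    if loss < best_loss:
        w, b, best_loss = w_c, b_c, loss

accuracy = np.mean(np.sign(X @ w + b) == y)
print(accuracy)
```

Because the proposals only need loss evaluations, not gradients, the same loop works for non-differentiable or otherwise awkward objectives, which is the appeal of simulation-based optimizers.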

The covariance matrix has many interesting properties, and it can be found in mixture models, component analysis, Kalman filters, and more. Developing an intuition for how the covariance matrix operates is useful in understanding its practical implications. This article will focus on a few important properties, associated proofs, and then some interesting practical applications, e.g., non-Gaussian mixture models.

*I have often found that research papers do not specify the matrices’ shapes when writing formulas. I have included this and other essential information to help data scientists code their own algorithms.*

The covariance matrix can be decomposed into multiple unique (2x2)…
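In the spirit of the note above about stating matrix shapes, here is a small sketch with the shapes written out, showing the symmetric eigendecomposition that such (2x2) factorizations build on. The data-generating matrix is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(1)

# X has shape (n_samples, n_features); here (500, 2).
X = rng.normal(size=(500, 2)) @ np.array([[2.0, 0.0], [1.0, 0.5]])

# The covariance matrix C has shape (n_features, n_features).
C = np.cov(X, rowvar=False)

# Symmetric eigendecomposition: C = V @ diag(lam) @ V.T, with orthonormal V.
lam, V = np.linalg.eigh(C)
recon = V @ np.diag(lam) @ V.T

print(C.shape, np.allclose(recon, C))
```

The eigenvectors give the axes of the data's spread and the eigenvalues the variance along each axis, which is the intuition behind component analysis.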

The objective is to implement a real-time motion correction algorithm for lateral movements on an iOS device. The algorithm in this post can be tested by downloading MCorr, an iOS app. Lateral movements and rotational movements around the x, y, and z axes, with respect to the phone’s camera, are shown in Figure 1.

Pixel shift correction is a software-based motion correction algorithm that approximates lateral device motion along the x and y axes. Camera motion can be corrected by shifting each frame an optimal number of pixels, in the x or y direction, to maintain a stable video…
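One standard way to estimate that optimal pixel shift is phase correlation between consecutive frames; the sketch below recovers a known shift on a synthetic frame. This is an illustrative NumPy version, not the post's iOS implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

# A random reference frame and a copy shifted by a known (dy, dx).
frame = rng.random((64, 64))
true_dy, true_dx = 5, -3
shifted = np.roll(frame, (true_dy, true_dx), axis=(0, 1))

# Phase correlation: the normalized cross-power spectrum's inverse FFT
# peaks at the applied shift.
F1 = np.fft.fft2(frame)
F2 = np.fft.fft2(shifted)
cross_power = F2 * np.conj(F1)
corr = np.fft.ifft2(cross_power / np.abs(cross_power)).real

dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
# Map indices past the midpoint back to negative shifts.
if dy > frame.shape[0] // 2: dy -= frame.shape[0]
if dx > frame.shape[1] // 2: dx -= frame.shape[1]

print(dy, dx)  # → 5 -3; correcting applies the opposite shift
```

Shifting the incoming frame by (-dy, -dx) then cancels the estimated motion, which is the essence of pixel shift stabilization.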

A time series can have different properties depending on the generating process and how the process is measured. Many properties describe a time series: 1) stationarity, 2) continuity, 3) randomness, and 4) periodicity. I’ll focus on non-random signals for this post, but I would recommend *Applied Stochastic Processes, Chaos Modeling, and Probabilistic Properties of Numeration Systems* by Vincent Granville, Ph.D., to anyone interested in random processes.

Most data scientists working with B2B or B2C time series are primarily working with non-continuous, or discrete, processes. Discrete means that the data are collected at fixed time intervals. If a time…

The motivation for building this algorithm was to give analysts and data scientists a generalized machine learning tool for complex loss functions and non-linear coefficients. The optimizer uses a combination of simple machine learning and probabilistic simulations to search for optimal parameters, given a loss function, input and output matrices, and (optionally) a random sampler.

**Example use case:**

Clustering methods such as K-means use Euclidean distances to compare observations. However, the Euclidean distances between longitude–latitude points do not map directly to the Haversine distance, the distance around a sphere. If the coordinates are normalized between 0 and…
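To make the mismatch concrete, the sketch below computes the Haversine distance for two point pairs that are one degree of longitude apart, the same Euclidean gap in raw coordinates, at different latitudes. The function is a standard Haversine formula, not KernelML code.

```python
import numpy as np

def haversine(lat1, lon1, lat2, lon2, radius_km=6371.0):
    # Great-circle distance between two (lat, lon) points, in kilometers.
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(a))

# One degree of longitude covers ~111 km at the equator but far less
# near the poles, so equal Euclidean gaps mean very different distances.
equator = haversine(0.0, 0.0, 0.0, 1.0)
arctic = haversine(80.0, 0.0, 80.0, 1.0)
print(equator, arctic)
```

K-means on raw coordinates treats both gaps as identical, which is why a custom optimizer that can cluster directly on the Haversine loss is useful here.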