Following my previous article on the Strata Data Science Conference, I started to ponder future developments in data science and business intelligence — namely, how these two simple terms will change the way we work, think, and live. To be honest, “data science” seems somewhat distant to me; however, the concept of “business intelligence” can … Read more Power BI as a Tool for Business Intelligence
In the last story, we discussed RASA NLU, an open-source conversational AI tool. We used the TensorFlow pipeline for intent classification. The pipeline has different components such as a tokenizer, featurizer, entity extractor, and intent classifier. Our intent classifier itself has sub-components such as the TensorFlow embedding. Now we are going to discuss … Read more The crux of word embedding layers -Part 1
[This article was first published on Fabian Dablander, and kindly contributed to R-bloggers]. If you are reading this, you are probably a Ravenclaw. Or a … Read more Harry Potter and the Power of Bayesian Constrained Inference
This is the first post in the series “Digital Image Processing”. In this series, we will be discussing digital images and how to process them. Let’s discuss what an image is. If you come from a signal processing background, you might consider an image as a two-dimensional signal, i.e., a function of two variables f(x,y) … Read more Introduction to Digital Images
‘Growth Hacks’ vs ‘Growing Pain’ Some people accused him of fraud; I’m not so sure about that. To me, this looks more like an honest mistake due to a lack of experience in scaling up and entering a field he is not familiar with. See, most of his more popular videos are entry-level tutorials with … Read more When ‘Growth Hacks’ Meets ‘Growing Pain’
In Search Of A Better Approach To Physical Training Credit: Coen Van Den Broek I cycle from time to time. I’m talking road cycling. I’m not a pro, I don’t want to be — but I love competition, I love racing and I do love a challenge. Turns out cycling is a sport heavily tangled … Read more Machine Learning, Cycling & 300W FTP (Part 1)
What’s the actual objective of a business case interview? It’s to test the ability of a candidate to both think critically and creatively when faced with an open-ended problem. But as an interviewer how do you assess these things? The thinking critically part is not as hard — if the person is stumbling through basic … Read more Making Data Science Interviews Better
Machine learning prediction models using time-series weather data. Image licensed from Adobe Stock Dengue, commonly called dengue fever, is a mosquito-borne disease that occurs in tropical and sub-tropical parts of the world. In mild cases, symptoms are similar to the flu: fever, rash, and muscle and joint pain. In severe cases, Dengue can cause severe … Read more Using Keras and TensorFlow to Predict Dengue Fever Outbreaks
Lesson 3 of “Practical Deep Learning for Coders” by fast.ai “I have not failed. I’ve just found 10,000 ways that won’t work.” ~Thomas Edison I’m a math adjunct working my way through Lesson 3 of “Practical Deep Learning for Coders” by fast.ai, and this week has been a major pride-swallower for me. At the end … Read more 10,000 Ways That Won’t Work
Time series prediction Photo by rawpixel.com from Pexels The idea of using a Neural Network (NN) to predict the stock price movement on the market is as old as NNs. Intuitively, it seems difficult to predict the future price movement looking only at its past. There are many tutorials on how to predict the price … Read more LSTM for time series prediction
So, how exactly do you leverage this amazing technology? Luckily, you’re a whiz with a keyboard and don’t even need to see the screen to do this — here’s how it breaks down: You’re doing data science — you need data! Then you gotta see if that data is balanced and usable. Prepare your data. … Read more Hey, Can (A)I Get Your Number?
Learn visualization using Python and Folium, from scratch Data visualization is not merely science, it is an art. Given the way the human brain works, information is much easier to process in visual form. After almost 25 years of digital mapping and with many companies using machine learning to collect massive amounts of data, … Read more Visualizing Tesla Superchargers in France
Three new Quick Starts deploy JFrog Artifactory on the Amazon Web Services (AWS) Cloud in 30-45 minutes. The available options for deployment use your choice of Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Container Service (Amazon ECS), or Amazon Elastic Kubernetes Service (Amazon EKS).
Exploring the Difference or Nuance between Monolithic Kernel as Opposed to Microkernel In the dictionary, a kernel is the softer, usually edible part of a nut, seed, or fruit stone contained within its shell, such as “the kernel of a walnut”. It can also be the central or most important part of something “this is … Read more What is the Kernel?
[This article was first published on R – Statisfaction, and kindly contributed to R-bloggers]. Stanislaw Ulam’s autobiography, “Adventures of a Mathematician”, originally published in 1976 … Read more Coding algorithms in R for models written in Stan
AWS IoT is announcing a new feature for AWS IoT Core called “Multi-Account Registration,” which is now available in beta. The new feature allows customers to quickly move devices between their AWS accounts by specifying the account information when the device connects to AWS IoT Core. Customers opting to use this feature will use Server … Read more AWS IoT Core Introduces Beta Feature To Simplify Device Certificate Registration
Why we need a new breed of leader in the data-fueled era Multiple choice time! What’s the best kind of worker? A) Reliable workers who carry out orders precisely, quickly, and efficiently. B) Unreliable workers who may or may not feel like doing what they’re told. If you think this is a no-brainer and reliable … Read more Artificial Intelligence: Do stupid things faster with more energy!
For our analyses of anonymized mobile phone location data here at Invenium we use, amongst others, Apache Spark™. In our applications, we interface it directly using the Java API as well as using the Python API pyspark. Recently we noticed an unusual performance drop when running our algorithms. After making sure that we haven’t made … Read more How to get the Python Environment of all Spark Cluster Nodes
SciPy wants your ideas to help it become more user-friendly You’ve heard of SciPy. You’ve probably used it. You might have looked through some of the technical documentation and user guides. You might even have an opinion of the documentation… But have you given any thought to actually getting involved and letting SciPy know how … Read more Get Involved With SciPy!
Endgame for “AI Winter” How a competition, ImageNet, along with a noisy algorithm, Stochastic Gradient Descent, changed the fate of AI? Picture from The Elder Scrolls | Skyrim In the early 1980s, Winter was coming for Artificial Intelligence (AI) with a period of reduced funding and interest in AI research, which would later be called … Read more A classic bedtime story: Cinderella of Neural Networks
Reddit is a popular website for opinion sharing and news aggregation. The site consists of thousands of user-made forums, called subreddits, which cover a broad range of subjects, including politics, sports, technology, personal hobbies, and self-improvement. Given that most Reddit users contribute to multiple subreddits, one might think of Reddit as being organized into many … Read more Mapping the Underlying Social Structure of Reddit
[This article was first published on R – rud.is, and kindly contributed to R-bloggers]. I posted a visualization of email safety status (a.k.a. DMARC) of … Read more 100% Stacked Chicklets
[This article was first published on Revolutions, and kindly contributed to R-bloggers]. If you ever need to work with data involving dates, times or durations … Read more Handling dates and times in R: a free online course
Automated Spot Instance Draining will automatically place Spot instances in the “DRAINING” state upon receipt of a two-minute interruption notice. ECS tasks running on Spot instances will automatically be triggered for shutdown before the instance terminates, and replacement tasks will be scheduled elsewhere on the cluster. No new ECS service tasks will be started on … Read more Amazon ECS supports Automated Draining for Spot Instances running ECS Services
[This article was first published on R – Fantasy Football Analytics, and kindly contributed to R-bloggers]. The post Gold-Mining Week 4 (2019) appeared first … Read more Gold-Mining Week 4 (2019)
Variance as Information In Machine Learning, we need features for the algorithm to figure out patterns that help differentiate classes of data. The more features there are, the more variance (variation in the data) there is, and hence the easier it is for the model to make ‘splits’ or ‘boundaries’. But not all features provide useful information. They can have noise … Read more Introduction to Principal Component Analysis (PCA) — with Python code
In October 2012, the Harvard Business Review described “Data Scientist” as the “sexiest” job of the 21st century. Well, as we approach 2020 the description still holds true! The world needs more data scientists than there are available for hire. All companies – from the smallest to the biggest – want to hire for a … Read more 101 Data Science Interview Questions, Answers, and Key Concepts
[This article was first published on R – Win-Vector Blog, and kindly contributed to R-bloggers]. When studying regression models, one of the first diagnostic plots … Read more Why Do We Plot Predictions on the x-axis?
Organizations widely recognize the potential power of artificial intelligence (Ai). They instinctively understand that it feels like we’re on the cusp of something that will change our lives and our businesses in a profound way. Yet, many struggle with where to use it. If you’re looking for how and where your company should use Ai, … Read more Ai: Where To Begin?
How to discover if your Numpy is using a fast BLAS library. The Numpy Logo, Wikipedia. If your research work is highly dependent on Numpy-based calculations, such as vector or matrix additions and multiplications, then it is advisable to run a few checks in order to see if Numpy is using one of three … Read more Is your Numpy optimized for speed?
What is Ai? It depends on who you ask. Since the term was coined in 1956, “Artificial Intelligence” has endured a lifetime of misunderstanding. Explainability is the missing link and the reason why it’s misunderstood. The problem lies in the interpretation of the word “intelligence.” In the words of legendary computer scientist Edsger Dijkstra: “The question … Read more Artificial Intelligence: Explainable in every language
Let’s review the basic way in which serverless functions scale as you take a function from your laptop to the cloud. At a basic level, a function takes input and provides an output response. That function can be repeated with many inputs, providing many outputs. A serverless platform like Cloud Functions manages elastic, horizontal scaling … Read more 6 strategies for scaling your serverless applications
Editor’s note: This is the second in a series on modernizing your data warehouse. Find part 1 here. In the last blog post, we discussed why legacy data warehouses are not cutting it any more and why organizations are moving their data warehouses to cloud. We often hear that customers feel that migration is an … Read more Data warehouse migration challenges and how to meet them
Editor’s note: Today we hear from Muzzaffar bin Othman, CTO at Permodalan Nasional Berhad (PNB) on how the company uses Google Cloud’s Apigee API Management Platform to create digital investment channels. Read on to learn how PNB is increasing financial inclusion by expanding investment opportunities for all Malaysians. Permodalan Nasional Berhad (PNB) is one of … Read more PNB: Investing in Malaysia’s future with APIs
Photo by Ross Findon on Unsplash However, most modern web pages are quite interactive. The concept of “single-page application” means that the web page itself will change without the user having to reload or getting redirected from page to page all the time. Because this happens only after specific user interactions, there are few options … Read more Image Scraping with Python
https://www.youtube.com/watch?time_continue=82&v=ARJ8cAGm6JE This month I went to visit a friend of mine in Ireland who had just remodeled a house. She had purchased a large mirror and asked the workers onsite if they could hang it in the dining room and then she headed out for the day. When she returned, she found that while the … Read more Mirrors, Self-driving trucks, Unconscious Bias and Machine Learning/Artificial Intelligence
[This article was first published on R – scottishsnow, and kindly contributed to R-bloggers]. I helped run a conference last week. As part of this … Read more Conference abstract bi-grams – FOSS4GUK
Have you seen Apple’s new Test the Impossible ad? I have to admit, I find the style and message a bit grating. Nevertheless, there is a useful message in the ad. You don’t know what you don’t know. People are limited by their own experience and understanding. A 10-year-old, having spent their entire life in the States, … Read more Breaking Assumptions
An Overview of Lesson 1: Introduction to Random Forest of Machine Learning Course from Fast.ai At a meetup that I attended a couple of months ago in Sydney, I was introduced to an online machine learning course by fast.ai. I didn’t pay any attention to it then. This week, while working on a Kaggle competition, … Read more Things I learned about Random Forest Machine Learning Algorithm
Using the various clustering models to assess patterns in Credit Card purchases and then make recommendations for the client Photo by rupixen on Unsplash Introduction to the Problem Before diving head-on into the various clustering methods, let us take a short look at the problem we are trying to solve here. A credit card company … Read more Analyzing Credit Card Purchase Patterns Using Clustering
Let’s look at gradient descent with adaptive learning rate. In part 4, we looked at some heuristics that can help us tune the learning rate and momentum better. In this article, let us look at a more principled way of adjusting the learning rate and give the learning rate a chance to adapt. Citation Note: … Read more Learning Parameters Part 5: AdaGrad, RMSProp, and Adam
RAPIDS was announced on October 10, 2018, and since then the folks at NVIDIA have worked day and night to add an impressive number of features in each release. The preferred installation methods supported in the current version (0.9) are Conda and Docker (pip support was dropped in 0.7). In addition, RAPIDS is available for free … Read more Quick Install Guide: Nvidia RAPIDS + BlazingSQL on AWS SageMaker
A Rubik’s cube is a 3D puzzle that has 6 faces. Each face usually has 9 stickers in a 3×3 layout, and the objective of the puzzle is to reach the solved state, where each face shows only a single color. The possible states of a 3x3x3 Rubik’s cube are of the order of the quintillion … Read more Learning To Solve a Rubik’s Cube From Scratch using Reinforcement Learning
Before moving on to advanced optimization algorithms, let us revisit the problem of learning rate in gradient descent. In part 3, we looked at stochastic and mini-batch versions of the optimizers. In this post, we will look at some commonly followed heuristics on how to tune the learning rate, etc. If you are not interested … Read more Learning Parameters Part 4: Tips For Adjusting Learning Rate, Line Search
What is it? Watson Studio is a hosted, full-service, scalable data science platform. It allows us to integrate a variety of languages, products, techniques and data assets all within one place. Why is it awesome? As an R user, I like it because my colleagues and I can leverage the collaboration options and … Read more #FunDataFriday – Watson Studio
Tips for Better Logistic Regression Models in Scikit-Learn Logistic regression is the bread-and-butter algorithm for machine learning classification. If you’re a practicing or aspiring data scientist, you’ll want to know the ins and outs of how to use it. Also, Scikit-learn’s LogisticRegression is spitting out warnings about changing the default solver, so this is a … Read more Don’t Sweat the Solver Stuff
Image processing is one of the core focus areas of rOpenSci. Over the last few months we have released several major upgrades to core packages in our imaging suite, including magick, tesseract, and av. This post highlights a few cool new features. Magick 2.2 The magick package is one of the most powerful packages for … Read more Updates to the rOpenSci image suite: magick, tesseract, and av
This summer I was asked to collaborate on an analysis project with many response variables. As usual, I planned on automating my initial graphical data exploration through the use of functions and purrr::map() as I’ve written about previously. However, this particular project was a follow-up to a previous analysis. In the original analysis, different variables … Read more More exploratory plots with ggplot2 and purrr: Adding conditional elements
By knowing the PACF and ACF, we now better understand our dataset and the parameters to potentially choose. Now, we can move on to modeling our data by using the SARIMA model. Optimizing Parameters In order to get the best performance out of the model, we must find the optimum parameters. We do this by … Read more Predicting Prices of Bitcoin with Machine Learning
In Depth Analysis These Demand Forecasting Best Practices Cut Costs To illustrate the importance of accurate demand forecasting, consider the shocking 2014 headline “Walgreen CFO’s departure due to $1 billion forecasting error,” as well as Nike’s 2001 demand planning software implementation failure that led to a $100-million loss in sales. There are several … Read more How Smart Are Your Supply Chain Predictions?