[NLP] Basics: Understanding Regular Expressions

The only guide you’ll ever need. When I started learning natural language processing, regular expressions truly felt like a foreign language. I struggled to understand the syntax, and it would take me hours to write a regular expression that matched the text I was looking for. Naturally, I tried … Read more
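
Here is a minimal Python sketch using the standard `re` module, for readers who want something concrete. The sample text and both patterns are hypothetical illustrations and are not taken from the article. The email pattern in particular is deliberately simplified and would not validate every real address.

```python
import re

# Hypothetical sample text (not from the article).
text = "Contact us at support@example.com or sales@example.org by 2020-01-15."

# [\w.+-]+ : one or more word characters, dots, plus or minus signs (the user part)
# @ and \. : a literal "@" and a literal "."
email_pattern = re.compile(r"[\w.+-]+@[\w-]+\.\w+")

# \d{4} means exactly four digits; each pair of parentheses is a capture group.
date_pattern = re.compile(r"(\d{4})-(\d{2})-(\d{2})")

emails = email_pattern.findall(text)   # every non-overlapping match, left to right
match = date_pattern.search(text)      # first match only, or None
year, month, day = match.groups()      # contents of the three capture groups

print(emails)            # ['support@example.com', 'sales@example.org']
print(year, month, day)  # 2020 01 15
```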

How To Sell Your Dataset?

As a consultant, I often help companies realize the strategic importance of their data. Most of the time, decision-makers want to leverage these datasets to build solid AI solutions, but in some cases, they see data as a new revenue stream. As a result, a growing number of companies are exploring ways to monetize … Read more

Testing Glue PySpark jobs

Set up your environment so that your Glue PySpark job reads from and writes to a mocked S3 bucket, thanks to moto server. A typical use case for a Glue job is: you read data from S3, you do some transformations on that data, and you dump the transformed data back … Read more
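
The article mocks S3 with moto server. As a lighter-weight illustration of the same testing idea, the Python sketch below swaps S3 for a hypothetical in-memory store that is passed into the job. This lets the read, transform, and write flow be tested without AWS or moto. `InMemoryStorage`, `run_job`, and the CSV layout are all assumptions made for this example. A real Glue job would read and write through Spark and boto3 instead.

```python
import csv
import io


class InMemoryStorage:
    """Hypothetical stand-in for S3: maps (bucket, key) to raw bytes."""

    def __init__(self):
        self.objects = {}

    def read(self, bucket, key):
        return self.objects[(bucket, key)]

    def write(self, bucket, key, data):
        self.objects[(bucket, key)] = data


def transform(rows):
    """Example transformation: keep rows with a positive amount, uppercase the name."""
    return [
        {"name": r["name"].upper(), "amount": r["amount"]}
        for r in rows
        if float(r["amount"]) > 0
    ]


def run_job(storage, bucket, in_key, out_key):
    """Read CSV from storage, transform it, and write CSV back (read/transform/write)."""
    raw = storage.read(bucket, in_key).decode("utf-8")
    rows = list(csv.DictReader(io.StringIO(raw)))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["name", "amount"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(transform(rows))
    storage.write(bucket, out_key, out.getvalue().encode("utf-8"))


# Test flow: seed the fake bucket, run the job, then inspect what it wrote.
store = InMemoryStorage()
store.write("test-bucket", "input/data.csv", b"name,amount\nalice,10\nbob,-5\n")
run_job(store, "test-bucket", "input/data.csv", "output/data.csv")
result = store.read("test-bucket", "output/data.csv").decode("utf-8")
print(result)
```

Because the job only talks to whatever storage object it receives, the same `run_job` could be handed a thin wrapper around real S3 in production.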

Understanding the Infinite Monkey Theorem

Absurdities of Probability Theory and why you cannot trust your gut instinct when guessing probabilities. Imagine you have an infinite number of monkeys. Now you give each of these monkeys a laptop and let them type randomly for an infinite amount of time. What are the chances that at some point, this story will … Read more
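
A little arithmetic makes the intuition concrete. Assume each keystroke is drawn uniformly and independently from a 26-key typewriter. Then one attempt at a 6-letter word such as "banana" succeeds with probability p = 26^-6. The chance that at least one of m independent attempts succeeds is 1 - (1 - p)^m, which tends to 1 as m grows. The key count, target word, and attempt counts below are illustrative assumptions.

```python
import math
from fractions import Fraction


def match_probability(alphabet_size, target_length):
    """Probability that one random string of target_length keystrokes equals a fixed target."""
    # Each keystroke is uniform over alphabet_size keys and independent of the others.
    return Fraction(1, alphabet_size) ** target_length


def at_least_one_success(p, attempts):
    """Probability that at least one of `attempts` independent tries succeeds (p is a float)."""
    # 1 - (1 - p)^attempts, computed via log1p/expm1 to stay accurate when p is tiny.
    return -math.expm1(attempts * math.log1p(-p))


p = match_probability(26, 6)  # e.g. typing "banana" on a 26-key typewriter
print(p)                      # 1/308915776
for attempts in (10**6, 10**9, 10**12):
    print(attempts, at_least_one_success(float(p), attempts))
```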

The Hidden Peculiarities of Realtime Data Streaming Applications

With the increasing number of open-source frameworks such as Apache Flink, Apache Spark, and Apache Storm, and cloud frameworks such as Google Dataflow, creating realtime data-processing jobs has become quite easy. The APIs are well defined, and standard concepts such as Map-Reduce follow very similar semantics across all frameworks. However, even today, a developer starting … Read more

Using Python to Get Robinhood Data

Let’s automate some stocks; this can be used to build a trading robot. So I have been messing with Robinhood for a couple of months now. I am no expert when it comes to stocks or trading, but I thought it would be cool to connect to my Robinhood account … Read more

4 Common Types of Hackathons

As mentioned in the previous article, there is more than one kind of competition called a hackathon. In recent years, hackathons are no longer only for tech-savvy people; they require collaboration between techies, designers, and businessmen. Based on my own experience, I will classify hackathons into four main categories. I will start with the more technical one … Read more

How the 4 Most Popular Intelligent Assistants Stack Up

A Brief Comparison of the Pros and Cons That Each Virtual Assistant Offers. Siri: Apple’s voice-based intelligent assistant, released in 2011; Apple announced SiriKit (SiriSDKs) in mid-2016. Features: available on all iDevices; talks back to the user and proactively recommends actions to take; remembers context and understands relationships. Pros … Read more

An introduction to causal inference

[This article was first published on Fabian Dablander, and kindly contributed to R-bloggers]. Causal inference goes beyond prediction by modeling the outcome of interventions and … Read more
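
The following small worked example shows the gap between prediction and intervention. It is written as a Python sketch and is not taken from the article. In a hypothetical model where Z confounds both X and Y, conditioning on X = x (observation) gives a different answer from setting X = x (the do-operator), because intervening removes the Z -> X dependence. All probabilities are made-up numbers. Exact fractions keep the arithmetic transparent.

```python
from fractions import Fraction as F

# Hypothetical structural causal model: Z -> X, Z -> Y, X -> Y (Z is a confounder).
P_Z1 = F(1, 2)
P_X1_GIVEN_Z = {1: F(8, 10), 0: F(2, 10)}


def p_y1(x, z):
    # P(Y=1 | X=x, Z=z): X raises the chance of Y by 0.2, Z raises it by 0.6.
    return F(1, 10) + F(2, 10) * x + F(6, 10) * z


def p_z(z):
    return P_Z1 if z == 1 else 1 - P_Z1


def p_x_given_z(x, z):
    return P_X1_GIVEN_Z[z] if x == 1 else 1 - P_X1_GIVEN_Z[z]


def observational(x):
    """P(Y=1 | X=x): seeing X=x also shifts our beliefs about Z."""
    joint = sum(p_z(z) * p_x_given_z(x, z) * p_y1(x, z) for z in (0, 1))
    marginal = sum(p_z(z) * p_x_given_z(x, z) for z in (0, 1))
    return joint / marginal


def interventional(x):
    """P(Y=1 | do(X=x)): X is set by fiat, cutting Z -> X, so Z keeps its prior."""
    return sum(p_z(z) * p_y1(x, z) for z in (0, 1))


print(observational(1) - observational(0))    # 14/25: confounded association
print(interventional(1) - interventional(0))  # 1/5: causal effect of X on Y
```

The interventional difference recovers the 0.2 built into `p_y1`. The observational difference is inflated because units with X = 1 are more likely to have Z = 1.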

Will NumPy become Python?

Well, without NumPy, how can we perform mathematical operations between arrays? How does Python stack up against the other statistical languages of our time? Python’s array iteration is awesome, actually. The zip() function makes it possible to iterate through two lists at the same time:
array = []
for f, b in zip(array1, array2):
    res = … Read more
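
The excerpt's loop is cut off, so the operation inside it is unknown. The sketch below assumes element-wise addition purely for illustration. It completes the zip() pattern in plain Python and shows the equivalent vectorized NumPy expression.

```python
import numpy as np

array1 = [1.0, 2.0, 3.0]
array2 = [10.0, 20.0, 30.0]

# Pure Python: zip() pairs up elements so both lists are walked together.
array = []
for f, b in zip(array1, array2):
    res = f + b  # hypothetical operation; the article's own expression is truncated
    array.append(res)

# NumPy: the same element-wise addition as one vectorized expression.
vectorized = np.array(array1) + np.array(array2)

print(array)                # [11.0, 22.0, 33.0]
print(vectorized.tolist())  # [11.0, 22.0, 33.0]
```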

Machine Learning and the Future of Music: An era of ML artists

Artificial Intelligence has already taken over our lives and transformed them for the better. The days are over when you could still debate whether AI will impact a certain industry and transform it like others. Wondering why? Because artificial intelligence has already penetrated every other industry that we know and continues to impact several others. … Read more

Can Humans Fall Head Over Heels for AI?

With each passing day, we are using artificial intelligence for a variety of purposes and jobs. It has penetrated almost every industry and is helping them become innovative, develop authentic tools and build strategies towards a sustainable future. Researchers are eagerly exploring new use cases of artificial intelligence that have the power to radically transform … Read more

12 Steps to Production-Quality Data Science Code

There’s a Dilbert comic in which Dilbert tells his boss that he can’t take over a co-worker’s software project until he spends a week bad-mouthing the co-worker’s existing code. If you’ve ever taken over maintaining someone else’s code, you’ll immediately see the truth in this. No one likes taking over maintaining or working on … Read more

How You Measure Months Matters — A Lot. A Look At Two Implementations of KDA

This post will detail a rather important finding I made while implementing a generalized framework for momentum asset allocation backtests. Namely: when computing momentum (and other financial measures for use in asset allocation, such as volatility and correlations), measuring formal months, from start to end, has a large effect on strategy performance. So, first … Read more
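
The hypothetical Python sketch below shows why the choice of "month" matters. It compares two ways of picking the start of a one-month lookback: the last trading day of the previous calendar month, and a fixed 21 trading days back. It treats every weekday as a trading day (no holidays) and uses 21 days per month as a common rule of thumb. Neither assumption comes from the post, and this is not the post's KDA code.

```python
from datetime import date, timedelta


def business_days(start, end):
    """Mon-Fri dates between start and end inclusive (holidays ignored for simplicity)."""
    days, d = [], start
    while d <= end:
        if d.weekday() < 5:
            days.append(d)
        d += timedelta(days=1)
    return days


def month_end_lookback_start(days, end_day, months):
    """Formal months: last trading day of the month `months` calendar months before end_day."""
    y, m = end_day.year, end_day.month - months
    while m <= 0:
        y, m = y - 1, m + 12
    return max(d for d in days if (d.year, d.month) == (y, m))


def trading_day_lookback_start(days, end_day, months, days_per_month=21):
    """Fixed windows: step back months * days_per_month trading days."""
    i = days.index(end_day)
    return days[i - months * days_per_month]


days = business_days(date(2019, 1, 1), date(2019, 12, 31))
end_day = date(2019, 6, 28)  # last weekday of June 2019
print(month_end_lookback_start(days, end_day, 1))    # 2019-05-31
print(trading_day_lookback_start(days, end_day, 1))  # 2019-05-30
```

June 2019 has only 20 weekdays, so the fixed 21-day window reaches one day further back than the calendar-month window. Over many rebalances, shifts like this change which prices enter each momentum calculation.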

How to Build a Restaurant Recommendation System Using Latent Factor Collaborative Filtering

I usually watch YouTube when I am taking a break from my work. I promise myself to watch YouTube for only 5 to 10 minutes to rest my mind. Here is what usually happens: after I finish watching one video, the next video pops up from YouTube’s recommendations and I … Read more
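
The post builds its recommender with latent factor collaborative filtering. Below is a minimal Python sketch of that idea. It assumes plain matrix factorization trained by stochastic gradient descent on a hypothetical user-by-restaurant ratings table, with no bias terms and made-up hyperparameters. The article's own model and data may differ.

```python
import random


def factorize(ratings, n_users, n_items, k=2, lr=0.05, reg=0.02, epochs=500, seed=0):
    """Learn user factors P and item factors Q so that P[u] . Q[i] approximates rating(u, i).

    ratings: list of (user, item, rating) triples for the observed entries only.
    """
    rng = random.Random(seed)
    P = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - predict(P, Q, u, i)
            for f in range(k):
                pu, qi = P[u][f], Q[i][f]
                # SGD step on squared error with L2 regularization.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q


def predict(P, Q, u, i):
    """Predicted rating: dot product of the user and item latent vectors."""
    return sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def rmse(P, Q, ratings):
    return (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings) / len(ratings)) ** 0.5


# Hypothetical ratings: 3 users x 4 restaurants, 1-5 stars, as (user, restaurant, stars).
ratings = [(0, 0, 5), (0, 1, 3), (0, 3, 1),
           (1, 0, 4), (1, 2, 1), (1, 3, 1),
           (2, 1, 1), (2, 2, 5), (2, 3, 4)]

P, Q = factorize(ratings, n_users=3, n_items=4)
print(round(rmse(P, Q, ratings), 3))  # training error on the observed ratings
print(round(predict(P, Q, 0, 2), 2))  # predicted stars for user 0 at unseen restaurant 2
```

Adding per-user and per-restaurant bias terms is a common extension when some users rate everything high or low.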

Julia Box: Google Colab for Julia

Julia is a great language that is up and coming in the statistical computing space. Julia is actually very commonly used by biologists, medical scientists, and chemists; however, Julia for data science, while not yet used on a large scale, is an idea that becomes more feasible every day. Julia certainly has advantages to … Read more

The Russell Westbrook Effect

For 3 straight seasons, he averaged a triple-double in Oklahoma City. While the regular-season numbers were stupendous, they didn’t translate into much postseason success. Now that he is reunited with his old Thunder buddy James Harden, what will Westbrook’s effect be in Houston? Oscar Robertson was the first player to average a triple-double for a whole … Read more

Advantages and Disadvantages of Artificial Intelligence

Artificial Intelligence is one of the emerging technologies that try to simulate human reasoning in AI systems. John McCarthy coined the term Artificial Intelligence in 1955. He said, ‘Every aspect of learning or any other feature of intelligence can in principle be so precisely described that a machine can be made to simulate … Read more

Introducing Deep Java Library (DJL)

Build and deploy deep learning models in Java. We are excited to announce the Deep Java Library (DJL), an open source library to develop, train, and run deep learning models in Java using intuitive, high-level APIs. If you are a Java user interested in learning deep learning, DJL is a great way to start. … Read more

Enjoy the GO with Alpha Go

How AlphaGo Zero beat AlphaGo. Go has been around for more than 2,500 years, with humans leading the pack in terms of skill. That was the case until AlphaGo came around and laid down a new battlefield for humans versus computers. Now, humans aren’t even in the picture in terms of being the best (and … Read more

A.I. Will Reinstate Direct Democracy

THE BRIGHT SIDE OF AI. The case for optimism in the age of artificial intelligence. Most debates about the advent of artificial intelligence (AI) seem to focus only on the negative side: AI-powered monopolies, the tyranny of the minority, a jobless future, the collapse of the democratic system and capitalism, global inequality, digital dictatorships … Read more

GRNN with Small Samples

After a bank launches a new product or acquires a new portfolio, the risk modeling team often faces the challenge of estimating the corresponding performance, e.g., risk or loss, from a limited number of data points, conditional on business drivers or macroeconomic indicators. For instance, it is required to project … Read more
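
A GRNN (Specht's general regression neural network) predicts with a Gaussian-kernel weighted average of the training targets. This helps explain why it suits small samples: it has essentially one tuning parameter, the smoothing width sigma. The Python sketch below implements that formula. The single macro driver, the loss rates, and sigma are hypothetical values and are not taken from the post.

```python
import math


def grnn_predict(x_train, y_train, x_new, sigma):
    """GRNN prediction: sum_i y_i * w_i / sum_i w_i with w_i = exp(-||x_new - x_i||^2 / (2 sigma^2)).

    Each training point acts as one pattern neuron (Nadaraya-Watson kernel regression form).
    """
    d2 = [sum((a - b) ** 2 for a, b in zip(xi, x_new)) for xi in x_train]
    d_min = min(d2)
    # Subtracting d_min rescales every weight by the same factor, so the ratio is unchanged,
    # but the nearest point always gets weight 1.0 and the denominator cannot underflow to 0.
    weights = [math.exp(-(d - d_min) / (2 * sigma ** 2)) for d in d2]
    return sum(w * y for w, y in zip(weights, y_train)) / sum(weights)


# Hypothetical small sample: one macro driver (e.g. unemployment rate, %) vs. a loss rate.
x_train = [[3.0], [4.0], [5.0], [6.0], [7.0]]
y_train = [0.010, 0.012, 0.015, 0.020, 0.026]

pred = grnn_predict(x_train, y_train, [5.5], sigma=0.5)
print(round(pred, 4))
```

A very small sigma makes the GRNN behave like nearest-neighbor lookup. A very large sigma makes it return the sample mean. In practice, sigma is usually tuned, for example by leave-one-out error on the few available points.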

How to Double the Productivity of Your R&D Team

In 1919 Britain faced mass starvation. Although victorious in war, it was broke. Britain had been a food importer for decades, and now could neither grow enough food itself nor buy it from others. With the country deeply in debt and its economy disrupted, politicians hoped scientists could clean up the mess. They turned to Rothamsted, an agricultural research … Read more

Append in Python

When choosing a collection type, it is useful to understand the properties of each type and to choose the most appropriate one for a particular data set. To know which collection type is most appropriate, you need to know the attributes of all the available types and then choose one based on your use case. … Read more
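
The following short sketch shows how "appending" differs across Python's built-in collection types. The article's own examples are not shown in this excerpt.

```python
from collections import deque

# list: ordered, allows duplicates; append() adds one item at the end (amortized O(1)).
items = [1, 2]
items.append(3)
items.append([4, 5])  # appends the list itself as a single element
items.extend([6, 7])  # extend() adds each element separately

# set: unordered, unique members; sets use add(), not append().
seen = {1, 2}
seen.add(2)  # no effect: 2 is already present

# dict: "appending" means assigning a new key.
counts = {"a": 1}
counts["b"] = 2

# deque: O(1) appends at both ends.
queue = deque([2, 3])
queue.append(4)
queue.appendleft(1)

# tuple: immutable, so you build a new tuple instead of appending.
point = (1, 2)
point = point + (3,)

print(items)   # [1, 2, 3, [4, 5], 6, 7]
print(seen)    # {1, 2}
print(counts)  # {'a': 1, 'b': 2}
print(queue)   # deque([1, 2, 3, 4])
print(point)   # (1, 2, 3)
```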

Have you taught your machine yet?

How Google’s Teachable Machine does transfer learning right inside your little browser. Since 2015, when a ResNet first surpassed the human accuracy threshold for classifying images, deep learning has taken the world by storm. Reading research detailing such achievements usually gives one the impression that … Read more

Data Science Dream is not so far with R

R is a programming language for statistical computing and data analysis, and it is an industry standard among analysis tools. It was created by Ross Ihaka and Robert Gentleman in 1992 at the University of Auckland. It is open source and completely free to use. With more than 15,000 packages available online, there is … Read more

Stop Using Word Clouds without the Context

Word clouds don’t show the relations between words, so they lose the context. Text network visualization resolves this problem. [Image: word cloud of Obama’s 2013 inauguration address, generated with Wordle] What you see above is a word cloud of Barack Obama’s 2013 inauguration address. I don’t know whether word clouds are supposed to be informative, hopefully not, because … Read more
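
One common way to build a text network is to link words that appear near each other. The sketch below illustrates that general technique and is not the author's tool. It drops a tiny, illustrative list of stop words, links words that fall within a sliding window, and ranks words by how many distinct neighbors they have. The sample sentence is hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

# Tiny illustrative stop-word list; real tools use much longer ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "we", "our", "is", "that"}


def cooccurrence_network(text, window=2):
    """Weighted word co-occurrence graph as a Counter of (word_a, word_b) edges.

    Every run of `window` consecutive non-stop words links each pair of words in it,
    so words that sit closer together share more windows and get heavier edges.
    Sentence boundaries are ignored in this simple version.
    """
    words = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]
    edges = Counter()
    for start in range(len(words) - window + 1):
        for a, b in combinations(words[start:start + window], 2):
            if a != b:
                edges[tuple(sorted((a, b)))] += 1
    return edges


def degree(edges):
    """Number of distinct neighbors per word: a simple measure of how central it is."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg


text = "Freedom requires courage. Courage requires people. People want freedom."
edges = cooccurrence_network(text)
print(edges.most_common(1))          # [(('courage', 'requires'), 2)]
print(degree(edges).most_common(1))  # [('requires', 3)]
```

Unlike a word cloud, the edge list keeps which words are used together, and it can be passed to any graph-drawing library for visualization.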

Taking a Step Back: Here’s What AI Needs to Learn from a Child

Artificial intelligence is taking the world by storm. It is already manifesting in a plethora of industries, including organizations that were once reluctant to adopt it. Several branches of artificial intelligence, like machine learning, are being used religiously by practitioners to form better strategies, predict industry trends, and bring innovative products to the market. But, … Read more

Emerging Technology Trends for Banking Industry in 2020 & Beyond

Banks around the world are taking advantage of new technologies to streamline their operations and provide a better experience to their customers. Find out the latest trends that will disrupt the banking industry in the future! Today, we live in the digital era, where technology is driving change in almost every industry, whether it is the … Read more

Build a Python Crawler to Get Activity Stream with GitHub API

I want to get activities like the ones below:
ShusenTang starred lyprince/sdtw_pytorch
chizhu starred markus-eberts/spert
Hexagram-King starred BrambleXu/knowledge-graph-learning
Yevgnen starred BrambleXu/knowledge-graph-learning
……
2.1 GitHub API
First, we take a look at the GitHub API documentation. If you haven’t enabled two-factor authentication, you can run the command below to test the API. After entering your password, you should see the response. … Read more
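
The command the post runs is not included in this excerpt. As a hedged Python sketch, star activity like the lines above can be read from the GitHub REST API's `GET /users/{username}/received_events` endpoint, where starring shows up as a `WatchEvent`. Using that endpoint, and an optional personal access token instead of a password, are assumptions on my part. Only the standard library is used. The formatting function is exercised on hand-made sample data so the example runs offline.

```python
import json
import urllib.request


def fetch_received_events(username, token=None):
    """GET /users/{username}/received_events (only public events without a token)."""
    req = urllib.request.Request(
        f"https://api.github.com/users/{username}/received_events",
        headers={"Accept": "application/vnd.github+json"},
    )
    if token:
        req.add_header("Authorization", f"token {token}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def format_star_events(events):
    """Turn WatchEvent entries (GitHub's event type for starring) into 'user starred owner/repo' lines."""
    return [
        f"{e['actor']['login']} starred {e['repo']['name']}"
        for e in events
        if e.get("type") == "WatchEvent"
    ]


# Hand-made sample in the shape of the API's event objects; the second entry is hypothetical.
sample = [
    {"type": "WatchEvent", "actor": {"login": "ShusenTang"}, "repo": {"name": "lyprince/sdtw_pytorch"}},
    {"type": "ForkEvent", "actor": {"login": "someone"}, "repo": {"name": "owner/repo"}},
]
print(format_star_events(sample))  # ['ShusenTang starred lyprince/sdtw_pytorch']
```

For a live run, `format_star_events(fetch_received_events("your-username"))` would produce the same kind of lines. Here "your-username" is a placeholder.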

11 Evaluation Metrics Data Scientists should be familiar with: Lessons from A High-rank Kagglers’…

Evaluation metric, the theme of this post, is a concept ML beginners often confuse with a related but separate concept, the loss function. The two are similar in the sense that they can be the same when we are lucky, but that will not happen every time. An evaluation metric is a metric “we want” to minimize … Read more
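
The Python sketch below shows how a loss function and an evaluation metric can disagree. It scores two hypothetical models with log loss (a typical training loss) and accuracy (a typical evaluation metric). The predictions are made up. The point is that the model with the better accuracy can have the worse log loss.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary cross-entropy: a smooth loss that a model can be trained on with gradients."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def accuracy(y_true, p_pred, threshold=0.5):
    """Share of correct hard predictions: intuitive, but piecewise constant in p."""
    return sum((p >= threshold) == bool(y) for y, p in zip(y_true, p_pred)) / len(y_true)


y_true = [1, 0, 1, 0]
model_a = [0.55, 0.45, 0.55, 0.45]  # right on every example, but barely confident
model_b = [0.95, 0.05, 0.95, 0.60]  # confident, with one mistake

print(accuracy(y_true, model_a), accuracy(y_true, model_b))   # 1.0 0.75
print(log_loss(y_true, model_a) > log_loss(y_true, model_b))  # True: the two disagree
```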

Host a dynamic website on Google Firebase for free using Node.js and Cloud Firestore DB

Requirements
1. Google Account
If you don’t have a Google account, you need to sign up for one. You can do so by going to https://accounts.google.com/SignUp.
2. Node.js and npm
Mac/Windows: You can download the installer from https://nodejs.org/en/download/.
Linux: Follow the steps below to install Node.js:
   1. Open a terminal.
   2. Run the following commands:
      sudo apt-get install curl
      curl -sL … Read more

Kafka Gotchas

Great, but not perfect. I’ve helped several large clients build a microservices-style architecture using Kafka as a messaging backbone, and I have a reasonably good understanding of its abilities and the use cases that really bring them out. But I’m not a Kafka apologist by any stretch; any technology that has gone through such a rapid … Read more

The Big Data Handbook

Why are there so many components? In the Hadoop ecosystem, there are many different layers that take care of different components, including data storage, integration, access, resource management, execution engines, and operations and management. Before I lose my readers beyond this paragraph, let me provide a high-level description of what the stack is trying … Read more

Python and Excel

It never had to be Excel or Python! Or even R or Excel! While reading Tony Roberts’ article ‘A Better Excel Goal Seek using Python’ from last November 22, I sort of went down a rabbit hole. Many years ago, in my mind, I mastered VBA, and I did … Read more
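
Goal Seek changes one input until a formula reaches a target value. Below is a minimal Python version of that idea, using bisection and a hypothetical loan-payment example: 200,000 borrowed over 360 months, with the rate unknown. It is not the approach from Tony Roberts' article, which the excerpt does not show.

```python
def goal_seek(f, target, lo, hi, tol=1e-10, max_iter=200):
    """Find x in [lo, hi] with f(x) close to target, by bisection.

    Assumes f is continuous and that f(lo) - target and f(hi) - target have opposite signs.
    """
    g_lo = f(lo) - target
    if g_lo * (f(hi) - target) > 0:
        raise ValueError("target is not bracketed by [lo, hi]")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        g_mid = f(mid) - target
        if abs(g_mid) < tol or (hi - lo) / 2 < tol:
            return mid
        if (g_mid < 0) == (g_lo < 0):
            lo, g_lo = mid, g_mid  # root lies in the upper half
        else:
            hi = mid               # root lies in the lower half
    return (lo + hi) / 2


def monthly_payment(rate, principal=200_000, months=360):
    """Annuity payment for a fixed-rate loan at monthly interest rate `rate` (> 0)."""
    return principal * rate / (1 - (1 + rate) ** -months)


# Which annual rate gives a 1,073.64 monthly payment on a 200,000, 30-year loan?
monthly_rate = goal_seek(monthly_payment, 1073.64, lo=1e-6, hi=0.01)
print(round(monthly_rate * 12, 4))  # 0.05
```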

Frobenius coin problem

[This article was first published on R – Xi’an’s Og, and kindly contributed to R-bloggers]. A challenge from The Riddler last weekend came out as … Read more
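
The Frobenius coin problem asks for the largest amount that cannot be paid exactly with unlimited coins of given values. For two coprime values a and b, there is a closed form, ab - a - b. For more coins, a brute-force search works when the values are small. The Python sketch below is a generic illustration and does not reproduce the Riddler challenge discussed in the post.

```python
from functools import reduce
from math import gcd


def frobenius_number(coins):
    """Largest amount that cannot be paid exactly with unlimited coins of the given values.

    Returns -1 if every amount is payable (i.e. there is a coin of value 1).
    Requires gcd(coins) == 1; otherwise infinitely many amounts are unpayable.
    """
    if reduce(gcd, coins) != 1:
        raise ValueError("coin values must have gcd 1")
    smallest = min(coins)
    reachable = [True]  # reachable[n]: can amount n be paid exactly? Amount 0 always can.
    run, last_gap, n = 1, -1, 0
    # Once `smallest` consecutive amounts are payable, adding the smallest coin covers all larger ones.
    while run < smallest:
        n += 1
        ok = any(n >= c and reachable[n - c] for c in coins)
        reachable.append(ok)
        if ok:
            run += 1
        else:
            run, last_gap = 0, n
    return last_gap


print(frobenius_number([3, 5]))      # 7, matching the two-coin formula 3*5 - 3 - 5
print(frobenius_number([6, 9, 20]))  # 43
```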

Gold-Mining Week 13 (2019)

[This article was first published on R – Fantasy Football Analytics, and kindly contributed to R-bloggers]. The post Gold-Mining Week 13 (2019) appeared first … Read more

Increasing Kaggle Revenue: Analyzing user data to recommend the best new product

In this project, we will create recommendations for increasing revenue at Kaggle, an online community for data science professionals. We will analyze a Kaggle customer survey, attempting to learn if there are any indicators of potential revenue growth for the company. To make our recommendations, we will try to learn: Is there market potential for … Read more

Top-K Off-Policy Correction for a REINFORCE Recommender System

The problem is the following: we have multiple other policies. Let’s take the DDPG- and TD3-trained actors from my library. Given these policies, we want to learn a new, unbiased one in an off-policy manner. As the authors put it: Off-Policy Candidate Generation: We apply off-policy correction to learn from logged feedback, collected from an ensemble … Read more
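
In Chen et al.'s top-K off-policy correction, the REINFORCE gradient for each logged action is scaled by two factors. The first is the importance weight π(a|s)/β(a|s). The second is the top-K multiplier λ_K(s, a) = K(1 - π(a|s))^(K-1). The Python sketch below computes only those scalar weights. The probabilities and rewards are hypothetical. In the setup described above, β would come from the DDPG and TD3 actors, which this sketch does not model. In a deep learning framework, each weight would multiply log π(a|s) in the loss.

```python
def topk_correction(pi_prob, k):
    """Top-K multiplier lambda_K = K * (1 - pi(a|s)) ** (K - 1); equals 1 when K = 1."""
    return k * (1 - pi_prob) ** (k - 1)


def reinforce_weights(pi_probs, beta_probs, returns, k, cap=None):
    """Per-sample scalar that multiplies grad log pi(a|s) in the corrected REINFORCE update.

    pi_probs:   probabilities of the logged actions under the policy being learned.
    beta_probs: probabilities of the same actions under the behavior (logging) policy.
    returns:    observed rewards or returns for those actions.
    cap:        optional upper bound on the importance weight, to reduce variance.
    """
    weights = []
    for pi, beta, r in zip(pi_probs, beta_probs, returns):
        iw = pi / beta  # off-policy importance weight
        if cap is not None:
            iw = min(iw, cap)
        weights.append(iw * topk_correction(pi, k) * r)
    return weights


# Hypothetical batch: two logged actions, evaluated for a slate of K = 2 items.
print(reinforce_weights([0.2, 0.05], [0.4, 0.1], [1.0, 1.0], k=2))
```

The multiplier shrinks toward 0 as π(a|s) approaches 1. This reflects the idea that an item already likely to appear somewhere in the top K gains little from being pushed higher.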

Hacking Google Coral Edge TPU: motion blur and Lanczos resize

Google’s Coral project has recently come out of beta. According to the benchmarks, Coral devices provide excellent neural network inference acceleration for DIY makers. These devices are built on a specialized Tensor Processing Unit ASIC (the Edge TPU), which proved somewhat tricky to work with, but working within its enforced limitations and quirks is rewarding. I was … Read more
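
For reference, the Lanczos kernel is L(x) = sinc(x)·sinc(x/a) for |x| < a, and 0 otherwise. The plain-Python sketch below applies it to resample a 1-D signal on the CPU. It does not touch the Edge TPU, and it skips the kernel widening that proper downscaling uses for anti-aliasing. Images are usually resized by running the same 1-D pass over rows and then over columns. The window size a = 3 is a common choice and is assumed here.

```python
import math


def lanczos_kernel(x, a=3):
    """L(x) = sinc(x) * sinc(x / a) for |x| < a, else 0, with sinc(x) = sin(pi x) / (pi x)."""
    if x == 0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def lanczos_resize_1d(signal, new_len, a=3):
    """Resample a 1-D signal to new_len samples with a Lanczos filter."""
    scale = len(signal) / new_len
    out = []
    for j in range(new_len):
        center = (j + 0.5) * scale - 0.5  # position of output sample j in input coordinates
        lo = math.floor(center) - a + 1
        total = weight_sum = 0.0
        for i in range(lo, lo + 2 * a):
            w = lanczos_kernel(center - i, a)
            src = signal[min(max(i, 0), len(signal) - 1)]  # clamp indices at the edges
            total += w * src
            weight_sum += w
        out.append(total / weight_sum)  # normalize so the weights sum to 1
    return out


print(lanczos_resize_1d([1.0, 3.0, 2.0, 5.0], 8))  # upsample 4 samples to 8
```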

How do different factors influence your life expectancy?

How do attributes associated with your country of origin define your life expectancy? Everyone has their expiration date on this planet. After this day, they are buried six feet deep under the earth to decay. Humans die from multiple causes, such as accidents, diseases, war, and other forms of death. One interesting trend … Read more