The 76th Tokyo R User Meetup happened on
March 2nd, graciously hosted by DeNA (an entertainment and
ecommerce company) in their lovely headquarters located in Shibuya.
(Photo courtesy of Takashi Minoda)
On this day another R User Meetup was also happening up in Sapporo,
Hokkaido. You can check them out
here. Although
this was the second Tokyo.R of 2019 I wasn’t able to attend the one in
January as I was at the RStudio::Conf in Austin, Texas… a long way from
home! Similar to my roundup blog
post
of the talks at Japan.R I
will be going through around half of all the talks. Hopefully, my
efforts will help spread the vast knowledge of Japanese R users to the
wider R community. Throughout I will also post helpful blog posts and
links from other sources if you are interested in learning more about
the topic of a certain talk. You can follow Tokyo.R by searching for the
#TokyoR hashtag on
Twitter.
Unlike most R Meetups a lot of people present using just their Twitter
handles so I’ll mostly be referring to them by those instead. I’ve been
going to events here in Japan for about a year but even now sometimes
I’m like, “Whoahh that’s what@very_recognizable_twitter_handle_in_the_japan_r_community
actually
looks like?!” Anyways…
Let’s get started!
Every Tokyo.R sessions starts off with three talks given by one of the
organizing team members who go over some of the very basic aspects of R for
beginner users. These talks are given by very experienced R users and
are a way to let newbies feel comfortable before diving into real world
applications of R in the main talks and LTs happening later on.
In this edition of Tokyo.R:

ktatyamtema gave a talk on R
basics, from downloading and installing packages, reading in data
files into R, and saving outputs from R. 
Next, kilometer00 gave a talk on
data pipelines, specifically focusing on proper coding style and
thought process behind good programming. He showed some really great
examples, like in the picture below, of writing code that is easy to
understand by using thetidyverse
verbs and pipes. His entire
slideshow has tons of great images on how to visualize programming
in R which you really don’t need Japanese to understand so I
recommend beginners to have a look through them!

Finally, koriakane gave a talk on
plotting and visualizing data using both base R andggplot2
. She
carefully explained stepbystep the process of creating different
types of graphs and working with colors and scales. At the end of
the slides there is a huge list ofggplot2
resources in Japanese
that would be very helpful to Japanese R users.
kato_kohaku: ModelAgnostic Explanations
In the first main talk of the day, @kato_kohaku
dived deep into
modelagnostic explanations using the DALEX
, iml
, and mlr
packages. One of the problems seen in the ML field is the growing
complexity of models as researchers have been able to push the limits of
what they can do with increased computational power and the consequent
discovery of new methods. The high performance of these complex models
have come at a high cost with interpretability being reduced
dramatically, with many of these newer models being called “black boxes”
for that very reason. A modelagnostic method is preferable to
modelspecific methods mainly due to their flexibility, as typically
data scientists evaluate many different types of ML models to solve a
task. A modelagnostic method allows you to compare these types of
models using the same method in a way that a modelspecific method
can’t.
@kato_kohaku
went over the workflow for performing modelagnostic
interpretation and covered partial dependence plots (PDP), individual
conditional expectation (ICE), permutation importance, accumulated local
effects plot (ALE), feature interaction, LIME, Shapley values, and more.
The topics he covered are well explained in Christoph
Molnar’s excellent book,
Interpretable Machine
Learning which@kato_kohaku
referred to through the presentation. There are a HUGE
amount of slides (146 of them!) filled with a ton of great info that you
can can read (a lot of the slides have explanations taken straight from
the documentation in English) so I highly recommend taking a look
through them if you are interested in what the DALEX
and iml
packages have to offer for interpreting models.
A great codethrough explanation of using DALEX
with mlr
in English
can be found
here
using the same data set as seen in @kato_kohaku's
slides.
y_mattu: Operators/Objects in R
One of the organizers of Tokyo.R, @y_mattu
, presented on objects in R.
Specifically he went over using the pryr
and lobstr
packages to dig inside R
objects and see what is happening “under the hood” of your everyday R
operations.
“Every object in R is a function, every function in R is an object”
The above maxim means that even operators such as +
can be turned into
a function using parentheses to place all the arguments:
Looking deeper @y_mattu
used the ast()
function from the lobstr
package to see the abstract syntax tree of the R expression that was
shown above, 1 + 2
.
library(lobstr)
lobstr::ast(1 + 2)
## o`+` ## +1 ## \2
The above shows the exact order in which the functions are being run by
R. To now understand what is happening when we run this operation we
need to look at the R environment. To check which environment holds the+ ()
operator
library(pryr)
## ## Attaching package: 'pryr' ## The following objects are masked from 'package:lobstr':
## ## ast, mem_used
pryr::where("+")
And we find that the base
package holds this operator and it is called
from the base
environment. In the final part @y_mattu
looked into
the +
operator itself by looking at the .Primitive()
as well aspryr::show_c_source()
to see the C source code used to make R be able
to run +
.
This was a very technical topic (for me) but it piqued my interest on
what’s actually happening whenever you run a line of R code!
At every Tokyo.R the hosting company is given time to talk about their
own company, how they use R, and hopefully provide some information for
any interested job seekers. For DeNA,@bob3bob3
gave this talk and he provided us on some details on what
exactly DeNA does as well as his own LT on SEM using lavaan
. DeNA is a
entertainment/ecommerce firm that is most wellknown for it’s cellphone
platform, Mobage
. Interestingly, they also took ownership of
MyAnimeList a few years back (probably one of
the largest anime/manga database communities in the world). For job
seekers he talked about the large variety of positions DeNA have
available in the “Kaggler” category as well as open positions in the
automobile, healthcare, sports analytics, HR analytics, marketing
researcher departments, and more…!
Following his elevator pitch about DeNA he gave a small talk about usinglavaan
to plot out path analysis for structural equation modeling.@bob3bob3
explained how he ended up creating his own plotting function
using the DiagrammeR
and Graphviz
packages to visualize the lavaan
output as he did not like the default plotting method.
flaty13: Tidy TimeSeries Analysis
@flaty13
, who has also recently presented at
Japan.R
and SportsAnalyst
Meetup
on tennis analytics, gave a talk on analyzing timeseries data with R.
He first talked about how packages like lubridate
and dplyr
, while
useful, may not be the best way to handle time series data. The solution@flaty13
talked about was the tsibble
package created by Earo
Wang. At RStudio::Conf 2019 Earo gave a
talk on this package and using tidy data principles with time series
data which you can watch
here.
@flaty13
used his own pedometer data from a healthcare app on his
iPhone for his demonstration. After reading the data in and performing
the usual tidyverse operations on it, the data frame was turned into atsibble
object and then visualized as a calendar plot using thesugrrants
package (also by Earo Wang).
saltcooky: Organizing a R Study Group at My Company!
@saltcooky
took the time to talk to us about something that doesn’t
usually get mentioned at Tokyo.R, as he reported about the success of an
intracompany R workshop he hosted. At @saltcooky's
company the
majority of his coworkers are Pythonistas with only three other
coworkers and him being R users. Hoping to change this dynamic,
especially as their company does a lot of data analytics, @saltcooky
set out to create some workshops. What he came up with were three
separate sessions heavily inspired by the Tokyo.R method that I talked
about in “Beginner Tutorial” section.
 The first session was basically around an hour on R basics and
talking about what exactly you can do with R, where he got the
Pythonistas to slowly get interested in using R for various
analytical tasks.  Second, was a tidyverse data handling/processing session with some
handson exercises with help from@y_mattu
.  The third session was using
ggplot2
for visualization.
Throughout the workshops @saltcooky
was asked some peculiar questions
like “Is there a difference in using .
vs. _
in separating words in
a function/object name?” and “Why are there so many packages/functions
with the same functionality!?”.
One of the major hurdles that @saltcooky
faced was in installing R for
all the different OSes that his coworkers used. The solution he came up
with was to use RStudio Cloud. This eased the burden for him as he
didn’t need to set up or manage any servers while the students did not
need to install any software at all! There was actually a great talk on
using “RStudio Cloud for
Education”
by Mel Gregory at RStudio::Conference 2019 a few months ago and it’s
a great resource for others thinking about holding workshops.
@saltcooky
concluded that his workshops were a mild success as he was
able to get a couple more people using R casually at his workplace and
although Python remains dominant he looks forward to convincing more
people to use R in the future.
moratoriamuo271: Topic Modeling Cooking Recipes!
Continuing the theme of “tidy” data analysis, @moratoriamuo271
applied
the concept to text analysis. The motivation for this talk came from the
difficulty and hassle of figuring out a nice set of meals to eat over
the course of a week. To solve this problem he sought to create a
recommendation engine for recipes!
As seen in the above flowchart @moratoriamuo271
:
 Web scraped recipes using
rvest
 Created some wordclouds for some EDA
 Used the
RMeCab
andtm
to create an organized document term matrix
(RMeCab
is a package specifically for Japanese text analysis)  Latent Dirichlet Analysis with
topicmodels
andldatuning
packages  Finally, splitting recipes into categories with
tidytext
Before he showed us the results of his work, @moratoriamuo271
took us
through a crash course on various topic modeling techniques from the
basic unigram model, to mixture of unigram models, and finally on
Latent Dirichlet Analysis (LDA).
He also went over the process in which he decided on the optimal number
of topics for his recommendation engine. This was done by looking at the
perplexity values from the ldatuning
package.
Here is a
great blog post by Peter Ellis on
using crossvalidation on perplexity to determine the optimal number
of topics. Below is the final finished product that gives you recipes
for nutritious and balanced meals for seven dinners!
@moratoriamuo271
has also released a blog post with ALL the code that
you can check out
here!
I couldn’t go through all of the talks but I will provide their slides
below (if/when they become available)
After the talks, everyone got together for a little afterparty over
food and drinks. Usually pizza is served but this time was a bit more
fancy with karaage and cheeseoncrackers being served. As the night
wore on R users from all over Tokyo talked about their successes and
struggles with R.
Unfortunately, there is only so much I can do to translate the talks,
especially as Tokyo.R doesn’t do recordings anymore, but I hope that I
could be of some help and maybe you’ll be inspired by a code snippet
there or a certain package name elsewhere, etc.! Tokyo.R happens almost
monthly and it’s a great way to mingle with Japanese R users as it is
the largest regular meetup here in Japan. Talks in English are also
welcome so if you’re ever in Tokyo come join us!
Related
Rbloggers.com offers daily email updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more…