<?xml version="1.0" encoding="UTF-8"?>
<rss  xmlns:atom="http://www.w3.org/2005/Atom" 
      xmlns:media="http://search.yahoo.com/mrss/" 
      xmlns:content="http://purl.org/rss/1.0/modules/content/" 
      xmlns:dc="http://purl.org/dc/elements/1.1/" 
      version="2.0">
<channel>
<title>Notes of a Dabbler</title>
<link>https://www.notesofdabbler.com/index.html</link>
<atom:link href="https://www.notesofdabbler.com/index.xml" rel="self" type="application/rss+xml"/>
<description>Blog to catalog learnings</description>
<generator>quarto-1.0.36</generator>
<lastBuildDate>Sun, 11 Sep 2022 00:00:00 GMT</lastBuildDate>
<item>
  <title>Exploring OMPR with HiGHS solver</title>
  <dc:creator>Notesofdabbler</dc:creator>
  <link>https://www.notesofdabbler.com/posts/post_2022_09_11/index.html</link>
  <description><![CDATA[ 




<p>Algebraic modeling systems are a class of software for modeling optimization problems. They provide a unified interface for formulating optimization problems in a form close to the mathematical notation, and they can link to different types of solvers, sparing the user from solver-specific ways of formulating the problem. Both commercial and open-source options are available. <a href="https://www.gams.com/">GAMS</a> and <a href="https://ampl.com/">AMPL</a> are examples of commercial options. Popular open-source options are <a href="https://jump.dev/JuMP.jl/stable/">JuMP</a> in Julia and <a href="http://www.pyomo.org/">Pyomo</a> in Python. I have typically used Pyomo in Python but have <a href="https://notesofdabbler.github.io/R_pyomo/2020_07_01_r_pyomo_blogpost.html">explored</a> using it from R. I recently became aware of an algebraic modeling system in R provided by the <a href="https://dirkschumacher.github.io/ompr/">OMPR</a> package developed by <a href="https://dirkschumacher.github.io/">Dirk Schumacher</a>.</p>
<p>Commercial and open-source options are also available for solvers. For the class of optimization problems referred to as Mixed Integer Linear Programs (MILP), commercial solvers such as <a href="https://www.ibm.com/analytics/cplex-optimizer">CPLEX</a> and <a href="https://www.gurobi.com/">GUROBI</a> generally perform significantly better than open-source solvers such as <a href="https://www.gnu.org/software/glpk/">glpk</a> and <a href="https://github.com/coin-or/Cbc">CBC</a>. A new open-source solver, <a href="https://highs.dev/">HiGHS</a>, has been developed recently; it has generated quite a bit of buzz and by different accounts looks like a promising option. There is now a <a href="https://cran.r-project.org/web/packages/highs/index.html">highs</a> package in R that can call the HiGHS solver.</p>
<p>In this blog post, I explore using the OMPR modeling system with the HiGHS solver by solving a few examples of LP/MILP problems.</p>
<section id="example-1-example-from-highs-package" class="level3">
<h3 class="anchored" data-anchor-id="example-1-example-from-highs-package">Example 1: Example from highs package</h3>
<p>Here I just describe the example in mathematical notation and show how close the OMPR model is to that notation. The full details of this example are in this <a href="https://notesofdabbler.github.io/optwithR/highs_example_ompr.html">location</a>.</p>
<div class="columns">
<div class="column" style="width:40%;">
<section id="example-problem-in-highs-package" class="level4">
<h4 class="anchored" data-anchor-id="example-problem-in-highs-package">Example Problem in highs package</h4>
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Barray%7D%7Bll%7D%0A%5Cmin%20&amp;%20x_0%20+%20x_1%20+%203%20%5C%5C%0A&amp;%20x_1%20%5Cleq%207%20%5C%5C%0A&amp;%205%20%5Cleq%20x_0%20+%202x_1%20%5Cleq%2015%20%5C%5C%0A&amp;%206%20%5Cleq%203x_0%20+%202x_1%20%5C%5C%0A&amp;%200%20%5Cleq%20x_0%20%5Cleq%204%20%5C%5C%0A&amp;%201%20%5Cleq%20x_1%0A%5Cend%7Barray%7D%0A"></p>
</section>
</div><div class="column" style="width:60%;">
<section id="ompr-model" class="level4">
<h4 class="anchored" data-anchor-id="ompr-model">OMPR model</h4>
<div class="cell">
<div class="sourceCode cell-code" id="cb1" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb1-1">mdl <span class="ot" style="color: #003B4F;">=</span> <span class="fu" style="color: #4758AB;">MIPModel</span>() <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb1-2">      <span class="fu" style="color: #4758AB;">add_variable</span>(x0, <span class="at" style="color: #657422;">lb =</span> <span class="dv" style="color: #AD0000;">0</span>, <span class="at" style="color: #657422;">ub =</span> <span class="dv" style="color: #AD0000;">4</span>, <span class="at" style="color: #657422;">type =</span> <span class="st" style="color: #20794D;">"continuous"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb1-3">      <span class="fu" style="color: #4758AB;">add_variable</span>(x1, <span class="at" style="color: #657422;">lb =</span> <span class="dv" style="color: #AD0000;">1</span>, <span class="at" style="color: #657422;">type =</span> <span class="st" style="color: #20794D;">"continuous"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb1-4">      <span class="fu" style="color: #4758AB;">set_objective</span>(x0<span class="sc" style="color: #5E5E5E;">+</span>x1<span class="sc" style="color: #5E5E5E;">+</span><span class="dv" style="color: #AD0000;">3</span>, <span class="at" style="color: #657422;">sense =</span> <span class="st" style="color: #20794D;">"min"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb1-5">      <span class="fu" style="color: #4758AB;">add_constraint</span>(x1 <span class="sc" style="color: #5E5E5E;">&lt;=</span> <span class="dv" style="color: #AD0000;">7</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb1-6">      <span class="fu" style="color: #4758AB;">add_constraint</span>(x0 <span class="sc" style="color: #5E5E5E;">+</span> <span class="dv" style="color: #AD0000;">2</span><span class="sc" style="color: #5E5E5E;">*</span>x1 <span class="sc" style="color: #5E5E5E;">&lt;=</span> <span class="dv" style="color: #AD0000;">15</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb1-7">      <span class="fu" style="color: #4758AB;">add_constraint</span>(x0 <span class="sc" style="color: #5E5E5E;">+</span> <span class="dv" style="color: #AD0000;">2</span><span class="sc" style="color: #5E5E5E;">*</span>x1 <span class="sc" style="color: #5E5E5E;">&gt;=</span> <span class="dv" style="color: #AD0000;">5</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb1-8">      <span class="fu" style="color: #4758AB;">add_constraint</span>(<span class="dv" style="color: #AD0000;">3</span><span class="sc" style="color: #5E5E5E;">*</span>x0 <span class="sc" style="color: #5E5E5E;">+</span> <span class="dv" style="color: #AD0000;">2</span><span class="sc" style="color: #5E5E5E;">*</span>x1 <span class="sc" style="color: #5E5E5E;">&gt;=</span> <span class="dv" style="color: #AD0000;">6</span>)</span></code></pre></div>
</div>
</section>
</div>
</div>
<p>Since OMPR can call the HiGHS optimizer directly, we can solve the model and get the solution as shown below.</p>
<div class="cell">
<div class="sourceCode cell-code" id="cb2" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb2-1"><span class="co" style="color: #5E5E5E;"># solve model</span></span>
<span id="cb2-2">s <span class="ot" style="color: #003B4F;">=</span> mdl <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">solve_model</span>(<span class="fu" style="color: #4758AB;">highs_optimizer</span>())</span>
<span id="cb2-3"></span>
<span id="cb2-4"><span class="co" style="color: #5E5E5E;"># get solution</span></span>
<span id="cb2-5">s<span class="sc" style="color: #5E5E5E;">$</span>status</span>
<span id="cb2-6">s<span class="sc" style="color: #5E5E5E;">$</span>objective_value</span>
<span id="cb2-7">s<span class="sc" style="color: #5E5E5E;">$</span>solution</span></code></pre></div>
</div>
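<p>The model and solve chunks above do not show the packages they rely on. A minimal setup sketch follows, assuming that <code>highs_optimizer()</code> comes from the <code>ompr.highs</code> package and that the pipe is picked up from <code>dplyr</code>; the full example linked above is the authoritative reference for what was actually loaded.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Assumed package setup for the chunks above (not shown in the original post)
library(dplyr)       # supplies the pipe operator used to chain model-building steps
library(ompr)        # MIPModel(), add_variable(), set_objective(), add_constraint(), solve_model()
library(ompr.highs)  # assumed source of highs_optimizer()</code></pre></div>
</div>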
<p>Solving the above problem results in an objective value of 5.75 and a solution of (x0, x1) = (0.5, 2.25).</p>
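<p>One way to cross-check the reported numbers without a solver is to note that, at this solution, the lower bounds of the two linear constraints (x0 + 2x1 = 5 and 3x0 + 2x1 = 6) both hold with equality. The sketch below, in base R only, solves that 2x2 system and evaluates the objective; the variable names are my own and not part of the original example.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Coefficients of the two constraints that are active at the reported solution
A = rbind(c(1, 2),   # x0 + 2*x1 = 5
          c(3, 2))   # 3*x0 + 2*x1 = 6
b = c(5, 6)

# Solve the linear system for the vertex (x0, x1)
vertex = solve(A, b)

# Objective x0 + x1 + 3 evaluated at that vertex
obj = sum(vertex) + 3

print(vertex)
print(obj)</code></pre></div>
</div>
<p>The vertex and objective computed this way should agree with the solver output quoted above, up to floating point rounding.</p>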
</section>
<section id="example-2-transportation-problem" class="level3">
<h3 class="anchored" data-anchor-id="example-2-transportation-problem">Example 2: Transportation Problem</h3>
<p>This example discusses a transportation problem from the <a href="https://www.gams.com/latest/gamslib_ml/libhtml/gamslib_trnsport.html">GAMS model library</a> where the goal is to find the minimum-cost way to meet market demand with the available plant capacity. I use this example just to show how the OMPR package handles variables involving indices. The full description of this example is in this <a href="https://notesofdabbler.github.io/optwithR/gms_trnsprt_ompr.html">location</a>.</p>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-1-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-1" aria-controls="tabset-1-1" aria-selected="true">Mathematical Formulation</a></li><li class="nav-item"><a class="nav-link" id="tabset-1-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-1-2" aria-controls="tabset-1-2" aria-selected="false">Model build using OMPR</a></li></ul>
<div class="tab-content">
<div id="tabset-1-1" class="tab-pane active" aria-labelledby="tabset-1-1-tab">
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Barray%7D%7Bllr%7D%0A%5Cmin%20&amp;%5Csum_%7Bp=1%7D%5EP%5Csum_%7Bm=1%7D%5EMc_%7Bpm%7Dx_%7Bpm%7D%20&amp;%20(a)%20%5C%5C%0A&amp;%5Csum_%7Bm=1%7D%5EMx_%7Bpm%7D%20%5Cleq%20cap_p,%20%5C;p=1,2,%5Cldots,P%20%20&amp;%20(b)%5C%5C%0A&amp;%5Csum_%7Bp=1%7D%5EPx_%7Bpm%7D%20%5Cgeq%20dem_m,%20%5C;m=1,2,%5Cldots,M%20&amp;%20(c)%20%5C%5C%0A&amp;x_%7Bpm%7D%20%5Cgeq%200,%20%5C;p=1,2,%5Cldots,P;%5C;m=1,2,%5Cldots,M%0A%5Cend%7Barray%7D%0A"></p>
<p>where</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?x_%7Bpm%7D"> is the quantity to be shipped from plant <img src="https://latex.codecogs.com/png.latex?p"> to market <img src="https://latex.codecogs.com/png.latex?m"> (decision variable)</li>
<li>Objective (a) is to minimize shipping cost</li>
<li>Constraint (b) ensures that total supply from each plant does not exceed its capacity</li>
<li>Constraint (c) ensures that demand for each market is met.</li>
</ul>
</div>
<div id="tabset-1-2" class="tab-pane" aria-labelledby="tabset-1-2-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb3" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb3-1">np <span class="ot" style="color: #003B4F;">=</span> <span class="fu" style="color: #4758AB;">length</span>(plants)</span>
<span id="cb3-2">nm <span class="ot" style="color: #003B4F;">=</span> <span class="fu" style="color: #4758AB;">length</span>(mkts)</span>
<span id="cb3-3"><span class="co" style="color: #5E5E5E;"># create ompr model</span></span>
<span id="cb3-4">mdl <span class="ot" style="color: #003B4F;">=</span> <span class="fu" style="color: #4758AB;">MIPModel</span>() <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb3-5">  <span class="fu" style="color: #4758AB;">add_variable</span>(x[i, j], <span class="at" style="color: #657422;">i=</span><span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span>np, <span class="at" style="color: #657422;">j=</span><span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span>nm, <span class="at" style="color: #657422;">type =</span> <span class="st" style="color: #20794D;">"continuous"</span>,<span class="at" style="color: #657422;">lb =</span> <span class="dv" style="color: #AD0000;">0</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span></span>
<span id="cb3-6">  <span class="co" style="color: #5E5E5E;"># objective: min cost</span></span>
<span id="cb3-7">  <span class="fu" style="color: #4758AB;">set_objective</span>(<span class="fu" style="color: #4758AB;">sum_over</span>(<span class="fu" style="color: #4758AB;">cost</span>(i, j) <span class="sc" style="color: #5E5E5E;">*</span> x[i, j], <span class="at" style="color: #657422;">i =</span> <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span>np, <span class="at" style="color: #657422;">j =</span> <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span>nm), <span class="at" style="color: #657422;">sense =</span> <span class="st" style="color: #20794D;">"min"</span>) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> </span>
<span id="cb3-8">  <span class="co" style="color: #5E5E5E;"># supply from each plant is below capacity</span></span>
<span id="cb3-9">  <span class="fu" style="color: #4758AB;">add_constraint</span>(<span class="fu" style="color: #4758AB;">sum_over</span>(x[i, j], <span class="at" style="color: #657422;">j =</span> <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span>nm) <span class="sc" style="color: #5E5E5E;">&lt;=</span> cap[i], <span class="at" style="color: #657422;">i =</span> <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span>np) <span class="sc" style="color: #5E5E5E;">%&gt;%</span>  </span>
<span id="cb3-10">  <span class="co" style="color: #5E5E5E;"># supply to each market meets demand</span></span>
<span id="cb3-11">  <span class="fu" style="color: #4758AB;">add_constraint</span>(<span class="fu" style="color: #4758AB;">sum_over</span>(x[i, j], <span class="at" style="color: #657422;">i =</span> <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span>np) <span class="sc" style="color: #5E5E5E;">&gt;=</span> dem[j], <span class="at" style="color: #657422;">j =</span> <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span>nm)</span></code></pre></div>
</div>
</div>
</div>
</div>
<p>The figure on the left shows the supply network (plants on top and markets below, with the numbers being capacity for plants and demand for markets). The figure on the right shows the solution, where the Chicago market is supplied by the Seattle plant and the San Diego plant supplies both the New York and Topeka markets.</p>
<div class="columns">
<div class="column border">
<p><em>Network Information</em> <img src="https://www.notesofdabbler.com/posts/post_2022_09_11/network_info.png" class="img-fluid"></p>
</div><div class="column border">
<p><em>Solution</em> <img src="https://www.notesofdabbler.com/posts/post_2022_09_11/network_soln.png" class="img-fluid"></p>
</div>
</div>
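<p>The OMPR model above refers to <code>plants</code>, <code>mkts</code>, <code>cap</code>, <code>dem</code> and <code>cost(i, j)</code> without showing them. The base R sketch below fills them in with the data as I recall it from the GAMS trnsport model, so treat the numbers as assumptions and check them against the linked model. The shipment plan is my reading of the solution described above (Seattle supplies Chicago, San Diego supplies New York and Topeka), not solver output.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Assumed inputs, recalled from the GAMS trnsport model
plants = c("Seattle", "San Diego")
mkts = c("New York", "Chicago", "Topeka")
cap = c(350, 600)       # plant capacities in cases
dem = c(325, 300, 275)  # market demands in cases

# Distances in thousands of miles (rows = plants, columns = markets)
dist = matrix(c(2.5, 1.7, 1.8,
                2.5, 1.8, 1.4),
              nrow = 2, byrow = TRUE, dimnames = list(plants, mkts))
freight = 90  # dollars per case per thousand miles

# Shipping cost in thousands of dollars per case, callable as cost(i, j)
cost = function(i, j) freight * dist[cbind(i, j)] / 1000

# Shipment plan inferred from the solution description (not solver output)
ship = matrix(0, nrow = 2, ncol = 3, dimnames = list(plants, mkts))
ship["Seattle", "Chicago"] = 300
ship["San Diego", "New York"] = 325
ship["San Diego", "Topeka"] = 275

shipped = rowSums(ship)   # compare against cap, constraint (b)
received = colSums(ship)  # compare against dem, constraint (c)
total_cost = sum(freight * dist / 1000 * ship)

print(shipped)
print(received)
print(total_cost)</code></pre></div>
</div>
<p>Under these assumed inputs, the plan uses all of San Diego's capacity, leaves some slack at Seattle, and meets each market's demand exactly.</p>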
</section>
<section id="example-3-map-coloring-problem" class="level3">
<h3 class="anchored" data-anchor-id="example-3-map-coloring-problem">Example 3: Map Coloring Problem</h3>
<p>This example discusses a map coloring problem where the goal is to use the minimum number of colors so that no two adjacent states in the US map have the same color. In this example also, I am just showing the mathematical formulation and OMPR model. The full description of this example is in this <a href="https://notesofdabbler.github.io/optwithR/map_coloring_ompr.html">location</a>.</p>
<div class="panel-tabset">
<ul class="nav nav-tabs"><li class="nav-item"><a class="nav-link active" id="tabset-2-1-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-1" aria-controls="tabset-2-1" aria-selected="true">Mathematical Formulation</a></li><li class="nav-item"><a class="nav-link" id="tabset-2-2-tab" data-bs-toggle="tab" data-bs-target="#tabset-2-2" aria-controls="tabset-2-2" aria-selected="false">Model build using OMPR</a></li></ul>
<div class="tab-content">
<div id="tabset-2-1" class="tab-pane active" aria-labelledby="tabset-2-1-tab">
<p><img src="https://latex.codecogs.com/png.latex?%0A%5Cbegin%7Barray%7D%7Bllr%7D%0A%5Cmin%20&amp;%20%5Csum_%7Bc=1%7D%5ECy_c%20&amp;%20(a)%5C%5C%0A&amp;%20%5Csum_%7Bc=1%7D%5ECx_%7Bic%7D%20=%201,%20%5C;i=1,2,%5Cldots,N%20&amp;%20(b)%5C%5C%0A&amp;%20x_%7Bic%7D%20+%20x_%7Bjc%7D%20%5Cleq%20y_c,%20%5C;%20%5Cmbox%7Bwhen%20%7Di,%20j%20%5Cmbox%7B%20are%20adjacent%7D%20&amp;%20(c)%5C%5C%0A&amp;%20x_%7Bic%7D%20%5C;%20binary%20%5C%5C%0A&amp;%20y_c%20%5C;%20binary%0A%5Cend%7Barray%7D%0A"></p>
<p>where:</p>
<ul>
<li><img src="https://latex.codecogs.com/png.latex?y_c=1"> if color <img src="https://latex.codecogs.com/png.latex?c"> is used, <img src="https://latex.codecogs.com/png.latex?x_%7Bic%7D=1"> if state <img src="https://latex.codecogs.com/png.latex?i"> is colored with color <img src="https://latex.codecogs.com/png.latex?c">.</li>
<li>Objective (a) is to minimize the number of colors used</li>
<li>Constraint (b) ensures that each state gets some color</li>
<li>Constraint (c) ensures that if states <img src="https://latex.codecogs.com/png.latex?i"> and <img src="https://latex.codecogs.com/png.latex?j"> are adjacent, they don’t get the same color.</li>
</ul>
</div>
<div id="tabset-2-2" class="tab-pane" aria-labelledby="tabset-2-2-tab">
<div class="cell">
<div class="sourceCode cell-code" id="cb4" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"><span id="cb4-1"><span class="co" style="color: #5E5E5E;"># OMPR model</span></span>
<span id="cb4-2">ns <span class="ot" style="color: #003B4F;">=</span> <span class="fu" style="color: #4758AB;">nrow</span>(nodes_df)</span>
<span id="cb4-3">nc <span class="ot" style="color: #003B4F;">=</span> <span class="dv" style="color: #AD0000;">4</span></span>
<span id="cb4-4">edge_str <span class="ot" style="color: #003B4F;">=</span> edge_df <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">mutate</span>(<span class="at" style="color: #657422;">edge_str =</span> <span class="fu" style="color: #4758AB;">glue</span>(<span class="st" style="color: #20794D;">"{fromid}_{toid}"</span>)) <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">pull</span>(edge_str)</span>
<span id="cb4-5">mdl <span class="ot" style="color: #003B4F;">=</span> <span class="fu" style="color: #4758AB;">MIPModel</span>()</span>
<span id="cb4-6">mdl <span class="ot" style="color: #003B4F;">=</span> mdl <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">add_variable</span>(x[i, c], <span class="at" style="color: #657422;">i =</span> <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span>ns, <span class="at" style="color: #657422;">c =</span> <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span>nc, <span class="at" style="color: #657422;">type =</span> <span class="st" style="color: #20794D;">"integer"</span>, <span class="at" style="color: #657422;">lb =</span> <span class="dv" style="color: #AD0000;">0</span>, <span class="at" style="color: #657422;">ub =</span> <span class="dv" style="color: #AD0000;">1</span>)</span>
<span id="cb4-7">mdl <span class="ot" style="color: #003B4F;">=</span> mdl <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">add_variable</span>(y[c], <span class="at" style="color: #657422;">c =</span> <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span>nc, <span class="at" style="color: #657422;">type =</span> <span class="st" style="color: #20794D;">"integer"</span>, <span class="at" style="color: #657422;">lb =</span> <span class="dv" style="color: #AD0000;">0</span>, <span class="at" style="color: #657422;">ub =</span> <span class="dv" style="color: #AD0000;">1</span>)</span>
<span id="cb4-8">mdl <span class="ot" style="color: #003B4F;">=</span> mdl <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">set_objective</span>(<span class="fu" style="color: #4758AB;">sum_over</span>(y[c], <span class="at" style="color: #657422;">c=</span><span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span>nc), <span class="at" style="color: #657422;">sense =</span> <span class="st" style="color: #20794D;">"min"</span>)</span>
<span id="cb4-9">mdl <span class="ot" style="color: #003B4F;">=</span> mdl <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">add_constraint</span>(<span class="fu" style="color: #4758AB;">sum_over</span>(x[i, c], <span class="at" style="color: #657422;">c =</span> <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span>nc) <span class="sc" style="color: #5E5E5E;">==</span> <span class="dv" style="color: #AD0000;">1</span>, <span class="at" style="color: #657422;">i =</span> <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span>ns)</span>
<span id="cb4-10">mdl <span class="ot" style="color: #003B4F;">=</span> mdl <span class="sc" style="color: #5E5E5E;">%&gt;%</span> <span class="fu" style="color: #4758AB;">add_constraint</span>(x[i, c] <span class="sc" style="color: #5E5E5E;">+</span> x[j, c] <span class="sc" style="color: #5E5E5E;">&lt;=</span> y[c], <span class="at" style="color: #657422;">i =</span> <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span>ns, <span class="at" style="color: #657422;">j =</span> <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span>ns, <span class="at" style="color: #657422;">c =</span> <span class="dv" style="color: #AD0000;">1</span><span class="sc" style="color: #5E5E5E;">:</span>nc, <span class="fu" style="color: #4758AB;">glue</span>(<span class="st" style="color: #20794D;">"{i}_{j}"</span>) <span class="sc" style="color: #5E5E5E;">%in%</span> edge_str)</span></code></pre></div>
</div>
</div>
</div>
</div>
<p>Solving this problem gives the following map coloring:</p>
<p><img src="https://www.notesofdabbler.com/posts/post_2022_09_11/USmap_colored.png" class="img-fluid"></p>
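<p>To make the roles of <code>x</code> and <code>y</code> in the formulation concrete, here is a small base R sketch on a hypothetical 4-state graph (the edges and the coloring are made up for illustration, not taken from the US map data). It builds the binary variables from a candidate coloring and checks that every state has exactly one color and that adjacent states differ.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r"># Hypothetical adjacency list: each row is a pair of adjacent states
edges = data.frame(fromid = c(1, 1, 2, 3), toid = c(2, 3, 3, 4))

# Candidate coloring: color index assigned to each of the 4 states
color = c(1, 2, 3, 1)
nc = 4  # number of colors made available to the model

# x[i, c] = 1 if state i gets color c; y[c] = 1 if color c is used at all
x = outer(color, 1:nc, "==") * 1
y = as.numeric(colSums(x) != 0)

# Constraint (b): every state has exactly one color
one_color_each = all(rowSums(x) == 1)

# Constraint (c) in words: the two ends of every edge have different colors
is_proper = all(color[edges$fromid] != color[edges$toid])

# Objective (a): number of colors used
n_colors = sum(y)

print(one_color_each)
print(is_proper)
print(n_colors)</code></pre></div>
</div>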


</section>

 ]]></description>
  <category>R</category>
  <guid>https://www.notesofdabbler.com/posts/post_2022_09_11/index.html</guid>
  <pubDate>Sun, 11 Sep 2022 00:00:00 GMT</pubDate>
  <media:content url="https://www.notesofdabbler.com/posts/post_2022_09_11/network_info.png" medium="image" type="image/png" height="121" width="144"/>
</item>
<item>
  <title>Using Pyomo from R through the magic of Reticulate</title>
  <dc:creator>Notesofdabbler</dc:creator>
  <link>https://www.notesofdabbler.com/posts/post_2020_07_01/index.html</link>
  <description><![CDATA[ 




<p><a href="http://www.pyomo.org/">Pyomo</a> is a Python-based open-source package for modeling optimization problems. It makes it easy to represent optimization problems and can send them to different solvers (both open-source and commercial) to solve the problem and return the results in Python. The advantage of Pyomo compared to commercial software such as <a href="https://www.gams.com/">GAMS</a> and <a href="https://ampl.com/">AMPL</a> is the ability to code using standard Python syntax (with some modifications for Pyomo constructs). Another open-source package for modeling optimization problems is <a href="https://jump.dev/JuMP.jl/v0.19.0/index.html">JuMP</a> in the Julia language.</p>
<p>My goal in this blog is to see how far I can get in terms of using Pyomo from R using the <a href="https://rstudio.github.io/reticulate/">reticulate</a> package. The simplest option would be to develop the model in pyomo and call it from R using reticulate. However, it still requires writing the pyomo model in python. I want to use reticulate to write the pyomo model using R. The details of the blog post (along with code) are in this <a href="https://notesofdabbler.github.io/R_pyomo/2020_07_01_r_pyomo_blogpost.html">location</a>.</p>
<section id="summary" class="level3">
<h3 class="anchored" data-anchor-id="summary">Summary</h3>
<p>Here I covered two examples to show how to develop a Pyomo model from R using the reticulate package. While it might still be easier to develop the Pyomo model in Python (since it was meant to be used that way), I found that it is possible to develop Pyomo models in R fairly easily, albeit with some modifications (some maybe less elegant compared to the Python counterpart). It may still be better to develop more involved Pyomo models in Python, but reticulate offers a way to develop simple to intermediate-level models directly in R. The key learnings are:</p>
<ul>
<li>Need to overload arithmetic operators to enable things like addition etc. between pyomo objects</li>
<li>Use the option <code>convert = FALSE</code> to retain Pyomo objects as Python objects and potentially avoid issues that are hard to troubleshoot.</li>
<li>Lack of list comprehension in R makes some of the constraint specifications more verbose but still works.</li>
<li>Need to be careful about indexing (sometimes need to explicitly specify a tuple and sometimes not)</li>
</ul>
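<p>As a rough illustration of the <code>convert = FALSE</code> point, the sketch below uses Python's builtins module instead of Pyomo so that it only needs reticulate and a working Python installation. It is not the code from the linked post; it just shows the difference between automatic conversion and keeping the result as a Python object.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">library(reticulate)

# Illustration with Python builtins (a stand-in for a Pyomo import)
bi_conv = import_builtins(convert = TRUE)
bi_raw = import_builtins(convert = FALSE)

# With convert = TRUE the result comes back as a plain R value
n_conv = bi_conv$len(list(1, 2, 3))

# With convert = FALSE the result stays a Python object
n_raw = bi_raw$len(list(1, 2, 3))
print(class(n_raw))

# Convert explicitly only when an R value is actually needed
n_back = py_to_r(n_raw)
print(n_back)</code></pre></div>
</div>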


</section>

 ]]></description>
  <guid>https://www.notesofdabbler.com/posts/post_2020_07_01/index.html</guid>
  <pubDate>Wed, 01 Jul 2020 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Proofs without Words using gganimate</title>
  <dc:creator>Notesofdabbler</dc:creator>
  <link>https://www.notesofdabbler.com/posts/post_2020_04_26/index.html</link>
  <description><![CDATA[ 




<p>I recently watched the two-part workshop (<a href="https://www.youtube.com/watch?v=h29g21z0a68">part 1</a>, <a href="https://www.youtube.com/watch?v=0m4yywqNPVY">part 2</a>) on ggplot2 and extensions given by <a href="https://www.data-imaginist.com/about">Thomas Lin Pedersen</a>. First off, it was really nice of Thomas to give the close to four-hour workshop for the benefit of the community. I personally learnt a lot from it. I wanted to try out the <a href="https://gganimate.com/index.html">gganimate</a> extension that was covered during the workshop.</p>
<p>There are several resources on the web that show animations/illustrations of proofs of mathematical identities and theorems without words (or close to it). I wanted to take a few of those examples and use gganimate to recreate the illustration. This was a fun way for me to try out gganimate.</p>
<section id="example-1" class="level2">
<h2 class="anchored" data-anchor-id="example-1">Example 1</h2>
<p>This example is taken from <a href="https://artofproblemsolving.com/wiki/index.php/Proofs_without_words">AoPS Online</a> and the result is that sum of first <img src="https://latex.codecogs.com/png.latex?n"> odd numbers equals <img src="https://latex.codecogs.com/png.latex?n%5E2">.</p>
<p><img src="https://latex.codecogs.com/png.latex?%201%20+%203%20+%205%20+%20%5Cldots%20+%20(2n%20-%201)%20=%20n%5E2%20"></p>
<p>The gganimate version of the proof (using the method in <a href="https://artofproblemsolving.com/wiki/index.php/Proofs_without_words">AoPS Online</a>) is shown below (<a href="https://github.com/notesofdabbler/learn_gganimate/blob/master/proof_without_words/sum_of_odds.R">R code</a>, <a href="https://github.com/notesofdabbler/learn_gganimate/blob/master/proof_without_words/sum_of_odds.html">html file</a>)</p>
<p><img src="https://raw.githubusercontent.com/notesofdabbler/learn_gganimate/master/proof_without_words/figures/sum_of_odds.gif" class="img-fluid"></p>
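<p>As a quick numerical companion to the animation, the identity can be checked in base R for the first few values of <img src="https://latex.codecogs.com/png.latex?n">. This is only a sanity check for small cases, not a proof, and the choice of 10 terms is arbitrary.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">n = 10

# First n odd numbers: 1, 3, ..., 2n - 1
odds = seq(1, by = 2, length.out = n)

# Running sums of the odd numbers and the squares 1^2, 2^2, ..., n^2
partial_sums = cumsum(odds)
squares = (1:n)^2

# TRUE when every running sum equals the matching square
sums_match = all(partial_sums == squares)
print(sums_match)</code></pre></div>
</div>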
</section>
<section id="example-2" class="level2">
<h2 class="anchored" data-anchor-id="example-2">Example 2</h2>
<p>This example is also taken from <a href="https://artofproblemsolving.com/wiki/index.php/Proofs_without_words">AoPS Online</a> and the result is:</p>
<p><img src="https://latex.codecogs.com/png.latex?%201%5E3%20+%202%5E3%20+%20%5Cldots%20+%20(n-1)%5E3%20+%20n%5E3%20=%20(1%20+%202%20+%20%5Cldots%20+%20n)%5E2%20"></p>
<p>The gganimate version of the proof (using the method in <a href="https://artofproblemsolving.com/wiki/index.php/Proofs_without_words">AoPS Online</a>) is shown below ( <a href="https://github.com/notesofdabbler/learn_gganimate/blob/master/proof_without_words/sum_of_cubes.R">R code</a>, <a href="https://github.com/notesofdabbler/learn_gganimate/blob/master/proof_without_words/sum_of_cubes.html">html file</a>):</p>
<p><img src="https://raw.githubusercontent.com/notesofdabbler/learn_gganimate/master/proof_without_words/figures/sum_of_cubes.gif" class="img-fluid"></p>
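<p>The sum of cubes identity can also be spot-checked numerically in base R. The sketch compares the running sum of cubes with the square of the running sum of integers for a handful of values of <img src="https://latex.codecogs.com/png.latex?n">; the upper limit of 10 is an arbitrary choice.</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">n = 10

# Left side: 1^3 + 2^3 + ... + k^3 for k = 1, ..., n
lhs = cumsum((1:n)^3)

# Right side: (1 + 2 + ... + k)^2 for k = 1, ..., n
rhs = cumsum(1:n)^2

# TRUE when both sides agree for every k
cubes_match = all(lhs == rhs)
print(cubes_match)</code></pre></div>
</div>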
</section>
<section id="example-3" class="level2">
<h2 class="anchored" data-anchor-id="example-3">Example 3</h2>
<p>This example from <a href="https://artofproblemsolving.com/wiki/index.php/Proofs_without_words">AoPS Online</a> illustrates the result</p>
<p><img src="https://latex.codecogs.com/png.latex?%20%5Cfrac%7B1%7D%7B2%5E2%7D%20+%20%5Cfrac%7B1%7D%7B2%5E4%7D%20+%20%5Cfrac%7B1%7D%7B2%5E6%7D%20+%20%5Cfrac%7B1%7D%7B2%5E8%7D%20+%20%5Cldots%20=%20%5Cfrac%7B1%7D%7B3%7D%20"></p>
<p>The gganimate version of the proof (using the method in <a href="https://artofproblemsolving.com/wiki/index.php/Proofs_without_words">AoPS Online</a>) is shown below ( <a href="https://github.com/notesofdabbler/learn_gganimate/blob/master/proof_without_words/infinite_series_1.R">R code</a>, <a href="https://github.com/notesofdabbler/learn_gganimate/blob/master/proof_without_words/infinite_series_1.html">html file</a>):</p>
<p><img src="https://raw.githubusercontent.com/notesofdabbler/learn_gganimate/master/proof_without_words/figures/infinite_series_1.gif" class="img-fluid"></p>
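<p>For the infinite series, a numerical check can only look at partial sums. Since each term is 1/4 of the previous one, the sum of the first k terms is (1/3)(1 - 1/4^k), so the gap to 1/3 should shrink by a factor of 4 with every added term. The base R sketch below shows this for the first 20 terms (an arbitrary cutoff).</p>
<div class="cell">
<div class="sourceCode cell-code" style="background: #f1f3f5;"><pre class="sourceCode r code-with-copy"><code class="sourceCode r">k = 1:20

# Partial sums of 1/2^2 + 1/2^4 + 1/2^6 + ...
partial = cumsum(1 / 4^k)

# Distance from the claimed limit of 1/3 after each term
gap = 1 / 3 - partial

# Closed form for the gap, (1/3) * (1/4)^k, for comparison
gap_formula = (1 / 3) / 4^k

print(partial[c(1, 5, 20)])
print(gap[c(1, 5, 20)])</code></pre></div>
</div>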
</section>
<section id="example-4" class="level2">
<h2 class="anchored" data-anchor-id="example-4">Example 4</h2>
<p>According to the Pythagorean theorem, <img src="https://latex.codecogs.com/png.latex?%20a%5E2%20+%20b%5E2%20=%20c%5E2%20"> where <img src="https://latex.codecogs.com/png.latex?a">, <img src="https://latex.codecogs.com/png.latex?b">, <img src="https://latex.codecogs.com/png.latex?c"> are the sides of a right-angled triangle (with <img src="https://latex.codecogs.com/png.latex?c"> being the side opposite the <img src="https://latex.codecogs.com/png.latex?90%5E%5Ccirc"> angle).</p>
<p>There was an illustration of the proof of the Pythagorean theorem in a <a href="https://www.youtube.com/watch?v=T2K11eFepcs">video</a> from <a href="http://www.eChalk.co.uk">echalk</a>.</p>
<p>The gganimate version of the proof is shown below ( <a href="https://github.com/notesofdabbler/learn_gganimate/blob/master/proof_without_words/pythagoras_theorem.R">R code</a>, <a href="https://github.com/notesofdabbler/learn_gganimate/blob/master/proof_without_words/pythagoras_theorem.html">html file</a>)</p>
<p><img src="https://raw.githubusercontent.com/notesofdabbler/learn_gganimate/master/proof_without_words/figures/pythagoras_theorem.gif" class="img-fluid"></p>
<p>In summary, it was great to use gganimate for these animations since it does all the magic with making transitions work nicely.</p>


</section>

 ]]></description>
  <guid>https://www.notesofdabbler.com/posts/post_2020_04_26/index.html</guid>
  <pubDate>Sun, 26 Apr 2020 00:00:00 GMT</pubDate>
  <media:content url="https://raw.githubusercontent.com/notesofdabbler/learn_gganimate/master/proof_without_words/figures/sum_of_odds.gif" medium="image" type="image/gif"/>
</item>
<item>
  <title>Keeping up with Tidyverse Functions using Tidy Tuesday Screencasts</title>
  <dc:creator>Notesofdabbler</dc:creator>
  <link>https://www.notesofdabbler.com/posts/post_2019_08_06/index.html</link>
  <description><![CDATA[ 




<p>David Robinson has done several <a href="https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ">screencasts</a> where he analyzes a Tidy Tuesday dataset live. I have listened to a few of them and found them very interesting and instructive. As I don’t use R on a daily basis, I have not kept up with what the latest is in Tidyverse. So when I listened to his screencasts, I learnt functions that I was not aware of. Since I sometimes forget which function I learnt, I wanted to extract all the functions used in the screencasts so that it is easier for me to refer to the ones that I am not aware of but should learn.</p>
<p>The approach I took is:</p>
<ul>
<li>Get all the Rmd analysis files from the screencast GitHub repo.</li>
<li>Extract the list of libraries and functions used in each Rmd file.</li>
<li>Plot the frequencies of function use and review the functions that I am not aware of.</li>
</ul>
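<p>The extraction step can be illustrated with a minimal sketch. It is written in Python with regular expressions and is not the R code linked in this post; the sample Rmd text, the helper names and the simple patterns are assumptions made for illustration, and a real parser would need to handle comments, strings and namespaced calls such as pkg::fn.</p>

```python
import re
from collections import Counter

# Rmd chunk delimiter, built this way so the example does not nest literal fences.
FENCE = "`" * 3

# Hypothetical Rmd content; the real input would be the files from the screencast repo.
SAMPLE_RMD = "\n".join([
    "# Analysis",
    "Some prose that mentions filter(x) but is not code.",
    FENCE + "{r setup}",
    "library(tidyverse)",
    "library(lubridate)",
    FENCE,
    "More prose.",
    FENCE + "{r}",
    "df %>% count(year, sort = TRUE) %>% mutate(year = fct_reorder(year, n))",
    FENCE,
])

LIBRARY_RE = re.compile(r"library\(\s*([A-Za-z][\w.]*)\s*\)")
CALL_RE = re.compile(r"([A-Za-z_.][\w.]*)\s*\(")


def extract_code(rmd_text):
    """Return only the lines that sit inside code chunks."""
    code_lines, in_chunk = [], False
    for line in rmd_text.splitlines():
        if line.startswith(FENCE):
            in_chunk = not in_chunk  # a fence line opens or closes a chunk
            continue
        if in_chunk:
            code_lines.append(line)
    return code_lines


def libraries_and_functions(rmd_text):
    """Collect loaded libraries and count the function calls in the code chunks."""
    libs, funcs = [], Counter()
    for line in extract_code(rmd_text):
        libs.extend(LIBRARY_RE.findall(line))
        funcs.update(name for name in CALL_RE.findall(line) if name != "library")
    return libs, funcs


libs, funcs = libraries_and_functions(SAMPLE_RMD)
print(libs)
print(funcs.most_common())
```

<p>Running the same function over every file and summing the counters would give the per-function frequencies used for the plots. Mapping each function back to its package is a separate step that this sketch does not attempt.</p>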
<p>The html file with all the code and results is in this <a href="http://notesofdabbler.github.io/blog_notesofdabbler/getCodeFuncs.html">location</a>. The R file used to generate the html file is <a href="https://raw.githubusercontent.com/notesofdabbler/blog_notesofdabbler/master/popularTidyVerseFuncs/getCodeFuncs.R">here</a>.</p>
<p>The plot below shows how many analyses used each package. <img src="https://raw.githubusercontent.com/notesofdabbler/blog_notesofdabbler/master/popularTidyVerseFuncs/figure/libcntplt-1.png" class="img-fluid" width="800"></p>
<p>It is to be expected that tidyverse is the top library. It is interesting that lubridate is second. I can see that broom is used quite a bit since, after the exploratory analysis in a screencast, David explores some models. There are several packages that I was not aware of; I will probably look up widyr, fuzzyjoin, glue, janitor and patchwork, along with the context in which they were used in the screencasts.</p>
<p>The plot below shows the number of functions used from each package. <img src="https://raw.githubusercontent.com/notesofdabbler/blog_notesofdabbler/master/popularTidyVerseFuncs/figure/pkgcntplt-1.png" class="img-fluid" width="800"></p>
<p>As expected, the most used functions are from <em>ggplot2</em>, <em>dplyr</em> and <em>tidyr</em>, since there is a lot of exploratory analysis and visualization of data in the screencasts.</p>
<p>The next series of plots shows the individual functions used from each package.</p>
<p><img src="https://raw.githubusercontent.com/notesofdabbler/blog_notesofdabbler/master/popularTidyVerseFuncs/figure/fncountplt-1.png" class="img-fluid" width="800"></p>
<p><img src="https://raw.githubusercontent.com/notesofdabbler/blog_notesofdabbler/master/popularTidyVerseFuncs/figure/fncountplt-2.png" class="img-fluid" width="800"> <img src="https://raw.githubusercontent.com/notesofdabbler/blog_notesofdabbler/master/popularTidyVerseFuncs/figure/fncountplt-3.png" class="img-fluid" width="800"></p>
<p>Based on the above figures, I am listing below some functions that I was not aware of and should learn:</p>
<ul>
<li>The <em>count</em> function in <em>dplyr</em> as an easier way to count rows for each group or sum a variable for each group.</li>
<li>The <em>geom_col</em> function in <em>ggplot2</em> for bar graphs.</li>
<li>I became aware of the <em>forcats</em> package for working with factors. <em>fct_reorder</em> and <em>fct_lump</em> from the package were used frequently.</li>
<li><em>tidyr</em> functions: <em>nest/unnest</em>, <em>crossing</em>, <em>separate_rows</em>.</li>
<li>I realized that I know only a few functions in <em>stringr</em> and should learn more about the several functions that were used in the screencasts.</li>
</ul>



 ]]></description>
  <guid>https://www.notesofdabbler.com/posts/post_2019_08_06/index.html</guid>
  <pubDate>Tue, 06 Aug 2019 00:00:00 GMT</pubDate>
  <media:content url="https://raw.githubusercontent.com/notesofdabbler/blog_notesofdabbler/master/popularTidyVerseFuncs/figure/libcntplt-1.png" medium="image" type="image/png"/>
</item>
<item>
  <title>Fastai Collaborative Filtering with R and Reticulate</title>
  <dc:creator>Notesofdabbler</dc:creator>
  <link>https://www.notesofdabbler.com/posts/post_2018_04_01/index.html</link>
  <description><![CDATA[ 




<p>Jeremy Howard and Rachel Thomas are the founders of <a href="http://www.fast.ai/">fast.ai</a>, whose aim is to make deep learning accessible to all. They offer a course called <a href="http://course.fast.ai/">Practical Deep Learning for Coders (Part 1)</a>. The last session, taught by Jeremy, was in Fall 2017 and the videos were released in early January 2018. Their approach is top-down: they show different applications first as black boxes and then progressively peel back the black box to teach the details of how things work. The course uses Python, and they have developed a Python library, <a href="https://github.com/fastai/fastai/tree/master/fastai">fastai</a>, that is a wrapper around <a href="http://pytorch.org/">PyTorch</a>.</p>
<p>I wanted to learn reticulate by trying to create an R version of one of the Python notebooks from that class. The class covers the topic of collaborative filtering in <a href="http://course.fast.ai/lessons/lesson5.html">lecture 5</a> and <a href="http://course.fast.ai/lessons/lesson6.html">lecture 6</a>. The dataset used is a sample of the <a href="http://files.grouplens.org/datasets/movielens/ml-latest-small.zip">MovieLens dataset</a> where ~670 users have rated ~9000 movies. The objective is to develop a model to predict the rating that a user will give to a particular movie.</p>
<p>The <a href="https://github.com/fastai/fastai/blob/master/courses/dl1/lesson5-movielens.ipynb">Jupyter notebook</a> for this topic is divided into 2 portions:</p>
<ul>
<li>In the first half, the model is developed using just high-level fastai functions. The R notebook for the first half is located <a href="https://notesofdabbler.github.io/fastai_dl1_withR/movieLens.nb.html">here</a>.</li>
<li>In the second half, the model is developed from scratch and 3 different types of models are discussed, going from a matrix factorization model to deep learning models. The R notebook for the second half is located <a href="https://notesofdabbler.github.io/fastai_dl1_withR/movieLens_from_Scratch.nb.html">here</a>.</li>
</ul>
<p>Since the first half involved mainly Python functions from the fastai library, it seemed like a good use case for reticulate: we could use reticulate just for model development and use R functions for the other pre- and post-processing tasks. The second half involved building the model from scratch. In PyTorch, custom models need to be written as Python classes. While it was still possible to use reticulate in this case, it may not be the ideal use case, since somebody developing custom models might be better off doing the whole work in Python. But once the models are wrapped into a Python package, they are easier to use from R. Overall, reticulate was great to work with and made it very easy to translate a Python function into an equivalent R function. It is a great addition to the R packages.</p>



 ]]></description>
  <guid>https://www.notesofdabbler.com/posts/post_2018_04_01/index.html</guid>
  <pubDate>Sun, 01 Apr 2018 00:00:00 GMT</pubDate>
</item>
<item>
  <title>Exploring Instacart Dataset with PCA</title>
  <dc:creator>Notesofdabbler</dc:creator>
  <link>https://www.notesofdabbler.com/posts/post_2017_05_22/index.html</link>
  <description><![CDATA[ 




<p>Recently, <a href="https://www.instacart.com/">Instacart</a> released a <a href="https://tech.instacart.com/3-million-instacart-orders-open-sourced-d40d29ead6f2">dataset</a> of ~3 million orders made by ~200,000 users on different days of the week and at different times of day. There is also an ongoing <a href="https://www.kaggle.com/c/instacart-market-basket-analysis">Kaggle competition</a> to predict which products a user will buy again. My goal here is more modest: I just wanted to explore the dataset to find patterns of purchasing behaviour by hour of day, day of week and number of days prior to the current order. An <a href="https://cdn-images-1.medium.com/max/800/1*wKfV6OV-_1Ipwrl7AjjSuw.png">example</a> of this kind of analysis is also shown in their blog. Here I wanted to explore whether I can find such patterns using a very common and popular dimension reduction technique, Principal Component Analysis (PCA). There are several great resources that introduce PCA if you are not familiar with it. One of them is the set of <a href="https://www.r-bloggers.com/in-depth-introduction-to-machine-learning-in-15-hours-of-expert-videos/">video lectures</a> on machine learning by Prof.&nbsp;Hastie and Prof.&nbsp;Tibshirani.</p>
<p>The general approach that I have followed is:</p>
<ul>
<li>Do principal component analysis on the data, where each row is a product and each column is a time period (hour of day, day of week or number of days prior to the current order).</li>
<li>Review the loading plots of the first two principal components to see purchase patterns.</li>
<li>Identify the top 20 products that have high scores on either the first or the second principal component.</li>
<li>Check the purchasing pattern by looking at the average number of orders for the products that were identified as having top scores on one of the principal components.</li>
</ul>
<p><strong>Spoiler Alert</strong>: Since my analysis is basic, don’t be disappointed if there are no big Aha moments (there will be none). But I think it is still fun to see how we can extract such information directly from data.</p>
<p>I downloaded the data from the following <a href="https://www.instacart.com/datasets/grocery-shopping-2017">link</a>. The data dictionary is in the following <a href="https://gist.github.com/jeremystan/c3b39d947d9b88b3ccff3147dbcf6c6b">link</a>. The full code with results is in the following <a href="http://notesofdabbler.github.io/blog_notesofdabbler/exploreData_PCA.html">location</a>.</p>
<p>Below is some basic information on the dataset:</p>
<ul>
<li>The number of users is ~200,000.</li>
<li>The number of orders is ~3.4M. The number of products is ~50K, of which ~5K account for 80% of total orders.</li>
</ul>
<section id="pca-to-find-patterns-of-purchase-by-hour-of-day" class="level2">
<h2 class="anchored" data-anchor-id="pca-to-find-patterns-of-purchase-by-hour-of-day">PCA to find patterns of purchase by hour of day</h2>
<p>The goal here is to use PCA to find products with different patterns of purchase timing by hour of day. The dataset for PCA has one row per product and one column per hour of day, with each entry giving the percentage of that product’s orders placed in that hour. Since all the data is in percentages, I didn’t do any further scaling of the data.</p>
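<p>As a rough illustration of how such a product-by-hour percentage table can be built, here is a minimal Python sketch. It is not the code used for this analysis; the order records and the function name are hypothetical.</p>

```python
from collections import defaultdict

# Hypothetical (product, hour_of_day) pairs, one per ordered item.
orders = [
    ("ice cream", 20), ("ice cream", 21), ("ice cream", 20), ("ice cream", 9),
    ("granola bars", 8), ("granola bars", 9),
]


def hourly_percentages(orders, hours=range(24)):
    """Return {product: [percentage of that product's orders in each hour]}."""
    counts = defaultdict(lambda: defaultdict(int))
    for product, hour in orders:
        counts[product][hour] += 1

    matrix = {}
    for product, by_hour in counts.items():
        total = sum(by_hour.values())
        # Each row sums to 100, so all products end up on the same scale.
        matrix[product] = [100.0 * by_hour.get(h, 0) / total for h in hours]
    return matrix


matrix = hourly_percentages(orders)
print(matrix["ice cream"][20])  # share of ice cream orders placed at 8pm
```

<p>Each row of the resulting table is one observation for PCA, and each of the 24 columns is one variable.</p>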
<p>The plot of cumulative variance shows that the first component accounts for 44% of the variance, the first two account for 58% and the first three account for 67%.</p>
<p><img src="https://raw.githubusercontent.com/notesofdabbler/exploreInstacart/master/figure/unnamed-chunk-9-1.png" class="img-fluid" width="600"></p>
<p>Next, we will look at the first two loadings, since the first two components account for 58% of the variance.</p>
<p><img src="https://raw.githubusercontent.com/notesofdabbler/exploreInstacart/master/figure/unnamed-chunk-10-1.png" class="img-fluid" width="600"></p>
<p>The first principal component loading (PC1) indicates a pattern of a higher percentage of purchases in either the morning or the evening. The second principal component loading indicates a pattern with higher purchases around 11am and 4pm. To check which products follow these patterns, we look at products that have either high or low scores on a principal component. So here we take the top 20 and bottom 20 products in terms of their scores on PC1. The actual pattern may still not quite match the loading plot, since the overall pattern is a combination of all principal component loadings.</p>
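<p>Picking the products at the two ends of a component is a simple ranking step. The Python sketch below illustrates it with made-up scores; the product scores and the function name are hypothetical, and the real scores would come from the PCA fit.</p>

```python
# Hypothetical PC1 scores per product; real scores would come from the PCA fit.
pc1_scores = {
    "ice cream": 4.2,
    "frozen pizza": 2.9,
    "bananas": 0.1,
    "apples": -2.5,
    "granola bars": -3.8,
}


def top_and_bottom(scores, n=20):
    """Return the n highest-scoring and n lowest-scoring products."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n], ranked[-n:]


top, bottom = top_and_bottom(pc1_scores, n=2)
print(top, bottom)
```

<p>The sign of a principal component is arbitrary, so which end corresponds to morning and which to evening has to be read off the loading plot rather than assumed.</p>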
<p><img src="https://raw.githubusercontent.com/notesofdabbler/exploreInstacart/master/figure/unnamed-chunk-11-1.png" class="img-fluid" width="600"></p>
<p>Below is the table that lists the actual products with the top and bottom scores on PC1. Ice cream purchases tend to occur more in the evening. Items like granola bars, krispie treats and apples are purchased more in the morning.</p>
<p><img src="https://raw.githubusercontent.com/notesofdabbler/exploreInstacart/master/figure/top_bottom_20_products_PC1scores.png" class="img-fluid" width="600"></p>


</section>

 ]]></description>
  <guid>https://www.notesofdabbler.com/posts/post_2017_05_22/index.html</guid>
  <pubDate>Mon, 22 May 2017 00:00:00 GMT</pubDate>
  <media:content url="https://raw.githubusercontent.com/notesofdabbler/exploreInstacart/master/figure/unnamed-chunk-9-1.png" medium="image" type="image/png"/>
</item>
</channel>
</rss>
