A Simple Interest Calculation Challenge
Chen Hao posted on 02 Aug 2016
Problem Description
Here is an interest calculation problem. For simplicity, I framed the model as follows:
An insurance product asks you to deposit S$2400 every year, and the money you have deposited earns an annual interest rate of 3%. How much will you have earned when you withdraw the money after 21 years?
Code Display
1. R version 1 (H)
My first draft of the R code is below:
saveEarn <- function(yearPut, yearInterest, yearTime){
sum <- 0
base <- 0
i <- 0
lumpSum <- function(x, rate) { x * (1+rate)}
while(i <= yearTime){
sum <- lumpSum(x = base, rate = yearInterest)
cat(paste0(" --Year ", i, "; Sum:", round(sum, 2), "\n"))
base <- sum + yearPut
i <- i + 1
}
save <- yearPut * yearTime
earn <- sum - save
out <- c(round(save, 2), round(sum,2), round(earn, 2))
names(out) <- c("totalMoneyPut", "totalMoneyGet", "earn")
return(out)
}
saveEarn(yearPut=200*12, yearInterest=0.03, yearTime=21)
The output is:
--Year 0; Sum:0
--Year 1; Sum:2472
--Year 2; Sum:5018.16
--Year 3; Sum:7640.7
--Year 4; Sum:10341.93
--Year 5; Sum:13124.18
--Year 6; Sum:15989.91
--Year 7; Sum:18941.61
--Year 8; Sum:21981.85
--Year 9; Sum:25113.31
--Year 10; Sum:28338.71
--Year 11; Sum:31660.87
--Year 12; Sum:35082.7
--Year 13; Sum:38607.18
--Year 14; Sum:42237.39
--Year 15; Sum:45976.52
--Year 16; Sum:49827.81
--Year 17; Sum:53794.64
--Year 18; Sum:57880.48
--Year 19; Sum:62088.9
--Year 20; Sum:66423.57
--Year 21; Sum:70888.27
totalMoneyPut totalMoneyGet earn
50400.00 70888.27 20488.27
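As a cross-check on the yearly loop, the same number follows from the closed form for a deposit made at the start of each year (an annuity due): total = P(1+r)((1+r)^n - 1)/r, with P = 2400, r = 0.03 and n = 21. The Python sketch below is my own illustration, not one of the versions discussed in this post, and the function names are hypothetical.

```python
def future_value_closed_form(payment: float, rate: float, years: int) -> float:
    """Balance after `years` start-of-year deposits of `payment`, compounded yearly at `rate`."""
    growth = 1 + rate
    return payment * growth * (growth ** years - 1) / rate


def future_value_term_sum(payment: float, rate: float, years: int) -> float:
    """Same quantity as a sum: the deposit made k years before the end has grown by (1 + rate) ** k."""
    return sum(payment * (1 + rate) ** k for k in range(1, years + 1))


total = future_value_closed_form(2400, 0.03, 21)
deposited = 2400 * 21
print(f"totalMoneyPut {deposited:.2f}  totalMoneyGet {total:.2f}  earn {total - deposited:.2f}")
```

This prints 50400.00, 70888.27 and 20488.27, matching the R output.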
After I sent this out, my colleague W replied:
“I think it can be implemented in a single loop, instead of an embedded function. I am not sure about R’s performance with loops, functions, and even recursion, but I think simplicity is generally preferred.”
2. C version (W)
Challenged with “Talk is cheap, show me your code”, W sent us his C implementation:
#include <stdio.h>
#include <stdlib.h>
int main (int argc, char** argv){
// usage: <program> payment intRate term   (e.g. 2400 0.03 21)
if (argc < 4){ return 1; }
double payment, intRate, term, yr, total, interest, annInt;
payment = atof(argv[1]); intRate = atof(argv[2]); term = atof(argv[3]);
yr = 1; total = interest = annInt = 0;
for (;yr <= term; yr++){
total += payment;
annInt = total* intRate;
total += annInt;
printf("Year: %2.0f\tSavings: %2.2f\tinterest: %.2f\n", yr, total, annInt);
interest += annInt;
}
printf("Total saved: %.2f\tFinal balance: %.2f\tTotal interest earned: %.2f\n", total-interest, total, interest);
return 0;
}
He also pointed out that:
“In general the time taken by each feature is roughly: recursion > function » loop » arithmetic (this applies at least to C or Perl/Python; I don’t know enough about R to make a serious statement). So in C or other low-level languages, a simple loop will be more efficient than, and preferred over, functions and even recursion.”
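To make W’s point concrete, here is a hypothetical Python sketch of the same yearly recurrence, balance(n) = (balance(n-1) + payment) × (1 + rate) with balance(0) = 0, written once as a plain loop and once recursively (none of the versions in this post is actually recursive). Besides the per-call overhead W mentions, CPython limits recursion depth (sys.getrecursionlimit(), 1000 by default), so the recursive form would fail well before the larger spans used in the benchmark.

```python
import sys


def balance_loop(payment: float, rate: float, years: int) -> float:
    """Iterative version: deposit at the start of each year, then add a year of interest."""
    total = 0.0
    for _ in range(years):
        total = (total + payment) * (1 + rate)
    return total


def balance_recursive(payment: float, rate: float, years: int) -> float:
    """Recursive version of the same recurrence; it uses one stack frame per year."""
    if years == 0:
        return 0.0
    return (balance_recursive(payment, rate, years - 1) + payment) * (1 + rate)


print(round(balance_loop(2400, 0.03, 21), 2), round(balance_recursive(2400, 0.03, 21), 2))
print("recursion limit:", sys.getrecursionlimit())
```

Both functions perform the same floating-point operations in the same order, so they return identical values; the loop simply avoids the call overhead and the depth limit.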
3. Python version (B)
To claps and cheers, my colleague B sent his impressive Python version:
interest = 0.03
injection = 2400
years = 21
sum(map(lambda x: injection * (1+interest) ** x, range(1,(years+1))))
He pointed out that “The actual code is just one line. The rest is declaration.”
He also shared his coding philosophy, which I strongly agree with:
“While recursion is perhaps not the fastest, it is really elegant in my opinion and rather cool to write. And I put code clarity first and optimization second. Clarity will help with debugging and spotting mistakes. No sense having buggy fast code.”
(I should mention that the first version of the C code W sent was buggy; the version above is a corrected one.)
My colleague M was also motivated to try CUDA, but unfortunately we don’t have any Nvidia GPU cards here. Her weapon is far too powerful for such a tiny problem, but we would really love to see a sledgehammer used to crack a nut, right?
4. R version 2 (H)
Inspired by B’s code, I tried to implement it as one line of R:
annualPut <- 200*12; annualInterest <- 0.03; saveYears <- 21
sum(sapply(seq_len(saveYears), function(i){annualPut*(1+annualInterest)**i}))
I also gave myself a second chance to revise my R version 1 code (here I tidied it up and removed the function defined inside the function):
saveEarn <- function(annualPut, annualInterest, saveYears){
base <- annualPut
for(i in seq_len(saveYears)){
lumpSum <- base*(1 + annualInterest)
cat(paste0(" --Year ", i,
"; total put ", annualPut * i,
"; lump sum:", round(lumpSum, 2),
"; earn:", round(lumpSum-annualPut * i,2), "\n"))
base <- lumpSum + annualPut
}
totalPut <- annualPut * saveYears
totalEarn <- lumpSum - totalPut
out <- c(round(totalPut, 2), round(lumpSum,2), round(totalEarn, 2))
names(out) <- c("totalMoneyPut", "totalMoneyGet", "earn")
return(out)
}
saveEarn(annualPut=200*12, annualInterest=0.03, saveYears=21)
5. Perl version (W)
The one-line Python and R versions above look cool, but they only output the final total and lose the yearly details. So W brought out his ultimate weapon: Perl.
($a,$term,$r) = (12*200, 21, 0.03);
printf("saved: %f\tinterest: %8.2f\tfinal: %.2f\n", $a*$_,$i=($t+=$a)*$r, $t+=$i) foreach (1..$term);
This is really cool, and it prints all the details:
saved: 2400.000000 interest: 72.00 final: 2472.00
saved: 4800.000000 interest: 146.16 final: 5018.16
saved: 7200.000000 interest: 222.54 final: 7640.70
saved: 9600.000000 interest: 301.22 final: 10341.93
saved: 12000.000000 interest: 382.26 final: 13124.18
saved: 14400.000000 interest: 465.73 final: 15989.91
saved: 16800.000000 interest: 551.70 final: 18941.61
saved: 19200.000000 interest: 640.25 final: 21981.85
saved: 21600.000000 interest: 731.46 final: 25113.31
saved: 24000.000000 interest: 825.40 final: 28338.71
saved: 26400.000000 interest: 922.16 final: 31660.87
saved: 28800.000000 interest: 1021.83 final: 35082.70
saved: 31200.000000 interest: 1124.48 final: 38607.18
saved: 33600.000000 interest: 1230.22 final: 42237.39
saved: 36000.000000 interest: 1339.12 final: 45976.52
saved: 38400.000000 interest: 1451.30 final: 49827.81
saved: 40800.000000 interest: 1566.83 final: 53794.64
saved: 43200.000000 interest: 1685.84 final: 57880.48
saved: 45600.000000 interest: 1808.41 final: 62088.90
saved: 48000.000000 interest: 1934.67 final: 66423.57
saved: 50400.000000 interest: 2064.71 final: 70888.27
He also gave an even shorter version if we only want the final amount:
$t=($t+$a)*(1+$r) foreach (1..$term);
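The same “one line with all the details” idea also works in Python. The sketch below is my own hypothetical variant, not B’s code: itertools.accumulate (its initial argument needs Python 3.8 or later) carries the running balance from year to year, so every yearly value is kept rather than only the final sum.

```python
from itertools import accumulate

payment, rate, years = 2400, 0.03, 21

# One step per year: add the deposit, then apply a year of interest.
# initial=0 seeds the running balance; [1:] drops that seed so balances[k] is year k + 1.
balances = list(accumulate(range(years), lambda total, _: (total + payment) * (1 + rate), initial=0))[1:]

for year, balance in enumerate(balances, start=1):
    print(f"saved: {payment * year:.2f}\tfinal: {balance:.2f}")
```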
Simple Benchmark Test
Everything looks good now, but which version do you like most? Let’s run a simple benchmark.
My benchmark is designed as follows: of the three variables in the model, I only change years/term, using the values 100, 200, 500, 1000, 5000, 10000, 100000 and 1000000, and record the time elapsed for each years/term value with each of the above implementations. I ran each case only once and gave a second run only to results that looked weird; this is not rigorous, but it saved me time. All tests were performed on my MacBook Pro.
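Since each case was run only once, a more robust (though still rough) approach is to repeat each measurement and keep the fastest run, which is less sensitive to background activity. A minimal Python sketch using the standard timeit module to time B’s expression; the helper names b_version and best_times are hypothetical:

```python
import timeit


def b_version(years: int, injection: float = 2400, interest: float = 0.03) -> float:
    """B's one-liner wrapped in a function so it can be timed for different spans."""
    return sum(map(lambda x: injection * (1 + interest) ** x, range(1, years + 1)))


def best_times(func, spans, repeats=5):
    """Map each span to the fastest of `repeats` wall-clock timings, in seconds."""
    results = {}
    for years in spans:
        # number=1: each timing is a single call; repeat gives several independent timings.
        timings = timeit.repeat(lambda: func(years), number=1, repeat=repeats)
        results[years] = min(timings)
    return results


for years, seconds in best_times(b_version, [100, 1000, 10000]).items():
    print(f"{years:>6} years: {seconds:.6f} s")
```

Taking the minimum rather than the mean follows the usual timeit advice: slower runs mostly reflect interference from other processes, not the code itself.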
IMPORTANT NOTES: I noticed that for ALL implementations except Python, the total money output becomes Inf when years/term reaches 100000 or 1000000; Python raises an error instead. Also, some implementations, such as R version 2 and Python, produce simplified output (only the final amount), which can make the benchmark unfair.
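Why 10000 years works but 100000 does not: with a 3% rate, every implementation grows a value like 1.03^n, and a double can only hold values up to about 1.8 × 10^308, which 1.03^n passes after roughly 24,000 years. The Python sketch below (hypothetical helper names) finds the exact year for 1.03^n alone; the running totals carry an extra constant factor, so they overflow somewhat earlier.

```python
import math
import sys


def overflows(base: float, n: int) -> bool:
    """True if base ** n cannot be represented as a finite double."""
    try:
        base ** n
    except OverflowError:  # CPython's float ** raises here instead of returning inf
        return True
    return False


def first_overflowing_year(rate: float) -> int:
    """Smallest integer n for which (1 + rate) ** n overflows a double."""
    base = 1 + rate
    # Logarithm estimate of where base ** n passes the largest finite double.
    n = int(math.log(sys.float_info.max) / math.log(base))
    # Nudge the estimate onto the exact boundary, since rounding can be off by one.
    while overflows(base, n):
        n -= 1
    while not overflows(base, n):
        n += 1
    return n


print(first_overflowing_year(0.03))
```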
1. Test Perl version
I wrapped W’s Perl code in a file testPerl.pl, with the contents below:
#!/usr/bin/perl
use Time::HiRes qw( time );
my $start_time = time();
my ($a,$term,$r) = @ARGV;
printf("saved: %f\tinterest: %8.2f\tfinal: %.2f\n", $a*$_,$i=($t+=$a)*$r, $t+=$i) foreach (1..$term);
my $end_time = time();
my $diff_time = $end_time - $start_time;
printf("Time elapse: %.4f\n", $diff_time);
The test command for 100 years is perl testPerl.pl 2400 100 0.03 (arguments: annual payment, years, interest rate). The time elapsed for each value of yearSpan is recorded in the variable perlCost, as below:
yearSpan: 100 200 500 1000 5000 10000 1e+05 1e+06
perlCost: 0.0004 0.0022 0.0050 0.0128 0.0866 0.2450 2.0141 13.3940
NOTES: when $term goes up to 100000, the final value becomes Inf in the print output.
2. Test C version
I saved W’s C code into the file testC.c and added a timer:
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
int main (int argc, char** argv){
// usage: ./testC payment intRate term   (e.g. ./testC 2400 0.03 100)
clock_t begin = clock();
if (argc < 4){ return 1; }
double payment, intRate, term, yr, total, interest, annInt;
payment = atof(argv[1]); intRate = atof(argv[2]); term = atof(argv[3]);
yr = 1; total = interest = annInt = 0;
for (;yr <= term; yr++){
total += payment;
annInt = total* intRate;
total += annInt;
printf("Year: %2.0f\tSavings: %2.2f\tinterest: %.2f\n", yr, total, annInt);
interest += annInt;
}
printf("Total saved: %.2f\tFinal balance: %.2f\tTotal interest earned: %.2f\n", total-interest, total, interest);
clock_t end = clock();
double time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
printf("Time elapsed: %.4f\n", time_spent);
return 0;
}
Then I compiled it on my Mac:
gcc -o testC testC.c
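The test command for 100 years is ./testC 2400 0.03 100 (note that, unlike the Perl script, the C version takes the interest rate before the number of years). The time elapsed for each value of yearSpan is recorded in the variable cCost, as below: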
yearSpan: 100 200 500 1000 5000 10000 1e+05 1e+06
cCost: 0.0004 0.0007 0.0017 0.0039 0.0335 0.1075 0.8337 2.8035
NOTES: when term goes up to 100000, the total value becomes Inf in the print output.
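One caveat on comparing these numbers: the C timer uses clock(), which the C standard defines as the processor time used by the program, whereas the Perl (Time::HiRes time), R (system.time’s elapsed value, t[3]) and Python (time.time) timers measure wall-clock time. Time a process spends blocked, for example waiting for a terminal to accept output, may therefore not be counted in the C figures. The Python sketch below shows the distinction, using time.process_time as the analogue of clock() and time.perf_counter for wall-clock time; the helper name measure is hypothetical.

```python
import time


def measure(task):
    """Return (wall_seconds, cpu_seconds) for a single call of task()."""
    wall_start, cpu_start = time.perf_counter(), time.process_time()
    task()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


# Sleeping stands in for time spent blocked (e.g. waiting on a slow terminal):
# it costs wall-clock time but almost no CPU time.
wall, cpu = measure(lambda: time.sleep(0.2))
print(f"wall-clock: {wall:.3f} s, CPU: {cpu:.3f} s")
```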
3. Test R version 1
The test code for R version 1 (using the tidied saveEarn function) is as below:
saveEarn <- function(annualPut, annualInterest, saveYears){
base <- annualPut
for(i in seq_len(saveYears)){
lumpSum <- base*(1 + annualInterest)
cat(paste0(" --Year ", i,
"; total put ", annualPut * i,
"; lump sum:", round(lumpSum, 2),
"; earn:", round(lumpSum-annualPut * i,2), "\n"))
base <- lumpSum + annualPut
}
totalPut <- annualPut * saveYears
totalEarn <- lumpSum - totalPut
out <- c(round(totalPut, 2), round(lumpSum,2), round(totalEarn, 2))
names(out) <- c("totalMoneyPut", "totalMoneyGet", "earn")
return(out)
}
yearSpan <- c(100, 200, 500, 1000, 5000, 10000, 100000, 1000000)
R1Cost <- NULL
for(years in yearSpan){
t <- system.time(saveEarn(annualPut=200*12, annualInterest=0.03, saveYears=years))
R1Cost <- c(R1Cost, t[3]) ## record the time elapsed
}
The time elapsed for each value of yearSpan is recorded in the variable R1Cost, as below:
yearSpan: 100 200 500 1000 5000 10000 1e+05 1e+06
R1Cost: 0.002 0.005 0.013 0.027 0.171 0.332 3.702 32.343
NOTES: when years goes up to 100000, the lumpSum value becomes Inf in the print output.
4. Test R version 2
The test code for R version 2 is as below:
yearSpan <- c(100, 200, 500, 1000, 5000, 10000, 100000, 1000000)
R2Cost <- NULL
for(years in yearSpan){
annualPut <- 200*12; annualInterest <- 0.03; saveYears <- years
t <- system.time({
earn <- sum(sapply(seq_len(saveYears), function(i){annualPut*(1+annualInterest)**i}))
cat(earn, "\n")
})
R2Cost <- c(R2Cost, t[3]) ## record the time elapsed
}
The time elapsed for each value of yearSpan is recorded in the variable R2Cost, as below:
yearSpan: 100 200 500 1000 5000 10000 1e+05 1e+06
R2Cost: 0.0001 0.0001 0.001 0.002 0.007 0.015 0.179 2.135
NOTES: when years goes up to 100000, the earn value becomes Inf in the print output.
5. Test Python Version
I put B’s Python code into a file named testPython.py, with the contents below:
#!/usr/bin/python
import time
import sys
start = time.time()
interest = 0.03
injection = 2400
years = int(sys.argv[1])
earn = sum(map(lambda x: injection * (1+interest) ** x, range(1,(years+1))))
print(earn)
end = time.time()
print(end - start)
Then I tested it with different values of years, for example python testPython.py 100. It ran without problems for years below 100000; however, with python testPython.py 100000 (or 1000000, as below), it fails with the following error:
$ python testPython.py 1000000
Traceback (most recent call last):
File "testPython.py", line 10, in <module>
earn = sum(map(lambda x: injection * (1+interest) ** x, range(1,(years+1))))
File "testPython.py", line 10, in <lambda>
earn = sum(map(lambda x: injection * (1+interest) ** x, range(1,(years+1))))
OverflowError: (34, 'Result too large')
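So unlike Perl, C or R, which give Inf when the value becomes extremely large, Python raises an error. (Strictly speaking, it is the float ** operator in B’s one-liner that raises OverflowError; plain float multiplication and addition in Python overflow quietly to inf.) I therefore recorded the time elapsed as NA for the failed cases of the Python version.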
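A small Python sketch of my own shows both behaviours side by side: a loop that multiplies step by step, as the C, Perl and R versions do, and a single power operation, as in B’s one-liner.

```python
import math

# Repeated multiplication, as in the C, Perl and R loops, overflows quietly to inf.
total = 0.0
for _ in range(100_000):
    total = (total + 2400) * 1.03
print(math.isinf(total))  # True

# The ** operator used in B's one-liner raises OverflowError instead.
raised = False
try:
    1.03 ** 100_000
except OverflowError as err:
    raised = True
    print("OverflowError:", err)
```

So a loop-based Python version would have reported inf like the others rather than crashing.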
The time elapsed for each value of yearSpan is recorded in the variable pythonCost, as below:
yearSpan: 100 200 500 1000 5000 10000 1e+05 1e+06
pythonCost: 0.001 0.002 0.0001 0.002 0.002 0.004 NA NA
6. Results Summary
I then collected all the elapsed-time records into one R data frame:
testRes <- data.frame(
yearSpan = c(100, 200, 500, 1000, 5000, 10000, 100000, 1000000),
perlCost = c(0.0004, 0.0022, 0.005, 0.0128, 0.0866, 0.2450, 2.0141, 13.3940),
cCost = c(0.0004, 0.0007, 0.0017, 0.0039, 0.0335, 0.1075, 0.8337, 2.8035),
R1Cost = c(0.002, 0.005, 0.013, 0.027, 0.171, 0.332, 3.702, 32.343 ),
R2Cost = c(0.0001, 0.0001, 0.001, 0.002, 0.007, 0.015, 0.179, 2.135),
pythonCost = c(0.001, 0.002, 0.0001, 0.002, 0.002, 0.004, NA, NA)
)
Since pythonCost has NA values (and the corresponding total money amount is Inf), I removed those NA cases and plotted the time cost using the following R code:
library(reshape2)
library(ggplot2)
testRes_subset <- testRes[complete.cases(testRes), ] ## remove yearSpan = 100000, 1000000 cases
melt_testRes <- melt(testRes_subset,
id.vars = "yearSpan",
variable.name = "Language",
value.name = "cost")
melt_testRes$yearSpan <- factor(melt_testRes$yearSpan)
ggplot(melt_testRes, aes(x=yearSpan, y=cost, group = Language, colour=Language)) +
geom_point() + geom_line() + theme_bw()
We can see that my R1 version is the slowest, while R2 is extremely fast (but R2 only calculates the final amount). This suggests that R1’s for loop, which also builds and prints a line of text every year, is expensive, and that compact functionals such as the apply family can be much faster; since R2 skips the per-year printing, though, part of the gap likely comes from output rather than from the loop itself. Perl is a little better than R1, not bad at all! Python performs best at the larger spans, but again it only prints the final amount. Surprisingly, C didn’t win here; it prints far more detail than R2 and Python, and this printing puts it behind them.
For a complete view of all tests, I also created a plot with all values:
melt_testRes <- melt(testRes,
id.vars = "yearSpan",
variable.name = "Language",
value.name = "cost")
melt_testRes$yearSpan <- factor(melt_testRes$yearSpan)
ggplot(melt_testRes, aes(x=yearSpan, y=cost, group = Language, colour=Language)) +
geom_point() + geom_line() + theme_bw()
The results don’t change much. But once again, the output of R2 and Python is not as complete as that of the other implementations, which gives them an unfair advantage in this benchmark. In a fair comparison, C would very likely be the winner.
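One way to test the claim that printing is what holds the more detailed versions back is to time the same loop with and without per-year output. The Python sketch below is hypothetical (run and timed are illustrative names): it writes the yearly lines into an in-memory buffer (io.StringIO) rather than a terminal, so it captures only the cost of formatting the text; real terminal output would likely add more.

```python
import io
import time


def run(years, payment=2400, rate=0.03, sink=None):
    """Year-by-year balance; if `sink` is given, also write one formatted line per year to it."""
    total = 0.0
    for year in range(1, years + 1):
        total = (total + payment) * (1 + rate)
        if sink is not None:
            sink.write(f"Year: {year}\tSavings: {total:.2f}\n")
    return total


def timed(func, *args, **kwargs):
    """Wall-clock seconds for one call of func(*args, **kwargs)."""
    start = time.perf_counter()
    func(*args, **kwargs)
    return time.perf_counter() - start


for years in (1_000, 10_000):
    quiet = timed(run, years)
    verbose = timed(run, years, sink=io.StringIO())
    print(f"{years:>6} years: no output {quiet:.4f} s, with output {verbose:.4f} s")
```

Giving every implementation the same output (either all silent or all printing every year) would make the comparison between languages fairer.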