WANtaroHP (F90 STATISTICS)

Outline of this page

Contents
Single Regression Analysis
Multiple Regression Analysis
Principal Component Analysis
Percent point and Probability of exceedance in Standard Normal Distribution
Probability Ellipse
Cluster Analysis (K-means++ method)
Multi-Dimensional Scaling

Single Regression Analysis

Outline of this program

This is a program for single regression analysis.
It computes the slope aa, the intercept bb and the correlation coefficient for the regression line y=aa*x+bb.
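The quantities named above can be illustrated with a minimal Python sketch. It is not the code in f90_KAIKI.txt; the function name single_regression and the sample values are assumptions made only for illustration, and ordinary least squares is assumed.

```python
import math

def single_regression(x, y):
    """Return (aa, bb, r) for the least-squares line y = aa*x + bb."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)          # sum of squares of x
    syy = sum((yi - my) ** 2 for yi in y)          # sum of squares of y
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    aa = sxy / sxx                                 # slope
    bb = my - aa * mx                              # intercept
    r = sxy / math.sqrt(sxx * syy)                 # correlation coefficient
    return aa, bb, r

# Illustrative data only (not taken from the sample files)
aa, bb, r = single_regression([1.0, 2.0, 3.0, 4.0], [3.1, 4.9, 7.2, 8.8])
print(aa, bb, r)
```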

Source code by f90

Filename	Description
f90_KAIKI.txt	Program for single regression analysis

Multiple Regression Analysis

Outline of this program

This is a program for multiple regression analysis.
Regression formula is y=b₀+b₁*x₁+b₂*x₂+b₃*x₃+....
where y is the response variable, b_i are the partial regression coefficients and x_i are the explanatory variables.
Simultaneous linear equations are solved using simple Gauss-Jordan elimination.
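As a rough illustration of this approach, the Python sketch below builds the normal equations for the regression formula and solves them by Gauss-Jordan elimination. It is not the code in f90_MRA.txt. The function names are hypothetical, and the partial pivoting step is an assumption added for numerical safety; the page only says that simple Gauss-Jordan elimination is used.

```python
def gauss_jordan(a, b):
    """Solve a*x = b by Gauss-Jordan elimination (partial pivoting assumed)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        # choose the row with the largest pivot candidate in this column
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[piv][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[piv] = m[piv], m[col]
        d = m[col][col]
        m[col] = [v / d for v in m[col]]                # scale pivot row to 1
        for r in range(n):                              # clear the column elsewhere
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n] for row in m]

def multiple_regression(xs, ys):
    """xs: rows of explanatory variables, ys: responses. Returns [b0, b1, ...]."""
    rows = [[1.0] + list(r) for r in xs]                # leading 1 gives b0
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return gauss_jordan(ata, aty)

# Illustrative data generated from y = 1 + 2*x1 + 3*x2
xs = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in xs]
print(multiple_regression(xs, ys))
```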

Format of input data file

 001 | Sample data
 002 | 4,46
 003 | 44523,6364,90547,19.7,640.9
 004 | 10395,7135,12634,20.0,128.2
 005 |  8669,7266,20101,16.3,113.8
..........

001: Comment
002: Number of explanatory variables (=4), number of input data sets (=46)
003: First data row (in this case, the y-value followed by four x-values)
004 onward: Remaining data rows, each holding the y-value and x-values in the same order
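The layout described above can be read with a few lines of Python. This is only a sketch under assumptions: the " 001 | " prefixes in the listing are taken to be display line numbers that are not part of the file, the function name read_mra_input is hypothetical, and the count on the second line is set to 3 here only because three data rows are shown on this page.

```python
def read_mra_input(lines):
    """Parse comment, counts and data rows of the multiple regression input."""
    comment = lines[0].strip()
    nx, n = (int(v) for v in lines[1].split(","))   # explanatory variables, data sets
    y, x = [], []
    for line in lines[2:2 + n]:
        vals = [float(v) for v in line.split(",")]
        y.append(vals[0])                            # first column: response y
        x.append(vals[1:1 + nx])                     # next nx columns: x-values
    return comment, y, x

sample = [
    "Sample data",
    "4,3",
    "44523,6364,90547,19.7,640.9",
    "10395,7135,12634,20.0,128.2",
    " 8669,7266,20101,16.3,113.8",
]
comment, y, x = read_mra_input(sample)
print(comment, y, x[0])
```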

Batch commands for execution

gfortran -o f90_MRA.exe f90_MRA.f90
f90_MRA inp_MRA.txt out_MRA.txt

Format of execution command

f90_MRA fnameR fnameW

f90_MRA	Executable file (compiled F90 program)
fnameR	Input file name
fnameW	Output file name

Source code by f90

Filename	Description
f90_MRA.txt	Program for multiple regression analysis
inp_MRA.txt	Sample input data
out_MRA.txt	Sample output data
R_DRAWinp.txt	R script for drawing of input data
R_DRAWout.txt	R script for drawing of output data
fig_MRA_inp.pdf	Sample image of input data
fig_MRA_out.pdf	Sample image of output data

Principal Component Analysis

Outline of this program

This is a program for principal component analysis.
Eigenvalues and eigenvectors are obtained using Jacobi eigenvalue algorithm.
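The Jacobi eigenvalue algorithm can be sketched in Python as below. This is not the code in f90_PCA.txt. It assumes a cyclic sweep order, a relative stopping tolerance and a symmetric input matrix (for PCA, the variance-covariance or correlation matrix); the function name jacobi_eigen and the sample matrix are made up for illustration.

```python
import math

def jacobi_eigen(a, tol=1e-22, max_sweeps=100):
    """Eigenvalues/eigenvectors of a symmetric matrix by cyclic Jacobi rotations.

    Returns (values, vectors) sorted by descending eigenvalue;
    vectors[k] is the unit eigenvector belonging to values[k].
    """
    n = len(a)
    a = [list(row) for row in a]                       # work on a copy
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        total = off + sum(a[i][i] ** 2 for i in range(n))
        if off <= tol * total:                         # off-diagonal part negligible
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # rotation angle that zeroes a[p][q], kept within +-45 degrees
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):                     # columns p, q: A <- A G
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                     # rows p, q: A <- G^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                     # accumulate V <- V G
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    values = [a[i][i] for i in order]
    vectors = [[v[k][i] for k in range(n)] for i in order]
    return values, vectors

# Illustrative symmetric matrix (not taken from the iris data)
values, vectors = jacobi_eigen([[4.0, 1.0, 2.0], [1.0, 3.0, 0.5], [2.0, 0.5, 5.0]])
print(values)
print(vectors[0])   # direction of the first principal component
```

In PCA, each eigenvalue divided by the sum of all eigenvalues gives the proportion of variance explained by the corresponding component.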

Format of input data file

 001 | iris
 002 | 4,150
 003 | 5.1,3.5,1.4,0.2,setosa,1
 004 | 4.9,3.0,1.4,0.2,setosa,2
 005 | 4.7,3.2,1.3,0.2,setosa,3
..........

001: Comment
002: Number of variables (=4), number of input data sets (=150)
003: First data row (in this case, four values; columns 5 and 6 are not used)
004 onward: Remaining data rows in the same layout

Batch commands for execution

gfortran -o f90_PCA.exe f90_PCA.f90

f90_PCA 1 1 inp_iris.csv out_iris.csv

rscript R_DRAWinp.R
ren Rplots.pdf fig_R_PCA_inp.pdf

rscript R_DRAWout.R
ren Rplots.pdf fig_R_PCA_out.pdf

Format of execution command

f90_PCA knor kvok fnameR fnameW

f90_PCA	Executable file (compiled F90 program)
knor	Index for normalization (0: yes, 1: no)
kvok	Index for Var. or Cor. (0: use variance-covariance matrix, 1: use correlation matrix)
fnameR	Input file name
fnameW	Output file name

Source code by f90

Filename	Description
f90_PCA.txt	Program for principal component analysis
inp_iris.txt	Sample input data
out_iris.txt	Sample output data
R_DRAWinp.txt	R script for drawing of input data
R_DRAWout.txt	R script for drawing of output data
fig_PCA_inp.pdf	Sample image of input data
fig_PCA_out.pdf	Sample image of output data

Percent point and Probability of exceedance in Standard Normal Distribution

Outline of this program

This program computes percent points and probabilities of exceedance for the Standard Normal Distribution.
The percent point is obtained with the approximation method by TODA.
The probability of exceedance is obtained with the approximation method by SHENTON.
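To make the two quantities concrete, the Python sketch below computes them with the standard library class statistics.NormalDist. This replaces the TODA and SHENTON approximations, whose formulas are not reproduced on this page, so it is not the method of f90_TODAPP.txt. It also assumes that the percent point means the value u for which the exceedance probability P(Z > u) equals p; the function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()          # mean 0, standard deviation 1

def exceedance_probability(u):
    """P(Z > u) for a standard normal variable Z."""
    return 1.0 - STD_NORMAL.cdf(u)

def percent_point(p):
    """Value u with P(Z > u) = p, for 0 < p < 1 (assumed definition)."""
    return STD_NORMAL.inv_cdf(1.0 - p)

u = percent_point(0.025)
print(u, exceedance_probability(u))
```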

Source code by f90

Filename	Description
f90_TODAPP.txt	Program for calculation of percent point and exceedance probability
inp_TODAPP.txt	Sample input data
out_TODAPP.txt	Sample output data

Probability Ellipse

Outline of this program

Two programs, f90_SREG.f90 and f90_ELLIP.f90, are introduced in this section.
The program f90_SREG.f90 performs single regression analysis and finds the characteristics of the probability ellipse.
The program f90_ELLIP.f90 creates input data for the 'Se' option in GMT. 'Se' is an option of the 'psxy' command in GMT that draws an ellipse without requiring many plotted points. The program derives the equation of the ellipse from three given points, and finds the rotation angle and the major and minor axes of the ellipse by solving nonlinear simultaneous equations.
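For orientation, a probability ellipse for bivariate normal data can be characterized directly from the covariance matrix, as in the Python sketch below. This is a textbook formulation, not the three-point method described above, and whether f90_SREG.f90 uses the same formulas is not stated on this page. The function name, the use of the sample covariance (divisor n-1) and the sample data are assumptions.

```python
import math

def probability_ellipse(x, y, pp):
    """Center, semi-axes and rotation angle (degrees) of the ellipse that
    contains a bivariate normal point with non-exceedance probability pp."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((v - mx) ** 2 for v in x) / (n - 1)
    syy = sum((v - my) ** 2 for v in y) / (n - 1)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)
    # eigenvalues of the 2x2 covariance matrix
    half_trace = 0.5 * (sxx + syy)
    d = math.sqrt((0.5 * (sxx - syy)) ** 2 + sxy ** 2)
    lam1, lam2 = half_trace + d, max(half_trace - d, 0.0)
    # chi-square quantile with 2 degrees of freedom: -2*ln(1-pp)
    scale = -2.0 * math.log(1.0 - pp)
    semi_major = math.sqrt(scale * lam1)
    semi_minor = math.sqrt(scale * lam2)
    # direction of the major axis, measured from the x-axis
    angle = math.degrees(0.5 * math.atan2(2.0 * sxy, sxx - syy))
    return (mx, my), semi_major, semi_minor, angle

# Illustrative data only
center, a, b, angle = probability_ellipse(
    [1.0, 2.0, 3.0, 4.0, 5.0], [1.2, 1.9, 3.4, 3.8, 5.1], 0.95)
print(center, a, b, angle)
```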

Batch commands for execution

gfortran -o f90_SREG.exe f90_SREG.f90
gfortran -o f90_ELLIP.exe f90_ELLIP.f90

f90_SREG inp_reg1.txt 0.95 > out_reg1.txt
gawk "15<=NR{print $2,$3}" out_reg1.txt > inp_elp1.txt
f90_ELLIP inp_elp1.txt 5 5 5 3 > _elp11.txt
f90_ELLIP inp_elp1.txt 5 5 5 5 > _elp12.txt

The format of the command line arguments is shown below.

f90_SREG fnameR pp > fnameW

f90_SREG	Compiled F90 program for finding the characteristics of the probability ellipse
fnameR	Input file name
pp	Value of non-exceedance probability for the probability ellipse
fnameW	Output file name

f90_ELLIP fnameR LX LY dx dy > fnameW

f90_ELLIP	Compiled F90 program for creating the input data for GMT
fnameR	Input file name
LX	Length of x-axis of the graph in the unit of 'cm'
LY	Length of y-axis of the graph in the unit of 'cm'
dx	Range of x-axis (maximum x-value minus minimum x-value on the x-axis of the graph)
dy	Range of y-axis (maximum y-value minus minimum y-value on the y-axis of the graph)
fnameW	Output file name

Source code by f90

Filename	Description
f90_SREG.txt	Program for single regression and probability ellipse
f90_ELLIP.txt	Program for creating data for GMT drawing
inp_reg1.txt	Input data sample
en_TeX_ellipse.pdf	Document of theory and output sample

Cluster Analysis (K-means++ method)

Outline of this program

This is a program for cluster analysis.
K-means++ method is used as a solution for cluster analysis.
The algorithm for choosing the k initial centers is as follows:

(0) Set the number of clusters k and prepare three vectors {D₁}, {D₂} and {D_R} to hold the calculated distances.
(1) Choose the 1st center point c₁ from the data set using a random number.
(2) Fill the vector {D₁} with very large dummy values.
(3) Calculate the distance between the latest center point (c₁ on the first pass) and every point, and store the distances in {D₂}.
(4) Compare each element of {D₁} with the corresponding element of {D₂} and store the smaller value in {D_R}.
(5) Choose as the next center point (c₂ on the first pass) the point that has the maximum value in {D_R}.
(6) Set {D₁}={D_R} and repeat steps (3) to (6) until k centers have been obtained.
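The steps above can be sketched in Python as follows. The sketch follows the steps as listed, so step (5) takes the point with the largest value in {D_R}. It is not the code in f90_KMEANSPP.txt; Euclidean distance, the function name initial_centers, the seed argument and the sample points are assumptions.

```python
import math
import random

def initial_centers(points, k, seed=None):
    """Return the indices of k initial centers chosen by steps (0) to (6)."""
    rng = random.Random(seed)
    n = len(points)
    centers = [rng.randrange(n)]                        # (1) random first center
    d1 = [float("inf")] * n                             # (2) huge dummy values
    while len(centers) < k:
        latest = points[centers[-1]]
        d2 = [math.dist(latest, p) for p in points]     # (3) distance to latest center
        dr = [min(a, b) for a, b in zip(d1, d2)]        # (4) keep the smaller value
        centers.append(max(range(n), key=lambda i: dr[i]))  # (5) farthest point
        d1 = dr                                         # (6) carry over and repeat
    return centers

# Illustrative points forming three well separated groups
pts = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (10.1, 0.0), (5.0, 8.0)]
print(initial_centers(pts, 3, seed=1))
```

Because {D_R} holds the distance from each point to its nearest chosen center, a point that is already a center has the value 0 and is never chosen again as long as other distinct points remain.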

Format of input data file

The format of the input data is the same as that for principal component analysis.

 001 | iris
 002 | 4,150
 003 | 5.1,3.5,1.4,0.2,setosa,1
 004 | 4.9,3.0,1.4,0.2,setosa,2
 005 | 4.7,3.2,1.3,0.2,setosa,3
..........

001: Comment
002: Number of variables (=4), number of input data sets (=150)
003: First data row (in this case, four values; columns 5 and 6 are not used)
004 onward: Remaining data rows in the same layout

Batch commands for execution

gfortran -o f90_KMEANSPP.exe f90_KMEANSPP.f90

f90_KMEANSPP 0 3 30 inp_iris.csv out_iris0.txt
f90_KMEANSPP 2 3 30 inp_iris.csv out_iris2.txt

del fig_R_CLUSinp.pdf
del fig_R_CLUSpca_0.pdf
del fig_R_CLUSpca_2.pdf
del fig_R_CLUSmds_0.pdf
del fig_R_CLUSmds_2.pdf

rscript R_CLUSinp.R inp_iris.csv
ren Rplots.pdf fig_R_CLUSinp.pdf

rscript R_CLUSpca.R out_iris0.txt
ren Rplots.pdf fig_R_CLUSpca_0.pdf
rscript R_CLUSpca.R out_iris2.txt
ren Rplots.pdf fig_R_CLUSpca_2.pdf

rscript R_CLUSmds.R out_iris0.txt
ren Rplots.pdf fig_R_CLUSmds_0.pdf
rscript R_CLUSmds.R out_iris2.txt
ren Rplots.pdf fig_R_CLUSmds_2.pdf

Format of execution command

f90_KMEANSPP imethod kk mds fnameR fnameW

f90_KMEANSPP	Executable file (compiled F90 program)
imethod	Distance calculation method (0: Mahalanobis, 1: City block, 2: Euclidean, >=3: Minkowski)
kk	Number of clusters
mds	Minimum size of one cluster
fnameR	Input file name
fnameW	Output file name

Source code by f90

Filename	Description
f90_KMEANSPP.txt	Program for cluster analysis
inp_iris.txt	Sample input data
out_iris0.txt	Sample output data (Mahalanobis)
out_iris2.txt	Sample output data (Euclidean)
R_CLUSinp.txt	R script for drawing using input data
R_CLUSpca.txt	R script for drawing using PCA
R_CLUSmds.txt	R script for drawing using MDS
fig_R_CLUSinp.pdf	Sample image (input data)
fig_R_CLUSpca_0.pdf	Sample image (Mahalanobis,PCA)
fig_R_CLUSpca_2.pdf	Sample image (Euclidean, PCA)
fig_R_CLUSmds_0.pdf	Sample image (Mahalanobis,MDS)
fig_R_CLUSmds_2.pdf	Sample image (Euclidean, MDS)

Multi-Dimensional Scaling

Outline of this program

This is a program for classical Multi-Dimensional Scaling.
The procedure of analysis is shown below:

(0) Observed vector data of dimension n x m can also be given as input. In this case, the n x n distance matrix is calculated before the following procedure.
(1) Input a distance matrix (proximity matrix) [d] of dimension n x n.
(2) Make the squared-distance matrix [D], whose elements are the squares of the elements of [d].
(3) Make the matrix [B] from the matrix [D] using the Young-Householder transformation,
where [B]=-0.5*[J][D][J], the centering matrix is [J]=[I]-(1/n)*{1}{1}^T, and
{1}={1, 1, 1, ... ,1, 1}^T.
(4) Obtain the eigenvalues {λ_i} (λ₁>=λ₂>=λ₃>= ... >=λ_n) and the eigenvectors [E] of the matrix [B].
(5) Obtain the coordinate matrix [X_m]=[E_m][Λ_m]^(1/2),
where [Λ_m] is the diagonal matrix of the eigenvalues λ_i (i=1 to m), each of which must be greater than or equal to zero.
(6) Make plots of obtained coordinates.
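Steps (1) to (5) can be sketched in Python as below. This is not the code in f90_MDS.txt. The page does not say which eigenvalue solver f90_MDS uses, so a Jacobi rotation routine (the method named for the PCA program on this page) is assumed. The function names, the clamping of slightly negative eigenvalues to zero and the sample points are also assumptions.

```python
import math

def jacobi_eigen(a, tol=1e-22, max_sweeps=100):
    """Eigenpairs of a symmetric matrix (cyclic Jacobi), largest eigenvalue first."""
    n = len(a)
    a = [list(row) for row in a]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= tol * (off + sum(a[i][i] ** 2 for i in range(n))):
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2.0 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):                      # A <- A G
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):                      # A <- G^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):                      # V <- V G
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    order = sorted(range(n), key=lambda i: a[i][i], reverse=True)
    return [a[i][i] for i in order], [[v[k][i] for k in range(n)] for i in order]

def classical_mds(d, m=2):
    """Coordinates (n rows, m columns) from an n x n distance matrix d."""
    n = len(d)
    d2 = [[d[i][j] ** 2 for j in range(n)] for i in range(n)]      # (2) [D]
    row = [sum(r) / n for r in d2]                                 # row means of [D]
    tot = sum(row) / n                                             # grand mean of [D]
    # (3) [B] = -0.5*[J][D][J], written element by element (double centering)
    b = [[-0.5 * (d2[i][j] - row[i] - row[j] + tot) for j in range(n)]
         for i in range(n)]
    vals, vecs = jacobi_eigen(b)                                   # (4)
    # (5) [X_m] = [E_m][Lambda_m]^(1/2); negative eigenvalues clamped to 0
    return [[vecs[k][i] * math.sqrt(max(vals[k], 0.0)) for k in range(m)]
            for i in range(n)]

# Illustrative check: distances between four corner points of a rectangle
pts = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0), (3.0, 4.0)]
dist = [[math.dist(p, q) for q in pts] for p in pts]
print(classical_mds(dist, 2))
```

The recovered coordinates reproduce the input distances, but they may be rotated or mirrored relative to the original points because distances alone do not fix the orientation.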

Format of input data file

Vector data input (inp_iris.csv)

The format of the input data is the same as that for principal component analysis.

 001 | iris
 002 | 4,150
 003 | 5.1,3.5,1.4,0.2,setosa,1
 004 | 4.9,3.0,1.4,0.2,setosa,2
 005 | 4.7,3.2,1.3,0.2,setosa,3
..........

001: Comment
002: Number of variables (=4), number of input data sets (=150)
003: First data row (in this case, four values; columns 5 and 6 are not used)
004 onward: Remaining data rows in the same layout

Distance matrix input (inp_map.csv)

 001 | Distance
 002 | 9
 003 | KL       , 0
 004 | Tokyo    , 5332.837,  0
 005 | Jakarta  , 1184.231,  5774.036,  0
 006 | Bangkok  , 1179.106,  4605.718,  2308.805,  0
 007 | Vientiane, 1644.533,  4151.984,  2710.010,  513.445,   0
 008 | Ha_Noi   , 2033.150,  3679.648,  3010.784,  979.078,   480.029,  0
 009 | Seoul    , 4612.781,  1162.077,  5275.036, 3715.257,  3223.101,  2743.357,   0
 010 | Beijing  , 4339.820,  2113.622,  5197.282, 3284.870,  2771.456,  2321.983,   964.494,  0
 011 | Maynila  , 2480.550,  2997.510,  2776.534, 2203.659,  2000.776,  1757.266,  2614.778,  2846.152, 0

001: Comment
002: Number of items (=9)
003-011: Item name, followed by the lower-triangular part of the matrix of distances between capital cities
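The lower-triangular layout can be expanded into a full symmetric matrix as in the Python sketch below. It is only an illustration under assumptions: the " 001 | " prefixes are taken to be display line numbers that are not part of the file, the function name read_distance_matrix is hypothetical, and only the first three items of the sample are used.

```python
def read_distance_matrix(lines):
    """Return (names, d) where d is the full symmetric distance matrix."""
    n = int(lines[1])                                 # number of items
    names = []
    d = [[0.0] * n for _ in range(n)]
    for i, line in enumerate(lines[2:2 + n]):
        fields = [f.strip() for f in line.split(",")]
        names.append(fields[0])                       # item name
        for j, value in enumerate(fields[1:i + 2]):   # row i holds i+1 distances
            d[i][j] = d[j][i] = float(value)          # mirror into the upper triangle
    return names, d

sample = [
    "Distance",
    "3",
    "KL       , 0",
    "Tokyo    , 5332.837,  0",
    "Jakarta  , 1184.231,  5774.036,  0",
]
names, d = read_distance_matrix(sample)
print(names)
print(d)
```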

Batch commands for execution

Example for vector data input (inp_iris.csv)

gfortran -o f90_MDS.exe f90_MDS.f90

f90_MDS 1 inp_iris.csv out_MDS_iris.csv

del fig_R_MDS_inp.pdf
del fig_R_MDS_out.pdf
rscript R_DRAWinp.R
ren Rplots.pdf fig_R_MDS_inp.pdf
rscript R_DRAWout_1.R
ren Rplots.pdf fig_R_MDS_out.pdf

Example for distance matrix input (inp_map.csv)

gfortran -o f90_MDS.exe f90_MDS.f90

f90_MDS 0 inp_map.csv out_MDS_map.csv

del fig_R_MDS_map.pdf
rscript R_DRAWout_0.R
ren Rplots.pdf fig_R_MDS_map.pdf

Format of execution command

f90_MDS knor fnameR fnameW

f90_MDS	Executable file (compiled F90 program)
knor	Treatment of data (0: distance matrix input, 1: vector data with normalization, 2: vector data without normalization)
fnameR	Input file name
fnameW	Output file name

Source code by f90

Filename	Description
f90_MDS.txt	Program for MDS
inp_iris.txt	Sample input data (vector data of 'iris')
out_MDS_iris.txt	Sample output data for 'iris'
inp_map.txt	Sample input data (distance matrix)
out_MDS_map.txt	Sample output data for 'map'
R_DRAWinp.txt	R script for drawing using input vector data
R_DRAWout_0.txt	R script for drawing of map
R_DRAWout_1.txt	R script for drawing of vector data
fig_R_MDS_inp.pdf	Sample image (input vector data)
fig_R_MDS_out.pdf	Sample image (output for vector data)
fig_R_MDS_map.pdf	Sample image (2-dim. map). See the drawing below.
fig_G_Malay.png	2-dim. map by GMT (centered on Kuala Lumpur)