Performance Characterization of Mobile GP-GPUs
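As of 2019/04/24, on macOS Mojave (10.14.4), after installing Octave 5.1.0 with brew install octave, the pause() function does not resume execution when a key is pressed: no key other than Ctrl + C gets a response. Normally, pressing any key should continue with the code that follows.

The Octave bug tracker discusses this as bug #55029: pause() with no arguments does not return like kbhit() with glibc 2.28. The behavior of glibc changed in version 2.28: once a stream has reached the end of its input (EOF), the application has to call clearerr (stdin); itself, otherwise it never receives later key presses. The Octave 5.1.0 currently installed by brew shows the behavior that the bug report describes for builds linked against glibc 2.28 or later. The bug is fixed in Octave 5.2, but there is no release date for that version yet.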
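The clearerr requirement described above can be illustrated outside Octave. The following is a minimal C sketch, not Octave's actual fix: read_char_after_eof is a hypothetical helper name, and it only shows the step the bug report says the application is expected to perform, namely clearing the stream's end-of-file indicator before reading again.

```c
#include <assert.h>
#include <stdio.h>

/* Read one character from `in`, first clearing a sticky end-of-file
 * indicator left behind by an earlier read. Returns the character,
 * or EOF if no further input is available.
 * `read_char_after_eof` is a hypothetical helper, not an Octave function. */
int read_char_after_eof(FILE *in)
{
    assert(in != NULL);
    if (feof(in)) {
        /* Without this call, a C library with sticky EOF keeps
         * returning EOF even after new input has arrived. */
        clearerr(in);
    }
    return fgetc(in);
}
```

Called on stdin after an earlier read has hit EOF (for example Ctrl + D on a terminal), the helper lets the next key press reach fgetc again instead of returning EOF immediately.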
For now, the fix is to have brew build and install Octave from the latest source code instead of installing the released version:

```shell
$ brew uninstall --ignore-dependencies octave

# Install build dependencies
$ brew install texinfo

$ wget https://raw.githubusercontent.com/Homebrew/homebrew-core/master/Formula/octave.rb

$ sed -i "" "s/\"--enable-shared\"/\"--enable-shared\",\"--disable-docs\"/g" octave.rb

$ brew install --build-from-source --HEAD -v octave.rb
```
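The sed command edits the downloaded formula to turn off the documentation build (which currently fails) by adding the --disable-docs configure flag.

The formula after this change: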
```ruby
class Octave < Formula
  desc "High-level interpreted language for numerical computing"
  homepage "https://www.gnu.org/software/octave/index.html"
  url "https://ftp.gnu.org/gnu/octave/octave-5.1.0.tar.xz"
  mirror "https://ftpmirror.gnu.org/octave/octave-5.1.0.tar.xz"
  sha256 "87b4df6dfa28b1f8028f69659f7a1cabd50adfb81e1e02212ff22c863a29454e"
  revision 2

  bottle do
    sha256 "6bb8497839d6f7872efcd6acad0216f443420e097a9b7fad44835823e1c0e735" => :mojave
    sha256 "d1de53a30f002d8b7ec3a6065994c46d8cbd4830aa7e199f572baff48723c6e6" => :high_sierra
    sha256 "7a648cff129ec85a5ee9417a0339a3b804756f7958585b707c015d322d220b15" => :sierra
  end

  head do
    url "https://hg.savannah.gnu.org/hgweb/octave", :branch => "default", :using => :hg

    depends_on "autoconf" => :build
    depends_on "automake" => :build
    depends_on "bison" => :build
    depends_on "icoutils" => :build
    depends_on "librsvg" => :build
  end

  # Complete list of dependencies at https://wiki.octave.org/Building
  depends_on "gnu-sed" => :build # https://lists.gnu.org/archive/html/octave-maintainers/2016-09/msg00193.html
  depends_on :java => ["1.6+", :build]
  depends_on "pkg-config" => :build
  depends_on "arpack"
  depends_on "epstool"
  depends_on "fftw"
  depends_on "fig2dev"
  depends_on "fltk"
  depends_on "fontconfig"
  depends_on "freetype"
  depends_on "gcc" # for gfortran
  depends_on "ghostscript"
  depends_on "gl2ps"
  depends_on "glpk"
  depends_on "gnuplot"
  depends_on "graphicsmagick"
  depends_on "hdf5"
  depends_on "libsndfile"
  depends_on "libtool"
  depends_on "pcre"
  depends_on "portaudio"
  depends_on "pstoedit"
  depends_on "qhull"
  depends_on "qrupdate"
  depends_on "qt"
  depends_on "readline"
  depends_on "suite-sparse"
  depends_on "sundials"
  depends_on "texinfo"
  depends_on "veclibfort"

  # Dependencies use Fortran, leading to spurious messages about GCC
  cxxstdlib_check :skip

  def install
    # Default configuration passes all linker flags to mkoctfile, to be
    # inserted into every oct/mex build. This is unnecessary and can cause
    # cause linking problems.
    inreplace "src/mkoctfile.in.cc",
              /%OCTAVE_CONF_OCT(AVE)?_LINK_(DEPS|OPTS)%/,
              '""'

    # Qt 5.12 compatibility
    # https://savannah.gnu.org/bugs/?55187
    ENV["QCOLLECTIONGENERATOR"] = "qhelpgenerator"
    # These "shouldn't" be necessary, but the build breaks without them.
    # https://savannah.gnu.org/bugs/?55883
    ENV["QT_CPPFLAGS"]="-I#{Formula["qt"].opt_include}"
    ENV.append "CPPFLAGS", "-I#{Formula["qt"].opt_include}"
    ENV["QT_LDFLAGS"]="-F#{Formula["qt"].opt_lib}"
    ENV.append "LDFLAGS", "-F#{Formula["qt"].opt_lib}"

    system "./bootstrap" if build.head?
    system "./configure", "--prefix=#{prefix}",
                          "--disable-dependency-tracking",
                          "--disable-silent-rules",
                          "--enable-link-all-dependencies",
                          "--enable-shared","--disable-docs",
                          "--disable-static",
                          "--with-hdf5-includedir=#{Formula["hdf5"].opt_include}",
                          "--with-hdf5-libdir=#{Formula["hdf5"].opt_lib}",
                          "--with-x=no",
                          "--with-blas=-L#{Formula["veclibfort"].opt_lib} -lvecLibFort",
                          "--with-portaudio",
                          "--with-sndfile"
    system "make", "all"

    # Avoid revision bumps whenever fftw's or gcc's Cellar paths change
    inreplace "src/mkoctfile.cc" do |s|
      s.gsub! Formula["fftw"].prefix.realpath, Formula["fftw"].opt_prefix
      s.gsub! Formula["gcc"].prefix.realpath, Formula["gcc"].opt_prefix
    end

    # Make sure that Octave uses the modern texinfo at run time
    rcfile = buildpath/"scripts/startup/site-rcfile"
    rcfile.append_lines "makeinfo_program(\"#{Formula["texinfo"].opt_bin}/makeinfo\");"

    system "make", "install"
  end

  test do
    system bin/"octave", "--eval", "(22/7 - pi)/pi"
    # This is supposed to crash octave if there is a problem with veclibfort
    system bin/"octave", "--eval", "single ([1+i 2+i 3+i]) * single ([ 4+i ; 5+i ; 6+i])"
  end
end
```
This is the sequel to the single-precision SSE-optimized sin, cos, log and exp that I wrote some time ago, adapted to the NEON FPU of my pandaboard. Precision and range are exactly the same as in the SSE version, so I won't repeat them.
command line: gcc -O3 -mfloat-abi=softfp -mfpu=neon -march=armv7-a -mtune=cortex-a9 -Wall -W neon_mathfun_test.c -lm
```
exp([        -1000,          -100,           100,          1000]) = [            0,             0, 2.4061436e+38, 2.4061436e+38]
exp([         -nan,           inf,          -inf,           nan]) = [          nan, 2.4061436e+38,             0,           nan]
log([            0,           -10,         1e+30, 1.0005271e-42]) = [         -nan,          -nan,     69.077553,          -nan]
log([         -nan,           inf,          -inf,           nan]) = [    89.128304,     88.722839,          -nan,     89.128304]
sin([         -nan,           inf,          -inf,           nan]) = [          nan,           nan,          -nan,           nan]
cos([         -nan,           inf,          -inf,           nan]) = [          nan,           nan,           nan,           nan]
sin([       -1e+30,       -100000,         1e+30,        100000]) = [          inf,  -0.035749275,          -inf,   0.035749275]
cos([       -1e+30,       -100000,         1e+30,        100000]) = [          nan,    -0.9993608,           nan,    -0.9993608]
benching                 sinf .. ->    2.0 millions of vector evaluations/second -> 121 cycles/value on a 1000MHz computer
benching                 cosf .. ->    1.8 millions of vector evaluations/second -> 132 cycles/value on a 1000MHz computer
benching                 expf .. ->    1.1 millions of vector evaluations/second -> 221 cycles/value on a 1000MHz computer
benching                 logf .. ->    1.7 millions of vector evaluations/second -> 141 cycles/value on a 1000MHz computer
benching          cephes_sinf .. ->    2.4 millions of vector evaluations/second -> 103 cycles/value on a 1000MHz computer
benching          cephes_cosf .. ->    2.0 millions of vector evaluations/second -> 123 cycles/value on a 1000MHz computer
benching          cephes_expf .. ->    1.6 millions of vector evaluations/second -> 153 cycles/value on a 1000MHz computer
benching          cephes_logf .. ->    1.5 millions of vector evaluations/second -> 156 cycles/value on a 1000MHz computer
benching               sin_ps .. ->    5.8 millions of vector evaluations/second ->  43 cycles/value on a 1000MHz computer
benching               cos_ps .. ->    5.9 millions of vector evaluations/second ->  42 cycles/value on a 1000MHz computer
benching            sincos_ps .. ->    6.0 millions of vector evaluations/second ->  41 cycles/value on a 1000MHz computer
benching               exp_ps .. ->    5.6 millions of vector evaluations/second ->  44 cycles/value on a 1000MHz computer
benching               log_ps .. ->    5.3 millions of vector evaluations/second ->  47 cycles/value on a 1000MHz computer
```
So performance is not stellar. I recommend using gcc 4.6.1 or newer, as it generates much better code than previous (gcc 4.5) versions: almost 20% faster here. I believe rewriting these functions in assembly would improve the performance by 30%, and it should not be very hard, as ARM and NEON asm is quite nice and easy to write; maybe I'll do it. Computing two SIMD vectors at once would also improve performance a lot, as there are enough registers on NEON, and it would reduce the dependencies between NEON instructions.
Note also that I have no idea of the performance on a Cortex A8 -- it may be extremely bad, I don't know.
command line: cl.exe /arch:SSE /O2 /TP /MD sse_mathfun_test.c (this is msvc 2010)
```
benching                 sinf .. ->    1.3 millions of vector evaluations/second -> 303 cycles/value on a 1600MHz computer
benching                 cosf .. ->    1.3 millions of vector evaluations/second -> 305 cycles/value on a 1600MHz computer
benching         sincos (x87) .. ->    1.2 millions of vector evaluations/second -> 314 cycles/value on a 1600MHz computer
benching                 expf .. ->    1.6 millions of vector evaluations/second -> 244 cycles/value on a 1600MHz computer
benching                 logf .. ->    1.4 millions of vector evaluations/second -> 276 cycles/value on a 1600MHz computer
benching          cephes_sinf .. ->    1.4 millions of vector evaluations/second -> 280 cycles/value on a 1600MHz computer
benching          cephes_cosf .. ->    1.5 millions of vector evaluations/second -> 265 cycles/value on a 1600MHz computer
benching          cephes_expf .. ->    0.7 millions of vector evaluations/second -> 548 cycles/value on a 1600MHz computer
benching          cephes_logf .. ->    0.8 millions of vector evaluations/second -> 489 cycles/value on a 1600MHz computer
benching               sin_ps .. ->    9.2 millions of vector evaluations/second ->  43 cycles/value on a 1600MHz computer
benching               cos_ps .. ->    9.5 millions of vector evaluations/second ->  42 cycles/value on a 1600MHz computer
benching            sincos_ps .. ->    8.8 millions of vector evaluations/second ->  45 cycles/value on a 1600MHz computer
benching               exp_ps .. ->    9.8 millions of vector evaluations/second ->  41 cycles/value on a 1600MHz computer
benching               log_ps .. ->    8.6 millions of vector evaluations/second ->  46 cycles/value on a 1600MHz computer
```
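The cycles/value column in the two listings appears to be roughly the clock rate divided by the number of scalar values processed per second, with each vector evaluation counting as four single-precision values. That relation is inferred from the printed numbers, not taken from the benchmark source, and the printed rates are rounded, so it only reproduces the column approximately. A small C sketch of the arithmetic, where cycles_per_value is a hypothetical helper name:

```c
#include <assert.h>

/* Estimate cycles per scalar value from one benchmark line.
 * clock_mhz:         CPU clock in MHz (1000 or 1600 in the listings)
 * mevals_per_second: millions of vector evaluations per second
 * lanes:             scalar values per vector evaluation (4 is assumed
 *                    for 128-bit single-precision SIMD) */
double cycles_per_value(double clock_mhz, double mevals_per_second, int lanes)
{
    assert(mevals_per_second > 0.0 && lanes > 0);
    return clock_mhz / (mevals_per_second * (double)lanes);
}
```

For the sin_ps rows, cycles_per_value(1000.0, 5.8, 4) and cycles_per_value(1600.0, 9.2, 4) both come out slightly above 43, in line with the 43 cycles/value printed for the NEON and SSE runs.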
Sometimes a library written in C needs to be debugged from Matlab, so that its behavior can be inspected there.

A complete reference example follows:
```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <mex.h>

// Check whether mxCommand (trimmed, case-insensitive) equals the given command
static bool commandIs(const mxArray* mxCommand, const char* command)
{
    double result;
    mxArray* plhs1[1];
    mxArray* prhs1[1];
    mxArray* plhs2[1];
    mxArray* prhs2[2];

    if (mxCommand == NULL) { mexErrMsgTxt("'mxCommand' is null"); return false; }
    if (command == NULL) { mexErrMsgTxt("'command' is null"); return false; }
    if (!mxIsChar(mxCommand)) { mexErrMsgTxt("'mxCommand' is not a string"); return false; }

    // First trim
    prhs1[0] = (mxArray*)mxCommand;
    mexCallMATLAB(1, plhs1, 1, prhs1, "strtrim");

    // Then compare
    prhs2[0] = mxCreateString(command);
    prhs2[1] = plhs1[0];
    mexCallMATLAB(1, plhs2, 2, prhs2, "strcmpi");

    // Return comparison result
    result = mxGetScalar(plhs2[0]);
    return (result != 0.0);
}

static void processHelpMessageCommand(void)
{
    mexPrintf("DspMgr('init') returns a handle, or 0 if allocation failed; use 'release' to free the memory\n");
    mexPrintf("DspMgr('release',handle) frees the memory\n");
}

static void processInitCommand(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // Allocate on the C side and hand the pointer value back as a uint64 scalar
    char* example_buffer = malloc(512);
    plhs[0] = mxCreateNumericMatrix(1, 1, mxUINT64_CLASS, mxREAL);
    uint64_t *ip = (uint64_t *) mxGetData(plhs[0]);
    *ip = (uint64_t)(uintptr_t)example_buffer;
}

static void processReleaseCommand(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    if (nrhs != 2) {
        mexErrMsgTxt("release needs 1 parameter (the handle)");
    } else {
        if (!mxIsUint64(prhs[1])) {
            mexErrMsgTxt("release handle must be UINT64 format");
            return;
        }

        size_t M = mxGetM(prhs[1]); // number of rows
        size_t N = mxGetN(prhs[1]); // number of columns
        // Reject anything that is not a 1x1 array (the original test used &&,
        // which let 1xN and Mx1 arrays through)
        if ((1 != M) || (1 != N)) {
            mexErrMsgTxt("release handle must be 1*1 array format");
            return;
        }

        // Read the handle as uint64 directly; mxGetScalar would convert it to double
        uint64_t ip = *(const uint64_t *) mxGetData(prhs[1]);
        char* example_buffer = (char*)(uintptr_t)ip;
        free(example_buffer);

        // Return true to avoid a warning
        plhs[0] = mxCreateNumericMatrix(1, 1, mxINT8_CLASS, mxREAL);
        char* mx_data = (char *) mxGetData(plhs[0]);
        mx_data[0] = 1;
    }
}

// Mex entry point
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
    // Arguments parsing
    if (nrhs < 1) { mexErrMsgTxt("Not enough input arguments. use 'DspMgr help' for help message."); return; }
    if (!mxIsChar(prhs[0])) { mexErrMsgTxt("First parameter must be a string."); return; }

    // Command selection
    if (commandIs(prhs[0], "HELP")) { processHelpMessageCommand(); }
    else if (commandIs(prhs[0], "init")) { processInitCommand(nlhs, plhs, nrhs, prhs); }
    else if (commandIs(prhs[0], "release")) { processReleaseCommand(nlhs, plhs, nrhs, prhs); }
    else { mexErrMsgTxt("Unknown command or command not implemented yet."); }
}
```
Note in particular how the example above hides a pointer allocated in C and passes it to Matlab.
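The pointer-hiding idea can be shown on its own, without the MEX API. The sketch below is only an illustration under stated assumptions: pointer_to_handle and handle_to_pointer are hypothetical helper names, and the round trip goes through uintptr_t so that the conversion between pointer and integer is explicit.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Pack a C pointer into a 64-bit integer handle that can be stored
 * in a uint64 scalar. Hypothetical helper, not part of the MEX API. */
uint64_t pointer_to_handle(void *ptr)
{
    /* The handle must be wide enough to hold any pointer value. */
    assert(sizeof(void *) <= sizeof(uint64_t));
    return (uint64_t)(uintptr_t)ptr;
}

/* Recover the pointer from a handle produced by pointer_to_handle. */
void *handle_to_pointer(uint64_t handle)
{
    return (void *)(uintptr_t)handle;
}
```

Keeping the handle as an integer type matters because a double cannot represent every 64-bit integer exactly (only integers up to 2^53), so a handle that passes through a double may not convert back to the same pointer.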
The Matlab calling example is as follows:

```matlab
mex -output DspMgr 'CFLAGS="\$CFLAGS -std=c99"' '*.c'

v = DspMgr('init')

DspMgr('release',v)
```
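Taylor's formula approximates a function f(x) that has derivatives up to order n at x = x0 by a polynomial of degree n in (x - x0).

If f(x) has derivatives up to order n on a closed interval [a, b] containing x0, and a derivative of order (n+1) on the open interval (a, b), then for every point x in [a, b] the following holds:

$$f(x) = \frac{f(x_0)}{0!} + \frac{f'(x_0)}{1!}(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + \cdots + \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n + R_n(x)$$

Here $f^{(n)}(x)$ denotes the n-th derivative of f(x), the polynomial after the equals sign is called the Taylor expansion of f(x) at x0, and the remaining term $R_n(x)$ is the remainder of Taylor's formula, a higher-order infinitesimal than $(x - x_0)^n$.

Note that by convention the factorial of 0 is defined as 0! = 1.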
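As a concrete instance of the expansion, take f(x) = exp(x) and x0 = 0: every derivative of exp at 0 equals 1, so the degree-n Taylor polynomial is the sum of x^k / k! for k from 0 to n. The following C sketch evaluates that polynomial. The choice of function and the name taylor_exp are illustrative assumptions, not something taken from the text above.

```c
#include <assert.h>

/* Degree-n Taylor polynomial of exp(x) about x0 = 0:
 * P_n(x) = sum_{k=0}^{n} x^k / k!
 * (all derivatives of exp at 0 are 1). */
double taylor_exp(double x, int n)
{
    assert(n >= 0);
    double term = 1.0;   /* k = 0 term: x^0 / 0! = 1, using 0! = 1 */
    double sum = term;
    for (int k = 1; k <= n; ++k) {
        term *= x / (double)k;   /* turns x^(k-1)/(k-1)! into x^k/k! */
        sum += term;
    }
    return sum;
}
```

With n = 10 and x = 1 the polynomial already agrees with e = 2.718281828... to within 1e-7; the small gap that is left is the remainder R_n(x).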
The original Kalman filter paper: A New Approach to Linear Filtering and Prediction Problems
Gaussian function
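Greek alphabet

| No. | Uppercase | Lowercase | English name | Phonetic transcription | Chinese reading | Meaning |
|---|---|---|---|---|---|---|
| 1 | Α | α | alpha | a:lf | 阿尔法 | angle; coefficient |
| 2 | Β | β | beta | bet | 贝塔 | magnetic flux coefficient; angle; coefficient |
| 3 | Γ | γ | gamma | ga:m | 伽马 | conductivity coefficient (lowercase) |
| 4 | Δ | δ | delta | delt | 德尔塔 | variation; density; diopter |
| 5 | Ε | ε | epsilon | epsilon | 艾普西龙 | base of logarithms |
| 6 | Ζ | ζ | zeta | zat | 截塔 | coefficient; azimuth; impedance; relative viscosity; atomic number |
| 7 | Η | η | eta | eit | 艾塔 | hysteresis coefficient; efficiency (lowercase) |
| 8 | Θ | θ | theta | θit | 西塔 | temperature; phase angle |
| 9 | Ι | ι | iota | aiot | 约塔 | tiny, a small amount |
| 10 | Κ | κ | kappa | kap | 卡帕 | dielectric constant |
| 11 | Λ | λ | lambda | lambd | 兰布达 | wavelength (lowercase); volume |
| 12 | Μ | μ | mu | mju | 缪 | permeability; micro (one millionth); amplification factor (lowercase) |
| 13 | Ν | ν | nu | nju | 纽 | reluctivity |
| 14 | Ξ | ξ | xi | ksi | 克西 | random variable (in mathematics) |
| 15 | Ο | ο | omicron | omikron | 奥密克戎 | |
| 16 | Π | π | pi | pai | 派 | ratio of a circle's circumference to its diameter = 3.14159 26535 89793 |
| 17 | Ρ | ρ | rho | rou | 肉 | resistivity (lowercase) |
| 18 | Σ | σ | sigma | sigma | 西格马 | summation (uppercase); surface density; transconductance (lowercase) |
| 19 | Τ | τ | tau | tau | 套 | time constant |
| 20 | Υ | υ | upsilon | jupsilon | 伊普西龙 | displacement |
| 21 | Φ | φ | phi | fai | 佛爱 | magnetic flux; angle |
| 22 | Χ | χ | chi | phai | 西 | |
| 23 | Ψ | ψ | psi | psai | 普西 | angular velocity; dielectric flux (electrostatic lines of force); angle |
| 24 | Ω | ω | omega | o`miga | 欧米伽 | ohm (uppercase); angular velocity (lowercase); angle |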