First, let us see how to map the data into intervals. The function "floor(x)" returns the largest integer not greater than its argument. So floor(x/dx)*dx maps x to the left edge of the interval of width dx that contains it, that is, one of the intervals [-n*dx:-(n-1)*dx], [-(n-1)*dx:-(n-2)*dx], ..., [(n-1)*dx:n*dx].
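As a quick check of this mapping, the following minimal sketch can be typed at the gnuplot prompt. The names dx and left_edge are chosen here only for illustration and do not appear in the original script.

```gnuplot
# illustrative names only: dx is the interval width,
# left_edge(x) is the left edge of the interval containing x
dx = 0.5
left_edge(x) = floor(x/dx)*dx
print left_edge(0.7)     # 0.7 lies in [0.5:1.0], so the left edge is 0.5
print left_edge(-0.2)    # -0.2 lies in [-0.5:0.0], so the left edge is -0.5
```

Note that floor rounds towards minus infinity, which is why negative values also land on the left edge of their interval.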

Next we count the number of data points in each interval. In gnuplot there is a smooth option called "frequency". It makes the data monotonic in x: points with the same x-value are replaced by a single point having the summed y-values. So if every data point is given the y-value 1 and its x-value is replaced by its interval, the summed y-value of each interval is the number of data points in it.
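The summing behaviour of "smooth frequency" can be seen on a tiny data set. This is only a sketch: it assumes gnuplot 5.0 or newer (for inline datablocks), and $DATA and $OUT are illustrative names.

```gnuplot
# $DATA and $OUT are illustrative datablock names (gnuplot 5.0 or newer assumed)
$DATA << EOD
1 1
1 1
2 1
1 1
EOD
# write the smoothed points into a datablock instead of drawing them
set table $OUT
plot $DATA using 1:2 smooth frequency
unset table
# the row for x=1 should carry the summed y-value 3, the row for x=2 the value 1
print $OUT
```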

Finally, we plot the result using the boxes plot style.

The main idea has now been introduced. It is time to write the plotting script.

reset
n=100                #number of intervals
max=3.               #max value
min=-3.              #min value
width=(max-min)/n    #interval width
#function used to map a value to the intervals
hist(x,width)=width*floor(x/width)+width/2.0
set term png         #output terminal and file
set output "histogram.png"
set xrange [min:max]
set yrange [0:]
#to put an empty boundary around the
#data inside an autoscaled graph.
set offset graph 0.05,0.05,0.05,0.0
set xtics min,(max-min)/5,max
set boxwidth width*0.9
set style fill solid 0.5    #fillstyle
set tics out nomirror
set xlabel "x"
set ylabel "Frequency"
#count and plot
plot "data.dat" u (hist($1,width)):(1.0) smooth freq w boxes lc rgb"green" notitle

Figure: statistical histogram plotted using gnuplot

you are my hero!

hi, i tried the same thing using gnuplot but it says "undefined variable: graph"

then i still continue with the plot and says "all points y value undefined"

thanks.

"all points y value undefined" means that all your y points are out of your yrange, you have to set it in order to have them in it...

I just had the same mistake. I forgot to set the datafile delimiter.

Hi, Callisto:

1. The script runs well on my computer; I have just confirmed it. So the first problem may be caused by a typo.

2. "all points y value undefined" may be caused by gnuplot not being able to find the data file. Have you put the file data.dat in the working directory?

You may copy the script to a file (for example, plot.gplt), and then copy it and the data file (data.dat) to your working directory. After this is done, run the command "load 'plot.gplt'" in gnuplot.

i managed to figure it out, just had to remove the word "graph". :)

Would you be able to tell me how to fit a gaussian curve onto the histogram? thank you.

Hi, Callisto:

It is a bit hard to fit a Gaussian curve in this problem using only gnuplot, since gnuplot is designed as a plotting tool, not as data processing software. With some tricks the goal may be achieved. Maybe I will talk about how to do it in a future post.

For now I advise using data processing software to process the data first, getting the fitted curve, and then plotting it on the graph.

I'm surprised that you can create so many beautiful plots with Gnuplot using a lot of features, but you do not know the "fit" command.

I see that this comment is quite old and most probably (if you looked into it) you have already found that fitting in Gnuplot is actually very simple.

It is worth a try.

Really cool thing! I never thought that gnuplot could do something like that and it's exactly what I wanted to do. Just a little question: is it possible to fit a function (in this case a gaussian) to this histogram?

In any case thanks a lot!

Anonymous:

It is possible to use "set table" to export the data to a data file, and then use the "fit" command to fit a curve.
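A minimal sketch of this approach is given here. It is not taken from the original post: the file name "hist.dat", the function name gauss, the parameters a, mu and s, and their starting values are all assumptions and should be adapted to the data.

```gnuplot
# same binning as in the post; width is set directly here for brevity
width = 0.06
hist(x,w) = w*floor(x/w) + w/2.0

# step 1: write the binned counts to a file instead of drawing them
# ("hist.dat" is a hypothetical output name)
set table "hist.dat"
plot "data.dat" u (hist($1,width)):(1.0) smooth freq
unset table

# step 2: fit a Gaussian to the binned counts
# (a, mu, s and their starting values are guesses)
gauss(x) = a*exp(-(x-mu)**2/(2.0*s**2))
a = 100.0; mu = 0.0; s = 1.0
fit gauss(x) "hist.dat" u 1:2 via a, mu, s

# step 3: draw the histogram together with the fitted curve
plot "hist.dat" u 1:2 w boxes notitle, gauss(x) title "fit"
```

The fit is sensitive to the starting values, so it helps to choose them close to the height, centre and spread seen in the histogram.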

Thank you so much for your fast answer! I had been trying for two hours... Now finally I have a really beautiful graph :) I love gnuplot and your blog!

Greetings from Lyon, yours Daniel

Brilliant!

Hi there,

First of all, thank you for this blog! I'm trying to make a histogram using the same script that you provided above. The only difference is that the data doesn't seem to be accumulating. Although I have one set of data, it seems to plot 4 different histograms.

Here is what it looks like

https://docs.google.com/document/d/1DLor564g7o-wYYB6vg3arAQRt7d2C9M5E7-h-EnQaoQ/edit

Is there any reason why this is so ?

The only difference in my script is that I have normalised the distribution by changing

u (hist($1,width)):(1.0)

to

u (hist($1,width)):(1.0/(N*width))

where N = number of data points

Help?

Thanks in advance!

R

Ray2.0:

The most likely reason is that there are some blank lines in your data file. Examine your data file, delete those lines, and then try again. I wish you success!

Hi there!

Thanks so much for the reply! You are right. My data file is also 500 000 lines and there were some nans in there. I have another point of query however! Do you know how to plot 3d histograms? I saw an image of this online: a 3d histogram with projections on the different sides of the plot.

I hope this makes sense!

thanks in advance =)

R

Ray2.0:

A 3-d histogram is not necessary and not recommended.

For example, this graph (http://www.photobiology.com/v1/maragoni/img13.jpg) is really a bad one, since the bars hide each other, so that the reader cannot get the information the graph is intended to give. This kind of data is better plotted as a heatmap (http://flowingdata.com/wp-content/uploads/yapb_cache/nba_heatmap_revised.7sjutbstqyw40kw4o08og084k.2xne1totli0w8s8k0o44cs0wc.th.png).

And a 3-d histogram like this one (http://cqisignals.com/samples/highres-histogram-3D-chart.png) gives nothing more than a normal histogram and only brings a risk of misleading the reader: when two values are nearly the same, it is harder in such a plot than in a normal histogram to decide which one is larger.

Hi again,

regarding this example:

http://www.photobiology.com/v1/maragoni/img13.jpg

I did not intend to use 'with boxes' options but linepoints instead. Actually what I have is a list of values for a complex variable, so two columns of real and imaginary values. And I wanted to observe the shape of the distribution function. Furthermore, if I use the kdensity option, perhaps I could get a nice smooth distribution.

I do agree however that the second type of 3D histogram is pretty useless and has only aesthetic merit.

Ray2.0:

Plotting a list of complex values is actually not a 3-d plotting problem. It is two 2-d histogram plotting tasks. So ...

Worked beautifully. Thanks a lot.

I love your handle too because I speak Chinese.

Hi!

thank you very much!!!! Let me ask one question: how did you generate random numbers between [-4,4]? I'm supposed not to use a library function, but one generator provided. I can normalize it between [0, n], but how to proceed to achieve [-n,n]?

Thank you so much again!

Provided now you can generate a random number x uniformly distributed in [0,1], then max*(2*x-1) will be a random number uniformly distributed in range [-max,max].
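As an illustration of this formula only, the sketch below uses gnuplot's built-in rand(0), which returns a pseudo-random number in [0:1], in place of the commenter's own generator. The output file name "random.dat" and the count of 1000 values are assumptions, and the "do for" loop needs gnuplot 4.6 or newer.

```gnuplot
max = 4.0
# "random.dat" is a hypothetical output file name
set print "random.dat"
do for [i=1:1000] {
    # rand(0) is uniform in [0:1], so max*(2*rand(0)-1) is uniform in [-max:max]
    print max*(2*rand(0) - 1)
}
# restore printing to the default destination
set print
```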

Hi!

Can I use 2 data files and build a stacked histogram with different colors? I have two data files, data1.dat and data2.dat. I can make a histogram using your code with data1.dat. Now on the same plot I want to make the histogram with data2.dat but stacked on top of the first histogram. How can I do it?

Thanks

pc

It is always very difficult to process two files at the same time when you plot using gnuplot. It is advisable to merge the files beforehand. On Linux, the command "paste" can be used to merge files.

Hi,

I need to plot something of this sort http://www.flickr.com/photos/intumyspace/6911907271/

and need to use gnuplot.py. Can you suggest how we can vary the histogram width? I also need to display some info in every slot.

Currently I just found this, and trying to figure out how to dynamically plot histograms one after the other rather than plotting at once when whole info is available

http://gnuplot.sourceforge.net/demo/histograms.html

Thanks for your time

To vary the histogram width, the "boxes" plot style is recommended. You may refer to this post: http://gnuplot-surprising.blogspot.com/2011/09/plot-histograms-using-boxes.html

thanks! this example script has proved incredibly useful

Thanks for your article! Very useful

Good Article About Statistic analysis and histogram plotting using gnuplot

What does "(1.0)" mean in the last line? Can I replace it with a column number?

"(1.0)" means the value 1.0. It cannot be replaced with a column number.

Another question: why does it go wrong when I use "set logscale xy"?

Are you sure it is an error caused by "set logscale xy"?

Thank you for your reply!

When I use "set logscale y" the histogram plot becomes flat. I tried another way to plot. First output the number of each column, then plot histogram. This works all right when use logscale.

Thank you very much indeed! It was very useful for me! ;)

Really great, Yesterday I wasted 15 minutes in doing the same with Libre calc. Thanks for the code.. It's awesome!!!

Hi over there. Thanks for your blog. Very useful. However, I slightly modified it for controlling explicitly the number of intervals, etc. For my data set, for the same data limits, when I ask to plot 10 intervals (of 5 units), the subroutine works fine even when I sent to plot relative frequencies. However, when I send to print 5 intervals (of 10 units) I get rather 6 boxes!! Do you happen to know why?

If you can give me your plotting script and data file, I may be able to figure out the problem.

thanks a lot!

Awesomeness!

Wow, it worked in a minute, thanks. Great example.

Thank you :)

Hi,

Thanks for very useful blog!

could you explain a bit how I can use set table command. I want to fit a density plot to my histogram.

Thanks a lot!

When one uses the command

set table "outfile-name",

the plot and splot commands will not actually draw a figure; instead they write the data to a file with the name you specified.

Hi.

Thanks a lot!

I use gnuplot 4.4 patchlevel 0(=V1) and gnuplot 4.2 patchlevel 2(=V2)

When i use your script in V2 - all work pretty.

In V1 - i get error:

"all points y value undefined!"

if i set yrange to [0:100] it's work, but plot is empty - only axes

Please help me to solve this problem

Thank you.

It is a strange problem. The script worked well on my computer even when gnuplot 4.4.0 is used. Maybe you can restart gnuplot and then run the script again.

Very useful, cheers!

Big sister, you are amazing..

Great post!

Could you please explain a little about the functions used here? Also, how to plot the relative frequencies without using any other pre-processing tools?

After the first line, add a new line "stats 'data.dat' u 1", and modify the last line to "plot "data.dat" u (hist($1,width)):(1.0/STATS_records) smooth freq w boxes lc rgb"green" notitle". Then the relative frequencies are plotted.

Many thanks for this quick tutorial !!

This is very useful, but I now have another problem: I want to make a normal distribution with this data. How can I do this?

Just great!. Thanks so much!

Hi all,

I have used this example and then got this error:

delay.sh: line 7: syntax error near unexpected token `x,width'

./delay.sh: line 7: `hist(x,width)=width*floor(x/width)+width/2.0'

Any one has faced the same problem or knows to solve it please.

Hi, thanks for this script. Although it gave me a syntax error, associated with the line 'set offset graph 0.05,0.05,0.05,0.0', I was able to run it successfully after commenting out this line.

Very useful script - thank you :-) Any ideas how I would set the y upper bound to be dynamic? (i.e. the max value of the largest bin frequency)

Thanks!

It should already be set to be dynamic, and you can always try to leave the yrange line out and see if the result looks good.

Thanks, still very useful!

I also encountered the following error.

"all points y value undefined"

This occurred because I used "min=5" instead of "min=5."
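A likely explanation, offered here as an assumption rather than something confirmed in the thread, is gnuplot's integer division: when min, max and n are all integers, (max-min)/n is truncated to an integer, so the interval width can become 0 and hist() then divides by zero. A short sketch with the limits of the original script:

```gnuplot
n = 100
# integer limits: 6/100 is truncated to 0, so the interval width would be 0
print (3 - (-3))/n
# real limits (note the trailing dots): 6.0/100 gives 0.06
print (3. - (-3.))/n
```

Writing the limits with a decimal point, as the original script does with "max=3." and "min=-3.", avoids the truncation.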

That piece of code was extremely helpful.

Thank you!

Shouldn't there be:

hist(x,width)=width*floor((x-min)/width)+width/2.0+min

instead of:

hist(x,width)=width*floor(x/width)+width/2.0 ?

For case:

x=10; min=1; max=101; n=10 (width=10)

x should map into interval [1:11]
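The case described in this comment can be checked with a short sketch. The function names hist_orig and hist_shift are introduced here only to compare the two formulas side by side; they are not part of the original script.

```gnuplot
# values from the comment above, written as reals to avoid integer division
min = 1.0; max = 101.0; n = 10
width = (max - min)/n     # 10.0

# formula from the post: intervals start at multiples of the width
hist_orig(x,w)  = w*floor(x/w) + w/2.0
# formula proposed in the comment: intervals start at min
hist_shift(x,w) = w*floor((x - min)/w) + w/2.0 + min

print hist_orig(10.0, width)    # centre 15, i.e. the interval [10:20]
print hist_shift(10.0, width)   # centre 6, i.e. the interval [1:11]
```

With the limits of the original script (min=-3, width=0.06) the lower limit is a multiple of the interval width, so the two formulas place the intervals identically, up to floating-point rounding at the edges; they differ only when min is not a multiple of the width, as in this example.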

When I ran the script it gave the following error :

plot "data.dat" u (hist(,width)):(1.0) smooth freq w boxes lc rgb"green" notitle

^

line 0: invalid expression

Can anyone help me with this error ?