Sunday, September 11, 2011

Statistical analysis and histogram plotting using gnuplot

Given a data file containing a set of values, count how many values fall in each of the intervals [a1:a2], [a2:a3], ..., then plot the result as a histogram. This is a common problem in statistics, and it is exactly what we will do in this article.

First, let us see how to map the data to intervals. The function "floor(x)" returns the largest integer not greater than its argument, so floor(x/dx)*dx maps x to the left edge of the interval of width dx that contains it, i.e. one of the intervals [-n*dx:-(n-1)*dx], [-(n-1)*dx:-(n-2)*dx], ..., [(n-1)*dx:n*dx].
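The mapping can be checked outside gnuplot. Below is a minimal Python sketch, used here only for illustration; the helper name bin_left_edge and the interval width dx = 0.5 are assumptions, not part of the plotting script.

```python
import math


def bin_left_edge(x, dx):
    """Return the left edge of the interval of width dx that contains x."""
    return math.floor(x / dx) * dx


if __name__ == "__main__":
    dx = 0.5  # assumed interval width, for illustration only
    for x in (-1.2, -0.5, 0.3, 1.74):
        left = bin_left_edge(x, dx)
        print(f"{x:6.2f} falls in [{left}:{left + dx}]")
```

Note that floor rounds toward minus infinity, so negative values are also assigned to the interval on their left, which is what a histogram needs.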

Next we count the number of data points in each interval. gnuplot has a smooth option called "frequency". It makes the data monotonic in x: points with the same x-value are replaced by a single point whose y-value is the sum of their y-values. If every point is given the y-value 1 and its x-value is replaced by its interval, the summed y-value is the number of points in that interval.
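To make the summing behaviour concrete, here is a small Python sketch that imitates it. It is an illustration of the idea only, not gnuplot's implementation, and the function name smooth_frequency is hypothetical.

```python
from collections import defaultdict


def smooth_frequency(points):
    """Sum the y-values of points sharing an x-value; return pairs sorted by x."""
    totals = defaultdict(float)
    for x, y in points:
        totals[x] += y
    return sorted(totals.items())


if __name__ == "__main__":
    # Each point carries y = 1.0, so the summed y is a count per x-value.
    points = [(0.5, 1.0), (-0.5, 1.0), (0.5, 1.0), (1.5, 1.0), (0.5, 1.0)]
    for x, count in smooth_frequency(points):
        print(x, count)
```

When the x-values are interval positions rather than raw data values, the result is exactly the table of counts that a histogram displays.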

Finally, we plot the result using the boxes plot style.

The main idea has been introduced. It is time to write the plotting script.
n=100 #number of intervals
max=3. #max value
min=-3. #min value
width=(max-min)/n #interval width
#function used to map a value to the intervals
hist(x,width)=width*floor(x/width)+width/2.0
set term png #output terminal and file
set output "histogram.png"
set xrange [min:max]
set yrange [0:]
#to put an empty boundary around the
#data inside an autoscaled graph.
set offset graph 0.05,0.05,0.05,0.0
set xtics min,(max-min)/5,max
set boxwidth width*0.9
set style fill solid 0.5 #fillstyle
set tics out nomirror
set xlabel "x"
set ylabel "Frequency"
#count and plot
plot "data.dat" u (hist($1,width)):(1.0) smooth freq w boxes lc rgb"green" notitle
We use a data file containing 10000 normally distributed random numbers and get a graph like the following one.

[Figure: statistical histogram plotted using gnuplot]
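If the data file is not at hand, a comparable one can be generated. The following is a minimal Python sketch under stated assumptions: a standard normal distribution (mean 0, standard deviation 1), a fixed seed, and one value per line. The function name normal_samples and these parameters are illustrative choices, not a description of the original data file.

```python
import random


def normal_samples(count=10000, mean=0.0, sigma=1.0, seed=1):
    """Return `count` normally distributed random numbers (reproducible via seed)."""
    rng = random.Random(seed)
    return [rng.gauss(mean, sigma) for _ in range(count)]


if __name__ == "__main__":
    # Write one value per line so gnuplot can read the file as column 1.
    with open("data.dat", "w") as f:
        for value in normal_samples():
            f.write(f"{value:.6f}\n")
```

With these assumed parameters, almost all values lie inside the range [-3:3] used by the script; values outside it are simply not shown.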


  1. hi, i tried the same thing using gnuplot but it says "undefined variable: graph"

    then i still continue with the plot and says "all points y value undefined"


    1. "all points y value undefined" means that all your y points are out of your yrange; you have to set it so that they fall inside it...

    2. I just had the same mistake. I forgot to set the datafile delimiter.

  2. Hi,Callisto:
    1. The script runs well on my computer; I have just confirmed it. So the first problem may be caused by a typo.
    2. "all points y value undefined" may be caused by gnuplot not being able to find the data file. Have you put the file data.dat in the working directory?

    You may copy the script to a file (for example, plot.gplt), and then copy it and the data file (data.dat) to your working directory. After that, run the command "load 'plot.gplt'" in gnuplot.

  3. i managed to figure it out, just had to remove the word "graph". :)

    Would you be able to tell me how to fit a gaussian curve onto the histogram? thank you.

  4. Hi,Callisto:
    It is a bit hard to fit a Gaussian curve in this problem using only gnuplot, since gnuplot is designed as a plotting tool, not as data processing software. With some tricks, the goal may be achieved. Maybe I will talk about how to do it in a future post.
    For now I advise using data processing software to process the data first, getting the fitted curve, and then plotting it on the graph.

    1. I'm surprised that you can create so many beautiful plots with Gnuplot using a lot of features, but you do not know the "fit" command.
      I see that this comment is quite old and most probably (if you looked after) you found already that fitting in Gnuplot is actually very simple.
      It is worth a try.

  5. Really cool thing! I never thought that gnuplot could do something like that and it's exactly what I wanted to do. Just a little question is it possible to fit a function (in this case a gaussian) to this histogram?
    In any case thanks a lot!

  6. Anonymous:
    It is possible to use "set table " to export the data to a data file. And then use "fit" command to fit a curve.

  7. Thank you so much for your fast answer! I was trying since two hours... Now finally I have a really beautiful graph :) I love gnuplot and your blog!
    Greetings from Lyon, yours Daniel

  8. Hi there,

    First of all, thank you for this blog! I'm trying to make a histogram using the same script that you provided above. the only difference is that the data doesnt seem to be accumulating. Although I have one set of data, it seems to plot 4 different histograms.

    Here is what it looks like

    Is there any reason why this is so ?

    The only difference in my script is that I have normalised the distribution by changing

    u (hist($1,width)):(1.0)

    to

    u (hist($1,width)):(1.0/(N*width))

    where N = number of data points


    Thanks in advance!


  9. Ray2.0:
    The most likely reason is that there are some blank lines in your data file. Examine your data file, delete those lines, and then try again. Good luck!

  10. Hi there!

    Thanks so much for the reply! You are right. My data file is also 500 000 lines and there were some nans in there. I have another point of query however! Do you know how to plot 3d histograms? I saw an image of this online: a 3d histogram with projections on the different sides of the plot.

    I hope this makes sense!

    thanks in advance =)


  11. Ray2.0:
    A 3-d histogram is generally unnecessary and not recommended.
    For example, this graph is really a bad one, since the bars hide each other, so the reader cannot get the information the graph is intended to give. This kind of graph is better plotted as a heatmap.
    And a 3-d histogram like this one gives nothing more than a normal histogram and only brings a risk of misleading the reader (when two values are nearly the same, it is harder in such a plot than in a normal histogram to decide which one is larger).

  12. Hi again,

    regarding this example:

    I did not intend to use 'with boxes' options but linepoints instead. Actually what I have is a list of values for a complex variable, so two columns of real and imaginary values. And I wanted to observe the shape of the distribution function. Furthermore, if I use the kdensity option, perhaps I could get a nice smooth distribution.

    I do agree however that the second type of 3D histogram is pretty useless and has only aesthetic merit.

  13. Ray2.0:
    Plotting a list of complex values is actually not a 3-d plotting problem. It is two 2-d histogram plotting tasks. So ...

  14. Worked beautifully. Thanks a lot.
    I love your handle too because I speak Chinese.

  15. Hi!

    thank you very much!!!! Let me ask one question: how did you generate random numbers between [-4,4]? I'm supposed not to use a library function, but a provided generator. I can normalize it to [0, n], but how do I proceed to achieve [-n,n]?

    Thank you so much again!

    1. Provided you can generate a random number x uniformly distributed in [0,1], max*(2*x-1) will be a random number uniformly distributed in the range [-max,max].

  16. Hi!

    Can i use 2 data files and build a stacked histogram with different colors. I have two data files data1.dat and data2.dat. I can make a histogram using ur code with data1.dat. Now on the same plot i want to make the histogram with data2.dat but stacked on top of the first histogram. How can i do it?


    1. It is generally very difficult to process two files at the same time when you plot using gnuplot. It is advisable to merge the files beforehand. On Linux, the "paste" command can be used to merge files.

  17. Hi,
    I need plot something of this sort
    and need to use can u suggest how can we vary the histogram width and need to display some info in every slot.
    Currently I just found this, and trying to figure out how to dynamically plot histograms one after the other rather than plotting at once when whole info is available
    Thanks for your time

    1. To vary the histogram width, the "boxes" plot style is recommended. You may refer to this post:

  18. thanks! this example script has proved incredibly useful

  19. Thanks for your article! Very useful

  20. Good Article About Statistic analysis and histogram plotting using gnuplot

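  21. What does "(1.0)" mean in the last line? Can I replace it with a column number?

    1. "(1.0)" is the constant value 1.0, so each data point adds 1 to its interval. It cannot be replaced with a column number if you want counts: with a column there, the values of that column would be summed instead.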

  22. Another question: why does it go wrong when I use "set logscale xy"?

    1. Are you sure it is an error caused by "set logscale xy"?

    2. Thank you for your reply!
      When I use "set logscale y" the histogram plot becomes flat. I tried another way to plot: first output the number for each column, then plot the histogram. This works all right with logscale.

  23. Thank you very much indeed! It was very useful for me! ;)

  24. Really great, Yesterday I wasted 15 minutes in doing the same with Libre calc. Thanks for the code.. Its awesome!!!

  25. Hi over there. Thanks for your blog. Very useful. However, I slightly modified it for controlling explicitly the number of intervals, etc. For my data set, for the same data limits, when I ask to plot 10 intervals (of 5 units), the subroutine works fine even when I sent to plot relative frequencies. However, when I send to print 5 intervals (of 10 units) I get rather 6 boxes!! do you happen to know why?.

    1. If you can give me your plotting-script and data file, I may figure out the problem.

  26. Wow, it worked in a minute, thanks. Great example.

  27. Hi,
    Thanks for very useful blog!
    could you explain a bit how I can use set table command. I want to fit a density plot to my histogram.
    Thanks a lot!

    1. When one uses the command
      set table "outfile-name",
      the plot and splot commands will not actually draw a figure; instead they will write a data file with the name you specified.

  28. Hi.

    Thanks a lot!

    I use gnuplot 4.4 patchlevel 0(=V1) and gnuplot 4.2 patchlevel 2(=V2)

    When i use your script in V2 - all work pretty.
    In V1 - i get error:
    "all points y value undefined!"
    if i set yrange to [0:100] it's work, but plot is empty - only axes

    Please help me to solve this problem

    Thank you.

    1. It is a strange problem. The script worked well on my computer even when the gnuplot 4.4.0 is used. Maybe you can restart your gnuplot and then run the script again.

  29. Very useful, cheers!

  30. 姐姐好厉害。。

  31. Great post!

    Could you please elaborate a little on the functions used here? Also, how can I plot the relative frequencies without using any other pre-processing tools?

    1. After the first line, add a new line "stats 'data.dat' u 1", and change the last line to "plot "data.dat" u (hist($1,width)):(1.0/STATS_records) smooth freq w boxes lc rgb"green" notitle". The relative frequencies are then plotted.

  32. Many thanks for this quick tutorial !!

  33. This is very useful, but I now have another problem: I want to make a normal distribution with this data. How can I do this?

  34. Hi all,
    I have used this example and then got this error: line 7: syntax error near unexpected token `x,width'
    ./ line 7: `hist(x,width)=width*floor(x/width)+width/2.0'

    Any one has faced the same problem or knows to solve it please.

  35. Hi, thanks for this script. Although it gave me a syntax error, associated with the line 'set offset graph 0.05,0.05,0.05,0.0', I was able to run it successfully after commenting out this line.

  36. Very useful script - thank you :-) Any ideas how I would set the y upper bound to be dynamic? (i.e. the max value of the largest bin frequency)


    1. It should already be set to be dynamic, and you can always try to leave the yrange line out and see if the result looks good.

  37. Thanks, still very useful!

    I also encountered the following error.

    "all points y value undefined"

    This occurred because I used "min=5" instead of "min=5."

  38. That piece of code was extremely helpful.
    Thank you!

  39. Shouldn't there be:


    instead of:

    hist(x,width)=width*floor(x/width)+width/2.0 ?

    For case:
    x=10; min=1; max=101; n=10 (width=10)

    x should map into interval [1:11]

  40. When I ran the script it gave the following error :

    plot "data.dat" u (hist(,width)):(1.0) smooth freq w boxes lc rgb"green" notitle
    line 0: invalid expression

    Can anyone help me with this error ?

  41. THANK YOU!!!!!!!!!!!! U SAVED ME *w*

  42. Pretty good post. Really enjoyed reading your blog post. great information about use of gnuplot, nice post thank you

  44. while giving 0.0 in data.dat file it is giving error, invalid command

  57. Hi, thanks for your script. It worked great.
    Is it possible to do the same plot with multiple columns? Say if my data has 3 columns and I want to plot all 3 columns in single plot. I tried using the below command but I get error "column number expected".

    plot for [i=1:3] "data.dat" u (hist($i, width)):(1.0) smooth freq w boxes lc rgb"green" notitle

    Replaced $i with column(i), still same error.

    Thank's in advance.

  139. Hi I tried the script but I got a
    line 7: syntax error near unexpected token `x,width'
    line 7: `hist(x,width)=width*floor(x/width)+width/2.0'
    Anyone could help?

