## Sunday, September 11, 2011

### Statistical analysis and histogram plotting using gnuplot

Given a data file containing a set of values, count how many of them fall in each of the intervals [a1:a2], [a2:a3], ... and then plot the result as a histogram. This is a common problem in statistics, and it is exactly what we will do in this article.

First, let us see how to map the data onto intervals. The function "floor(x)" returns the largest integer not greater than its argument. So floor(x/dx)*dx maps x to the left edge of the interval of width dx that contains it, i.e. one of [-n*dx:-(n-1)*dx], [-(n-1)*dx:-(n-2)*dx], ..., [(n-1)*dx:n*dx].
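
As a quick check of this mapping, here is a minimal sketch that can be typed at the gnuplot prompt. The width dx = 0.5, the helper name bin_left and the sample values are assumptions chosen only for illustration.

```gnuplot
# Illustrative values only: dx and the sample points are not from the article.
dx = 0.5
bin_left(x) = floor(x/dx)*dx   # left edge of the interval that contains x

print bin_left(0.7)    # 0.7 lies in [0.5:1.0], so the left edge is 0.5
print bin_left(-0.2)   # -0.2 lies in [-0.5:0.0], so the left edge is -0.5
print bin_left(1.0)    # a value on a boundary belongs to the interval starting there
```

Note that negative values are handled correctly because floor rounds towards minus infinity rather than towards zero.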

Next we count the number of data points in each interval. gnuplot has a smooth option called "frequency". It makes the data monotonic in x: points with the same x-value are replaced by a single point having the summed y-values. So if every data point is mapped to its interval and given the y-value 1, the summed y-value of each interval is the number of points that fall in it.
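
The summing behaviour can be seen on a tiny data set. The sketch below assumes a gnuplot version that supports inline datablocks (5.0 or later); the datablock name $Sample and the output file name summed.dat are hypothetical.

```gnuplot
# Inline data: column 1 is the x-value, column 2 the weight to be summed.
$Sample << EOD
1 1
2 1
1 1
3 1
1 1
EOD

# Write the smoothed points to a text file instead of drawing them.
set table "summed.dat"
plot $Sample using 1:2 smooth frequency
unset table
# summed.dat now holds one point per distinct x: (1,3), (2,1), (3,1)
```

The three rows with x = 1 collapse into a single point whose y-value is 3, which is exactly the count we want for a histogram bin.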

Finally, we plot the result using the boxes plot style.

The main idea has now been introduced. It is time to write the plotting script.
```gnuplot
reset
n=100 #number of intervals
max=3. #max value
min=-3. #min value
width=(max-min)/n #interval width
#function used to map a value to the centre of its interval
hist(x,width)=width*floor(x/width)+width/2.0
set term png #output terminal and file
set output "histogram.png"
set xrange [min:max]
set yrange [0:]
#to put an empty boundary around the
#data inside an autoscaled graph.
set offset graph 0.05,0.05,0.05,0.0
set xtics min,(max-min)/5,max
set boxwidth width*0.9
set style fill solid 0.5 #fillstyle
set tics out nomirror
set xlabel "x"
set ylabel "Frequency"
#count and plot
plot "data.dat" u (hist($1,width)):(1.0) smooth freq w boxes lc rgb "green" notitle
```
We use a data file (download from here) which contains 10000 normally distributed random numbers, and get a graph like the following one.

[Figure: statistic histogram plotting using gnuplot]

1. you are my hero!

2. Hi, I tried the same thing using gnuplot, but it says "undefined variable: graph".

I then continued with the plot anyway, and it says "all points y value undefined".

thanks.

1. "all points y value undefined" means that all your y points are outside your yrange; you have to set the yrange so that it contains them.

2. I just had the same mistake. I forgot to set the datafile delimiter.

3. Hi, Callisto:
(1) The script runs well on my computer; I have just confirmed it. So the first problem may be caused by a typing mistake.
(2) "all points y value undefined" may be caused by gnuplot not being able to find the data file. Have you put the file data.dat in the working directory?

You may copy the script to a file (for example, plot.gplt), and then copy it and the data file (data.dat) to your working directory. After that, run the command "load 'plot.gplt'" in gnuplot.

4. i managed to figure it out, just had to remove the word "graph". :)

Would you be able to tell me how to fit a gaussian curve onto the histogram? thank you.

5. Hi, Callisto:
It is a bit hard to fit a Gaussian curve in this problem using only gnuplot, since gnuplot is designed as a plotting tool, not as data processing software. With some tricks the goal may be achieved. Maybe I will talk about how to do it in a future post.
For now I advise processing the data with data processing software first, getting the fitted curve, and then plotting it on the graph.

1. I'm surprised that you can create so many beautiful plots with Gnuplot using a lot of features, but you do not know the "fit" command.
I see that this comment is quite old, and most probably (if you looked into it) you have already found that fitting in Gnuplot is actually very simple.
It is worth a try.

6. Really cool thing! I never thought that gnuplot could do something like that, and it's exactly what I wanted to do. Just a little question: is it possible to fit a function (in this case a Gaussian) to this histogram?
In any case thanks a lot!

7. Anonymous:
It is possible to use "set table" to export the binned data to a data file, and then use the "fit" command to fit a curve.
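
The approach in this reply could be sketched as follows. This is only one possible way to do it: the file name hist.dat, the function name gauss, the parameter names a, mu and sigma, and their starting values are all assumptions, and the binning definitions are repeated from the post so that the snippet runs on its own.

```gnuplot
# Binning definitions repeated from the script in the post.
n = 100; max = 3.; min = -3.
width = (max-min)/n
hist(x,width) = width*floor(x/width) + width/2.0

# Step 1: write the binned counts to a file instead of plotting them.
set table "hist.dat"
plot "data.dat" u (hist($1,width)):(1.0) smooth freq
unset table

# Step 2: fit a Gaussian to the (bin centre, count) pairs.
# a, mu and sigma are hypothetical names; the values are rough starting guesses.
gauss(x) = a*exp(-(x-mu)**2/(2*sigma**2))
a = 100; mu = 0; sigma = 1
fit gauss(x) "hist.dat" u 1:2 via a, mu, sigma

# Step 3: draw the histogram together with the fitted curve.
set boxwidth width*0.9
set style fill solid 0.5
plot "hist.dat" u 1:2 w boxes notitle, gauss(x) w lines notitle
```

A nonlinear fit depends on its starting values, so if the fit does not converge for your data, try guesses closer to the visible peak height, centre and width of the histogram.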

8. Thank you so much for your fast answer! I had been trying for two hours... Now finally I have a really beautiful graph :) I love gnuplot and your blog!
Greetings from Lyon, yours Daniel

9. Brilliant!

10. Hi there,

First of all, thank you for this blog! I'm trying to make a histogram using the same script that you provided above. The only difference is that the data doesn't seem to be accumulating. Although I have one set of data, it seems to plot 4 different histograms.

Here is what it looks like

Is there any reason why this is so ?

The only difference in my script is that I have normalised the distribution by changing

u (hist(\$1,width)):(1.0)

to

u (hist(\$1,width)):(1.0/(N*width))

where N = number of data points

Help?

R

11. Ray2.0:
The most likely reason is that there are some blank lines in your data file. Examine your data file, delete those lines, and then try again. I hope it works!

12. Hi there!

Thanks so much for the reply! You are right. My data file is also 500 000 lines and there were some nans in there. I have another point of query however! Do you know how to plot 3d histograms? I saw an image of this online: a 3d histogram with projections on the different sides of the plot.

I hope this makes sense!

R

13. Ray2.0:
A 3-d histogram is usually unnecessary and not recommended.
For example, this graph (http://www.photobiology.com/v1/maragoni/img13.jpg) is really a bad one: the bars hide each other, so the reader cannot get the information the graph is intended to give. It is usually better to plot this kind of graph as a heatmap (http://flowingdata.com/wp-content/uploads/yapb_cache/nba_heatmap_revised.7sjutbstqyw40kw4o08og084k.2xne1totli0w8s8k0o44cs0wc.th.png).
And a 3-d histogram like this one (http://cqisignals.com/samples/highres-histogram-3D-chart.png) gives nothing more than a normal histogram and only adds the risk of misleading the reader: when two values are nearly the same, it is harder in such a plot than in a normal histogram to decide which one is larger.

14. Hi again,

regarding this example:
http://www.photobiology.com/v1/maragoni/img13.jpg

I did not intend to use 'with boxes' options but linepoints instead. Actually what I have is a list of values for a complex variable, so two columns of real and imaginary values. And I wanted to observe the shape of the distribution function. Furthermore, if I use the kdensity option, perhaps I could get a nice smooth distribution.

I do agree however that the second type of 3D histogram is pretty useless and has only aesthetic merit.

15. Ray2.0:
Plotting a list of complex values is actually not a 3-d plotting problem. It amounts to two 2-d histogram plotting tasks. So ...

16. Worked beautifully. Thanks a lot.
I love your handle too because I speak Chinese.

17. Hi!

Thank you very much!!!! Let me ask one question: how did you generate random numbers between [-4,4]? I'm supposed not to use a library function, but one generator provided. I can normalize it between [0, n], but how do I proceed to achieve [-n,n]?

Thank you so much again!

1. Provided you can generate a random number x uniformly distributed in [0,1], max*(2*x-1) will be a random number uniformly distributed in the range [-max,max].
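
The formula in this reply can be tried directly in gnuplot. In the sketch below, gnuplot's built-in rand(0) only stands in for "the generator provided" in the question, and the names xmax, u and x are assumptions.

```gnuplot
# rand(0) returns a pseudo-random number in [0:1]; it is a stand-in
# for whatever uniform generator you are required to use.
xmax = 4.0
u = rand(0)
x = xmax*(2*u - 1)   # u = 0 maps to -xmax, u = 1 maps to +xmax
print x
```

The same linear rescaling works for any generator, as long as its output is first normalized to [0,1].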

18. Hi!

Can I use 2 data files and build a stacked histogram with different colors? I have two data files, data1.dat and data2.dat. I can make a histogram using your code with data1.dat. Now on the same plot I want to make the histogram with data2.dat, but stacked on top of the first histogram. How can I do it?

Thanks
pc

1. It is generally difficult to process two files at the same time when plotting with gnuplot. It is advisable to merge the files beforehand. On Linux, the "paste" command can be used to merge files.

19. Hi,
I need to plot something of this sort: http://www.flickr.com/photos/intumyspace/6911907271/
and I need to use gnuplot.py. Can you suggest how to vary the histogram width and display some info in every slot?
Currently I have just found this, and I am trying to figure out how to plot histograms dynamically one after the other, rather than plotting all at once when the whole info is available:
http://gnuplot.sourceforge.net/demo/histograms.html

1. To vary the histogram width, the "boxes" plot style is recommended. You may refer to this post: http://gnuplot-surprising.blogspot.com/2011/09/plot-histograms-using-boxes.html

20. thanks! this example script has proved incredibly useful

21. Thanks for your article! Very useful

22. Good Article About Statistic analysis and histogram plotting using gnuplot

23. What does "(1.0)" mean in the last line? Can I replace it with a column number?

1. "(1.0)" means the constant value 1.0, so every data point adds 1 to the count of its interval. It cannot be replaced with a column number.

24. Another question: why does it go wrong when I use "set logscale xy"?

1. Are you sure it is an error caused by "set logscale xy"?

When I use "set logscale y" the histogram plot becomes flat. I tried another way to plot: first output the count of each column, then plot the histogram. This works all right with logscale.

25. Thank you very much indeed! It was very useful for me! ;)

26. Really great. Yesterday I wasted 15 minutes doing the same with Libre calc. Thanks for the code. It's awesome!!!

27. Hi over there. Thanks for your blog. Very useful. However, I slightly modified the script to control the number of intervals explicitly, etc. For my data set, with the same data limits, when I ask for 10 intervals (of 5 units), the script works fine, even when I plot relative frequencies. However, when I ask for 5 intervals (of 10 units) I get 6 boxes instead!! Do you happen to know why?

1. If you can give me your plotting-script and data file, I may figure out the problem.

28. thanks a lot!

29. Awesomeness!

30. Wow, it worked in a minute, thanks. Great example.

31. Thank you :)

32. Hi,
Thanks for very useful blog!
Could you explain a bit how I can use the set table command? I want to fit a density plot to my histogram.
Thanks a lot!

1. When one uses the command
set table "outfile-name",
the plot and splot commands will not actually draw a figure; instead they will write the data points to a file with the name you specified.

33. Hi.

Thanks a lot!

I use gnuplot 4.4 patchlevel 0 (=V1) and gnuplot 4.2 patchlevel 2 (=V2).

When I use your script in V2, everything works nicely.
In V1, I get the error:
"all points y value undefined!"
If I set yrange to [0:100] it works, but the plot is empty: only the axes are drawn.

Thank you.

1. It is a strange problem. The script works well on my computer even with gnuplot 4.4.0. Maybe you can restart gnuplot and then run the script again.

34. Very useful, cheers!

35. Big sister, you are amazing!

36. Great post!

Could you please explain a little about the functions used here? Also, how can I plot the relative frequencies without using any other pre-processing tools?

1. After the first line, add a new line "stats 'data.dat' u 1". Then change the last line to "plot "data.dat" u (hist($1,width)):(1.0/STATS_records) smooth freq w boxes lc rgb "green" notitle". The relative frequencies are then plotted.
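
Put together, the change described in this reply might look like the sketch below. It assumes a gnuplot version that provides the stats command and the same data.dat as in the post; the nooutput keyword and the changed axis label are additions for tidiness, not part of the reply.

```gnuplot
reset
# Count the valid records first; stats stores the result in STATS_records.
stats "data.dat" u 1 nooutput

n = 100; max = 3.; min = -3.
width = (max-min)/n
hist(x,width) = width*floor(x/width) + width/2.0

set xrange [min:max]
set yrange [0:]
set boxwidth width*0.9
set style fill solid 0.5
set xlabel "x"
set ylabel "Relative frequency"

# Each point now contributes 1/N instead of 1, so the box heights sum to 1.
plot "data.dat" u (hist($1,width)):(1.0/STATS_records) smooth freq w boxes lc rgb "green" notitle
```

The division has to sit inside the parentheses of the second using entry, because gnuplot only evaluates a using entry as an expression when the whole entry is parenthesized.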

37. Many thanks for this quick tutorial !!

38. This is very useful, but I now have another problem: I want to make a normal distribution with this data. How can I do this?

39. Just great!. Thanks so much!

40. Hi all,
I have used this example and then got this error:
delay.sh: line 7: syntax error near unexpected token `x,width'
./delay.sh: line 7: `hist(x,width)=width*floor(x/width)+width/2.0'

Has anyone faced the same problem, or does anyone know how to solve it?

41. Hi, thanks for this script. Although it gave me a syntax error associated with the line 'set offset graph 0.05,0.05,0.05,0.0', I was able to run it successfully after commenting out this line.

42. Very useful script - thank you :-) Any ideas how I would set the y upper bound to be dynamic? (i.e. the max value of the largest bin frequency)

Thanks!

1. It should already be set to be dynamic, and you can always try to leave the yrange line out and see if the result looks good.

43. Thanks, still very useful!

I also encountered the following error.

"all points y value undefined"

This occurred because I used "min=5" instead of "min=5."
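
This observation is consistent with gnuplot's integer arithmetic: when both operands of a division are integers, the result is truncated. A small sketch, assuming all of min, max and n were written as integers (the limits below are only illustrative):

```gnuplot
n = 100
print (3 - (-3))/n      # integer division: prints 0
print (3. - (-3.))/n    # real division: prints 0.06
```

With a width of 0, the expression x/width inside hist() is undefined for every point, which would explain the "all points y value undefined" message.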

44. That piece of code was extremely helpful.
Thank you!

45. Shouldn't there be:

hist(x,width)=width*floor((x-min)/width)+width/2.0+min

instead of

hist(x,width)=width*floor(x/width)+width/2.0 ?

For case:
x=10; min=1; max=101; n=10 (width=10)

x should map into interval [1:11]
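
The case given in this comment can be checked numerically. In the sketch below the two function names hist_orig and hist_shifted are hypothetical labels for the two formulas being compared.

```gnuplot
min = 1.; max = 101.; n = 10
width = (max-min)/n     # 10.0

hist_orig(x,width)    = width*floor(x/width) + width/2.0
hist_shifted(x,width) = width*floor((x-min)/width) + width/2.0 + min

print hist_orig(10, width)      # 15.0, the centre of [10:20]
print hist_shifted(10, width)   # 6.0, the centre of [1:11]
```

The two formulas agree whenever min is an integer multiple of width, as it is in the script of the post (min = -3, width = 0.06); otherwise only the shifted version aligns the bin edges with min.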

46. When I ran the script it gave the following error :

plot "data.dat" u (hist(,width)):(1.0) smooth freq w boxes lc rgb"green" notitle
^
line 0: invalid expression

Can anyone help me with this error ?

48. THANK YOU!!!!!!!!!!!! U SAVED ME *w*

56. When I put 0.0 in the data.dat file, it gives the error "invalid command".

67. Great article
