Showing posts with label histogram. Show all posts

Sunday, October 30, 2011

Add value labels to the top of bars in a bar chart

It is easy to plot a bar chart with gnuplot using the boxes or histogram plot style. This time we talk about how to add value labels to the top of the bars in a bar chart.

The first idea that comes to mind is to add the labels manually with the command "set label ...". This works, but it is inefficient. A better way is to plot the labels with the plot style "labels", and this is the method we will talk about here. A sample script is shown below.
reset
set term png font "Times,18"    #set terminal and output file
set output "bar_labels.png"
set xlabel "x value"    #set x and y label
set ylabel "Frequency"
set xrange [-0.5:4.5]    #set x and y range
set yrange [0:4]
set xtics 0,1,4    #set xtics
set style fill solid    #set plot style
set boxwidth 0.5
unset key    #legend not plotted
plot "bar_data.dat" using 1:2 with boxes,\
     "bar_data.dat" using 1:2:2 with labels
#plot bar chart and the value labels on the bars
In this script, we have used a data file (bar_data.dat) like this one:
0    2
1    3
2    3
3    2
4    2

Running the script, we get a picture file bar_labels.png like the following one.

Gnuplot bar chart with value labels

If you do not like the position of the labels in the picture above, you can just change it. For example,
plot "bar_data.dat" using 1:2 with boxes,\
     "bar_data.dat" using 1:($2+0.25):2 with labels
will put the labels 0.25 units higher.
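As an alternative sketch that is not part of the original script, the labels plot style also accepts an offset keyword. It shifts the text in character units rather than in y-axis units, so the spacing does not depend on the yrange. The offset value 0,0.7 and the sprintf format below are illustrative choices.

```gnuplot
# Assumes the same settings and bar_data.dat as the script above.
# "offset 0,0.7" moves each label up by 0.7 character heights;
# sprintf formats the value that is printed (here one decimal place).
plot "bar_data.dat" using 1:2 with boxes,\
     "bar_data.dat" using 1:2:(sprintf("%.1f",$2)) with labels offset 0,0.7
```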

Value labels positioned 0.25 units higher

Saturday, September 17, 2011

Plot histograms using boxes

Someone may ask: "There is a histogram plot style in gnuplot, so why plot it with boxes?" The built-in histogram plot style has some restrictions. For example, the bars are always positioned along the x-axis by row number; you cannot position them using the values in a column of the data file.
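For comparison, here is a minimal sketch of the built-in histogram style applied to the profit.dat file used in this post (this script is an illustration, not part of the original). Only the y column is given; the boxes are placed at 0, 1, 2, ... in row order, and the first column can only be attached as tic labels with xtic(1).

```gnuplot
set style data histogram
set style fill solid 0.5
# Column 2 supplies the bar heights; xtic(1) uses column 1 only as text
# labels, so the years are not treated as numeric x positions.
plot "profit.dat" using 2:xtic(1) notitle
```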

The simplest case is that there is only one group of data to be plotted. In this case you just set the boxwidth to a proper value, for example 0.95, and plot it with boxes. Here is an example.
The data file is like this:
1975    0.5     9.0
1980    2.0     12.0
1985    2.5     10.1
1990    2.6     9.1
1995    2.0     7.2
2000    5.0     8.0
2005    10.2    6.0
2010    15.1    6.2
The plotting script is like this:
reset
set term png truecolor
set output "profit.png"
set xlabel "Year"
set ylabel "Profit(Million Dollars)"
set grid
set boxwidth 0.95 relative
set style fill transparent solid 0.5 noborder
plot "profit.dat" u 1:2 w boxes lc rgb"green" notitle
This example plots a graph like this one:

Plot histogram using boxes with one group of data

When there is more than one group of data to plot, the boxwidth and gap between the boxes should be calculated carefully. We do it like this:
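reset
dx=5.    #spacing between x values (years) in the data file
n=2    #number of data groups
total_box_width_relative=0.75    #width of all boxes of one year, as a fraction of dx
gap_width_relative=0.1    #gap setting, as a fraction of dx
#x offset of the second group; for n=2 this is one box width
#plus gap_width_relative*dx/2
d_width=(gap_width_relative+total_box_width_relative)*dx/2.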
set term png truecolor
set output "profit.png"
set xlabel "Year"
set ylabel "Profit(Million Dollars)"
set grid
set boxwidth total_box_width_relative/n relative
set style fill transparent solid 0.5 noborder
plot "profit.dat" u 1:2 w boxes lc rgb"green" notitle,\
     "profit.dat" u ($1+d_width):3 w boxes lc rgb"red" notitle
This time we get a histogram with two groups of data like this:

Plot histogram using boxes with more than one group of data
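The numbers behind this layout can be checked directly in gnuplot. The following sketch only prints the quantities; the variable box_width is a name introduced here for illustration. With the values used in the script, each box is 0.75/2*5 = 1.875 wide, the second group is shifted by 2.125, and the remaining gap between the two boxes of one year is 0.25.

```gnuplot
dx = 5.
n = 2
total_box_width_relative = 0.75
gap_width_relative = 0.1
# Width of one box in x units ("set boxwidth ... relative" scales by dx)
box_width = total_box_width_relative/n*dx
# Offset of the second group, as in the plotting script
d_width = (gap_width_relative+total_box_width_relative)*dx/2.
print "box width: ", box_width
print "offset:    ", d_width
print "gap:       ", d_width - box_width
```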

Sunday, September 11, 2011

Statistic analysis and histogram plotting using gnuplot

Given a data file containing a set of data, count how many data points fall into each of the intervals [a1:a2], [a2:a3], ... and then plot the result as a histogram. This is a common problem in statistics and exactly what we will do in this article.

First, let us see how to map the data into intervals. There is a function "floor(x)" which returns the largest integer not greater than its argument. So the expression floor(x/dx)*dx maps x to the left edge of the interval of width dx that contains it, that is, one of the intervals [-n*dx:-(n-1)*dx], [-(n-1)*dx:-(n-2)*dx], ..., [(n-1)*dx:n*dx].
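A quick way to see the mapping is to print it for a few values. This is a small sketch with an arbitrarily chosen interval width dx=0.5; the function name bin_left is made up for this illustration.

```gnuplot
dx = 0.5
bin_left(x) = floor(x/dx)*dx    # left edge of the interval containing x
print bin_left(0.1)     # 0.1 lies in [0:0.5], left edge 0
print bin_left(0.7)     # 0.7 lies in [0.5:1], left edge 0.5
print bin_left(-0.1)    # -0.1 lies in [-0.5:0], left edge -0.5
```

Note that floor rounds towards minus infinity, so negative values are also assigned to the interval that actually contains them.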

Now we come to counting the number of data points in each interval. In gnuplot there is a smooth option called "frequency". It makes the data monotonic in x, and points with the same x-value are replaced by a single point having the summed y-values. So if every data point is given the y-value 1 and its x-value is mapped to its interval, the summed y-value of each interval is the number of data points in it.
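To make the counting step concrete, here is a small sketch that tabulates the result instead of drawing it. It assumes a recent gnuplot 5 release (inline data blocks and "set table $datablock"); the data values and the names $data, $counts and bin_left are made up for this illustration.

```gnuplot
# Four sample values: three fall into [0:0.5], one into [0.5:1].
$data << EOD
0.1
0.2
0.7
0.3
EOD
dx = 0.5
bin_left(x) = floor(x/dx)*dx
# Every point contributes y=1; smooth frequency sums the y-values of
# points that share the same mapped x-value.
set table $counts
plot $data using (bin_left($1)):(1.0) smooth frequency
unset table
print $counts    # tabulated points: x=0 with y=3, x=0.5 with y=1
```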

Finally, we plot our result using the boxes plot style.

The main idea has been introduced. It is time to write the plotting script.
reset
n=100 #number of intervals
max=3. #max value
min=-3. #min value
width=(max-min)/n #interval width
#function used to map a value to the intervals
hist(x,width)=width*floor(x/width)+width/2.0
set term png #output terminal and file
set output "histogram.png"
set xrange [min:max]
set yrange [0:]
#to put an empty boundary around the
#data inside an autoscaled graph.
set offset graph 0.05,0.05,0.05,0.0
set xtics min,(max-min)/5,max
set boxwidth width*0.9
set style fill solid 0.5 #fillstyle
set tics out nomirror
set xlabel "x"
set ylabel "Frequency"
#count and plot
plot "data.dat" u (hist($1,width)):(1.0) smooth freq w boxes lc rgb"green" notitle
We use a data file (download from here) which contains 10000 normally distributed random numbers and get a graph like the following one.

statistic histogram plotting using gnuplot
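If the data file is not at hand, a comparable file can be generated in gnuplot itself. This is a sketch, assuming gnuplot 4.6 or later (for the do loop). It draws uniform random numbers with rand(0) and turns them into standard normal values with the inverse normal distribution function invnorm. The numbers it produces differ from those in the original data file.

```gnuplot
set print "data.dat"          # redirect print output to the data file
do for [i=1:10000] {
    print invnorm(rand(0))    # one standard normal sample per line
}
set print                     # restore print output to the default
```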
Except as otherwise noted, the content of this page is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.