STATISTICS: WHAT AND WHY?
The word 'statistics' conveys a variety of meanings to people. To some, statistics is an imposing form of mathematics, whereas to others it suggests tables, charts and figures. Numbers play an essential role in statistics. They provide the raw material of statistics. These materials must be processed to be useful, just as crude oil must be refined into petrol before it can be used by an automobile engine. The study of statistics involves methods of refining numerical (and non-numerical) information into useful forms. Statements that contain figures are called numerical statements of facts.
Whenever numbers are collected and compiled, regardless of what they represent, they become statistics. In other words, the term statistics is considered synonymous with ways and means of presenting and handling data, making inferences logically and drawing relevant conclusions.
In addition to meaning data, 'statistics' also refers to a subject. Statistics is a body of methods for obtaining and analysing data in order to base decisions on them. It is a branch of scientific method used in dealing with phenomena that can be described numerically, either by counts or by measurements. Thus the word statistics refers either to quantitative information or to a method of dealing with quantitative information. In the first reference, it is used as a plural noun (the statistics of births, deaths, imports, exports, etc.); in the second reference, the word is used as a singular noun. Statistics deals with the collection, presentation, analysis and interpretation of quantitative information.
The methods by which statistical data are analysed are called statistical methods. Statistical methods are used by governmental bodies, private business and research agencies as an indispensable aid in (1) forecasting, (2) controlling, and (3) exploring.
The science of statistics is said to have originated from two main sources:
1. Government Records, and
2. Mathematics.
1. Government Records. This is the earliest foundation because all cultures with a recorded history had recorded statistics, and the recording, as far as is known, was done by agents of the government for governmental purposes.
2. Mathematics. Statistics is said to be a branch of applied mathematics. The present body of statistical methods, particularly those concerned with drawing inferences about a population from a sample, is based on the mathematical theory of probability, which marked a major step in the intellectual history of the world.
Statistics refers to the body of principles and procedures developed for the collection, classification, summarisation and interpretation of numerical data and for the use of such data.
ORIGIN:- Statistics is a very old branch of knowledge. The term statistics has its origin in the Latin word Status, the Italian word Statista or the German term Statistik. All three terms mean "Political State".
WHAT IS STATISTICS:-
Broadly speaking, the term statistics has generally been used in two senses:
(1) Plural sense, and
(2) Singular sense.
In the plural sense, the term statistics refers to numerical statements of facts relating to any field of enquiry, such as data relating to production, income, expenditure, population, prices, etc. In other words, the term statistics in its plural sense refers to numerical data. In its singular sense, the term statistics refers to a science in which we deal with the techniques or methods for collecting, classifying, presenting, analysing and interpreting data.
IN PLURAL SENSE:
"Statistics are numerical statements of facts in any department of enquiry placed in relation to each other." -Bowley
"Statistics are numerical descriptions of quantitative aspects of things and they take the form of counts or measurements." -Wallis and Roberts
"By statistics we mean aggregates of facts affected to a marked extent by multiplicity of causes, numerically expressed, enumerated or estimated according to reasonable standards of accuracy, collected in a systematic manner for a pre-determined purpose and placed in relation to each other." -Horace Secrist
IN SINGULAR SENSE:
"Statistics may be defined as the collection, presentation, analysis and interpretation of numerical data." -Croxton and Cowden
"Statistics is the science which deals with the methods of collecting, classifying, presenting, comparing and interpreting numerical data collected to throw some light on any sphere of enquiry." -Seligman
Features/ characteristics:-
(1) Aggregate of Facts: A single number does not constitute statistics. No conclusion can be drawn from it. It is only an aggregate of facts capable of offering some meaningful conclusion that constitutes statistics.
(2) Numerically Expressed: Statistics are expressed in terms of numbers. Qualitative aspects like 'small' or 'big', 'rich' or 'poor', etc. are not statistics. For instance, the statement that Kapil Dev is tall and Gavaskar is short has no statistical sense.
(3) Affected by Multiplicity of Causes: Statistics are not affected by any single factor. They are influenced by many factors simultaneously. For instance, a 30 per cent rise in prices may have been due to several causes, like reduction in supply, increase in demand, shortage of power, rise in wages, rise in taxes, etc.
(4) Reasonable Accuracy: A reasonable degree of accuracy must be kept in view while collecting statistical data. This accuracy depends on the purpose of the investigation, its nature, its size and the available resources. For example, a difference of one kg in five kg of sweetmeat is a glaring inaccuracy, but a difference of one kg in one quintal of wheat is negligible and insignificant.
(5) Placed in Relation to each other: Only figures that are mutually related and comparable can be called statistics. Unless they can be compared, they cannot be called statistics.
(6) Pre-determined Purpose: Statistics are collected with some pre-determined objective. Any information gathered without a definite purpose will only be a numerical value and not statistics. If data pertaining to the farmers of a village are being collected, there must be some pre-determined objective.
(7) Enumerated or Estimated: Statistics may be collected by enumeration or these may be estimated. If the field of investigation is vast, the procedure of estimation may be helpful.
(8) Collected in Systematic Manner: Statistics should be collected in a systematic manner. Before collecting them, a plan must be prepared. No conclusion can be drawn from statistics collected in a haphazard manner.
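The point about reasonable accuracy in (4) can be made concrete by expressing each error as a fraction of the total quantity. The following is a minimal sketch; the function name `relative_error` is a hypothetical helper introduced here, and one quintal is assumed to be 100 kg.

```python
def relative_error(error_kg: float, total_kg: float) -> float:
    """Return the error as a fraction of the total quantity."""
    return error_kg / total_kg


# The same absolute error of 1 kg, measured against two different totals.
sweetmeat = relative_error(1, 5)    # 1 kg off in 5 kg of sweetmeat
wheat = relative_error(1, 100)      # 1 kg off in one quintal (assumed 100 kg)

print(f"Sweetmeat: {sweetmeat:.0%} error")  # prints "Sweetmeat: 20% error"
print(f"Wheat: {wheat:.0%} error")          # prints "Wheat: 1% error"
```

The absolute error is identical in both cases, but the relative error differs by a factor of twenty, which is why the first is serious and the second negligible.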
1. **Quantitative Data Analysis**: Statistics allows for the exploration of complex relationships within data through mathematical operations, facilitating comparisons, the identification of trends, and pattern recognition. It enables the conversion of qualitative observations into measurable quantities for rigorous analysis.
2. **Data Collection**: The process of data collection in statistics involves careful planning, design, and execution to ensure the accuracy, reliability, and representativeness of the collected information. Various sampling techniques, such as random sampling, stratified sampling, and cluster sampling, are employed to obtain a diverse and unbiased sample.
3. **Data Presentation**: Statistics offers a wide array of visualisation techniques to effectively communicate findings and insights derived from data analysis. Visualisation tools like heatmaps, treemaps, and network diagrams help reveal hidden patterns, outliers, and relationships in the data, enhancing understanding and decision-making.
4. **Data Analysis**: Statistical analysis encompasses both exploratory and confirmatory techniques to uncover meaningful insights and relationships in the data. Exploratory data analysis (EDA) involves techniques like data profiling, summary statistics, and graphical visualisation to understand the structure and distribution of the data. Confirmatory data analysis utilises hypothesis testing, regression analysis, and multivariate techniques to validate hypotheses and make predictions based on the data.
5. **Inference**: Statistical inference allows researchers to generalise findings from a sample to a larger population, providing insights into broader phenomena. Inferential statistics leverages probability theory to estimate population parameters, assess the reliability of statistical estimates, and quantify uncertainty through confidence intervals and p-values.
6. **Probability**: Probability theory serves as the mathematical foundation of statistics, providing a framework for quantifying uncertainty and randomness in data. Probability distributions, such as the normal distribution, binomial distribution, and Poisson distribution, characterise the behaviour of random variables and enable probabilistic modeling and prediction.
7. **Decision Making**: Statistics informs decision-making processes by providing evidence-based insights and recommendations derived from data analysis. Decision analysis techniques, such as decision trees, sensitivity analysis, and risk assessment, help stakeholders evaluate alternative courses of action, anticipate potential outcomes, and mitigate risks in complex decision environments.
8. **Variability**: Statistical methods account for variability in data by distinguishing between systematic variation (e.g., trends, patterns) and random variation (e.g., noise, uncertainty). Understanding the sources and magnitude of variability allows analysts to identify significant trends and patterns amidst fluctuations and make reliable interpretations of the data.
9. **Interdisciplinary Application**: Statistics finds applications in a wide range of disciplines, including social sciences (e.g., sociology, psychology), natural sciences (e.g., biology, ecology), engineering (e.g., quality control, reliability analysis), economics (e.g., econometrics, market research), medicine (e.g., clinical trials, epidemiology), and environmental studies (e.g., ecological modeling, environmental monitoring).
10. **Continuous Development**: Statistics evolves in response to advancements in technology, data science, and computational methods, driving innovation and expanding its applicability in diverse domains. Emerging fields such as machine learning, data mining, and artificial intelligence (AI) intersect with traditional statistical methods to address complex analytical challenges and unlock new opportunities for knowledge discovery and decision support.
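As an illustration of how data collection (item 2) and inference (item 5) fit together, the sketch below draws a simple random sample from a simulated population and builds an approximate 95% confidence interval for the population mean. The population of household incomes, its parameters, and the sample size are all invented for illustration, and the interval uses the normal critical value 1.96, which is only an approximation.

```python
import math
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 simulated household incomes.
population = [random.gauss(50_000, 12_000) for _ in range(10_000)]

# Simple random sampling without replacement.
sample = random.sample(population, 200)

sample_mean = statistics.mean(sample)
sample_sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
standard_error = sample_sd / math.sqrt(len(sample))

# Approximate 95% confidence interval using the normal critical value.
z = 1.96
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"Sample mean:     {sample_mean:.0f}")
print(f"95% CI:          ({ci_low:.0f}, {ci_high:.0f})")
print(f"Population mean: {statistics.mean(population):.0f}")
```

In practice the population mean is unknown; it is printed here only because the population is simulated, so the interval can be compared against the value it is trying to estimate.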
LIMITATIONS:-
1. **Sampling Bias**: Sampling bias occurs when the sample selected for analysis does not accurately represent the entire population under study. This can lead to skewed results and inaccurate conclusions. For example, if a survey on smartphone usage is conducted only among tech-savvy individuals, the results may not be applicable to the general population.
2. **Assumption Dependence**: Many statistical methods rely on certain assumptions about the data, such as normal distribution or independence of observations. If these assumptions are violated, the validity of the statistical analysis is compromised. For instance, linear regression assumes a linear relationship between variables, and violating this assumption can lead to erroneous predictions.
3. **Causation vs. Correlation**: Statistics can identify relationships between variables, but an observed association does not by itself establish causation. Correlation merely indicates a relationship between two variables and does not imply that changes in one variable cause changes in the other. Confounding variables, unobserved factors, or reverse causation can distort the interpretation of statistical associations.
4. **Measurement Error**: Data collected for statistical analysis may contain errors or inaccuracies due to various sources such as measurement instruments, human error, or data entry mistakes. Measurement error can introduce noise into the analysis and distort the true relationships between variables, leading to biased estimates and erroneous conclusions.
5. **Overfitting**: Overfitting occurs when a statistical model captures noise or random fluctuations in the data instead of the underlying patterns. This can happen when the model is overly complex or when it is trained on a small dataset. Overfitted models perform well on the training data but generalise poorly to new, unseen data, limiting their predictive accuracy.
6. **Interpretation Challenges**: Statistical results can be challenging to interpret, especially for non-experts. Complex statistical concepts, technical jargon, and mathematical formulations may hinder understanding and lead to misinterpretation or oversimplification of findings. Effective communication of results is essential to ensure their accurate interpretation and meaningful application.
7. **Ethical Considerations**: Ethical considerations arise in statistical research, particularly in studies involving human subjects. Issues such as privacy, informed consent, and data confidentiality must be carefully addressed to protect the rights and welfare of participants. Ethical lapses in statistical research can undermine the integrity of the study and harm individuals or communities involved.
8. **Limited Scope**: While statistics is a powerful tool for analyzing quantitative data, it may not be suitable for addressing all research questions or phenomena. Certain phenomena may defy quantification or require qualitative methods of analysis. Additionally, statistical analyses may overlook contextual factors or nuances that are critical for understanding complex social, cultural, or behavioral phenomena.
9. **Context Sensitivity**: Statistical results may vary depending on the context in which they are applied. Factors such as cultural differences, socioeconomic status, or historical events can influence the validity and generalizability of statistical findings across different populations or settings. Careful consideration of context is necessary to ensure the relevance and applicability of statistical analyses.
10. **Subjectivity in Interpretation**: The interpretation of statistical results can be subjective and influenced by the researcher's biases, assumptions, and prior beliefs. Confirmation bias, selective reporting, or data dredging can lead to cherry-picking results that support preconceived hypotheses or agendas. Transparency and rigor in statistical analysis and reporting are essential to mitigate the impact of subjective interpretation and ensure the credibility of findings.
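The distinction between correlation and causation (item 3) can be illustrated with simulated data. In the sketch below, temperature drives both ice cream sales and sunburn cases, so the two outcomes are strongly correlated even though neither causes the other. All variable names, coefficients, and noise levels are assumptions chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


random.seed(0)  # fixed seed so the sketch is reproducible

# Confounder: daily temperature drives both outcomes below.
temperature = [random.uniform(15, 40) for _ in range(500)]
ice_cream_sales = [2.0 * t + random.gauss(0, 5) for t in temperature]
sunburn_cases = [1.5 * t + random.gauss(0, 5) for t in temperature]

# Raw correlation between two variables with no causal link to each other.
r_raw = pearson(ice_cream_sales, sunburn_cases)

# Subtract the temperature effect. The true coefficients are known here
# only because the data are simulated.
ice_cream_resid = [s - 2.0 * t for s, t in zip(ice_cream_sales, temperature)]
sunburn_resid = [c - 1.5 * t for c, t in zip(sunburn_cases, temperature)]
r_adjusted = pearson(ice_cream_resid, sunburn_resid)

print(f"Correlation ignoring temperature:      {r_raw:.2f}")
print(f"Correlation after removing its effect: {r_adjusted:.2f}")
```

With real data the effect of a confounder is not known in advance and would have to be estimated, for example by regression, which brings its own assumptions.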
WHY STATISTICS:-
1. **Data Exploration**: Statistics allows us to explore and understand data through various techniques such as data visualization, exploratory data analysis (EDA), and summary statistics. By visually representing data and calculating key metrics, we can gain insights into patterns, trends, and relationships within the data.
2. **Quantitative Research**: In fields such as psychology, sociology, and economics, statistics are essential for conducting quantitative research. Researchers use statistical methods to design experiments, collect data, analyze results, and draw conclusions about the phenomena they are studying.
3. **Risk Assessment and Management**: Statistics help in assessing and managing risks in various domains including finance, insurance, healthcare, and environmental science. Techniques like probability distributions, regression analysis, and Monte Carlo simulations are used to quantify risks, make predictions, and develop strategies to mitigate them.
4. **Policy Making and Planning**: Governments rely on statistical data to formulate policies, allocate resources, and plan for the future. Census data, employment statistics, GDP growth rates, and other economic indicators inform policymakers about the state of the economy and help them address social, economic, and environmental challenges.
5. **Validation and Generalization**: Statistical methods provide a framework for validating research findings and generalizing results to larger populations. Through techniques like hypothesis testing and confidence intervals, researchers can determine the reliability and significance of their findings and make inferences about the broader population from which the data was sampled.
6. **Experimental Design**: In scientific research and clinical trials, statistics are crucial for designing experiments and studies that yield valid and reliable results. Statistical principles help researchers control for confounding variables, randomize treatments, and ensure that experiments are properly powered to detect meaningful effects.
7. **Performance Evaluation**: In fields like education, healthcare, and business, statistics are used to evaluate the performance of individuals, organizations, and systems. Performance metrics, such as graduation rates, patient outcomes, and financial indicators, are analyzed statistically to assess effectiveness, identify areas for improvement, and make data-driven decisions.
8. **Quality Improvement**: Statistical process control (SPC) techniques enable organizations to monitor and improve the quality of products and services continuously. By collecting and analyzing data on process performance, organizations can identify sources of variation, implement corrective actions, and maintain consistent quality standards.
9. **Forecasting and Planning**: Businesses use statistical forecasting models to predict future demand, sales, and market trends. Time series analysis, regression analysis, and machine learning algorithms are employed to develop accurate forecasts, optimize inventory levels, and make strategic business decisions.
10. **Public Health and Epidemiology**: In public health, statistics play a vital role in monitoring disease outbreaks, assessing health disparities, and evaluating the effectiveness of interventions. Epidemiological studies use statistical methods to analyze health-related data, identify risk factors for diseases, and inform public health policies and interventions.
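The idea behind statistical process control (item 8) can be sketched with three-sigma control limits. The example below assumes a set of baseline fill weights from a process believed to be in control; the data are invented, and using the sample standard deviation for sigma is a simplification, since control charts in practice often estimate it from moving ranges or subgroup ranges.

```python
import statistics

# Hypothetical baseline: fill weights in grams from an in-control process.
baseline = [500.2, 499.8, 500.5, 499.6, 500.1, 500.3, 499.9, 500.0, 499.7, 500.4]

center = statistics.mean(baseline)
sigma = statistics.stdev(baseline)  # simplification: sample standard deviation

# Three-sigma control limits around the centre line.
upper_limit = center + 3 * sigma
lower_limit = center - 3 * sigma

# New measurements are flagged when they fall outside the limits.
new_measurements = [500.1, 499.9, 502.0, 500.2]
out_of_control = [x for x in new_measurements if not lower_limit <= x <= upper_limit]

print(f"Centre line: {center:.2f}")
print(f"Limits: ({lower_limit:.2f}, {upper_limit:.2f})")
print(f"Out of control: {out_of_control}")  # prints "Out of control: [502.0]"
```

A flagged point is a signal to investigate the process for a special cause of variation, not proof that something is wrong.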
In summary, statistics serve as a powerful toolkit for understanding, analyzing, and interpreting data across a wide range of disciplines, driving evidence-based decision-making, innovation, and progress in society.