Wednesday, July 31, 2019

Hobbes and Locke Social Contract Theory Essay

Thomas Hobbes and John Locke are two of the most influential political philosophers of the modern age. Their ideas on political philosophy, among other subjects, have helped shape the Western world as we know it. One of the most important theories that both men discussed and wrote on in detail is the idea of the social contract. Social contract theory is the view that moral and/or political duties depend on a contract that leads to the formation of a civil society. Thomas Hobbes was the first to develop the idea of a social contract systematically, in his text Leviathan, and, as with any concept in history, later political philosophers have used Hobbes' theory as a stepping-stone. One of those men is John Locke, who presents a very different account of how civil society is formed. Although the two men give very different accounts of the social contract, there are some similarities between them.

Before putting pen to paper Hobbes had a front-row seat to a pivotal moment in English history: the English Civil War. The war was a dispute between King Charles I and his followers, the Monarchists, and the Parliamentarians. The Monarchists preferred the traditional authority of the king, while the Parliamentarians demanded more power for Parliament, England's quasi-democratic institution. Hobbes stood somewhere between the two groups with his own set of views. He believed that political authority is based on the self-interest of the members of society, all of whom are considered equal, and he argued that no single individual had the power to rule over the rest. Yet he also backed the conservative position that the sovereign must have absolute authority in order for society to last without disruption.

It is in the rejection of the Monarchist point of view that Hobbes and Locke find their first similarity. Both authors sought to refute the positions presented in Robert Filmer's Patriarcha regarding the Divine Right of Kings. Filmer believed that God gave absolute authority to the king; since God gives the power to the king, political society was focused on obeying God unconditionally. Although Hobbes agreed that it was necessary for a king to have absolute authority in order to keep the people in line, he believed that this authority came from the people living in the community and not from God. Locke's most influential political writings come from his Two Treatises of Government, and his First Treatise is focused almost entirely on rejecting Filmer's theory. Locke is in line with Hobbes in his belief that political authority comes from the consent of the governed. Along with this similarity, both men agree that people in a State of Nature will willingly consent to coming together to form a political society. They also agree that people would live in fear of each other regardless of their ability to use reason. Human nature allows men to be selfish, and all people have the natural right to defend their own life, liberty, health and property. This fear is what leads people to come together and form a state, so that a central authority can protect the life, liberty, health and property of all people within society. Along with creating the outline for social contract theory, Hobbes was also a major contributor to the idea of the State of Nature, a hypothetical situation used to show how people lived before the establishment of society.
In the State of Nature, life was "solitary, poor, nasty, brutish, and short," characterized by self-interest and the absence of rights and laws (Hobbes 89). Hobbes believed that man was fundamentally evil and required a central authority to keep people out of the conditions of the state of nature. Locke, on the other hand, saw individuals as exercising moral limits over their actions.

In order to answer the question of why people should be willing to submit to political authority, Hobbes uses the idea of a State of Nature. This is a completely hypothetical situation through which he imagines what life was like for men before the establishment of civil society. In the State of Nature, men are naturally and entirely self-interested, resources are limited, and there is no power that forces people to follow the rules of society. Men are also considered equal to one another, in that even the strongest man can be killed in his sleep. There is no way for men to ensure the satisfaction of their needs and desires, and no prolonged system of cooperation among them. The state of nature is a state of constant fear and distrust, or as Hobbes puts it, "a state of perpetual and unavoidable war" (Hobbes 90).

Based on this definition of the State of Nature, it would seem that mankind is doomed for eternity. However, hope is not lost. Using the power of reason, men are able to understand the laws of nature, which lead them out of the state of nature and into civil society. "A Law of Nature, (Lex Naturalis), is a Precept, or generall rule, found out by reason, by which a man is forbidden to do, that, which is destructive of his life, or taketh away the means of preserving the same; and to omit that, by which he thinketh it may be best preserved." (Hobbes 91) The first law of nature is to seek peace when others are also willing to follow in the quest for peace: "That every man, ought to endeavour Peace, as farre as he has hope of obtaining it; and when he cannot obtain it, that he may seek, and use, all helps, and advantages of Warre" (Hobbes 92). In the pages leading up to the natural laws, Hobbes describes what it is that drives us to seek peace: "The Passions that encline men to Peace, are Feare of Death; Desire of such things as are necessary to commodious living; and a Hope by their Industry to obtain them" (Hobbes 90). These are the things that lead people out of the state of nature and into forming a political society. People want protection of their bodies, their property, and commodious living.

It is through reason that men are led to the construction of a social contract, allowing for a life better than in the State of Nature. The social contract is formed through the establishment of two other contracts. The first is that people must agree to establish society by renouncing the rights they had in the State of Nature. The second is that they must choose a single person, or an assembly of people, with the authority to enforce the various parts of the contract. The sovereign has the power to punish those who violate the social contract, which leads people to conform to the rules of their society.
In order to understand the purpose of the social contract, Hobbes sets forth a definition of a commonwealth, or civil society: "And in him consisteth the Essence of the Commonwealth; which (to define it,) is One Person, of whose Acts a great Multitude, by mutuall Covenants one with another, have made themselves every one the Author, to the end he may use the strength and means of them all, as he shall think expedient, for their Peace and Common Defence." (Hobbes 121) Without a common power to exercise force, society would be the same as it was in the State of Nature. The social contract is considered to be the fundamental source within society of all that is good, along with being the force that allows us to live well.

On the opposite side of the spectrum is another major figure in political philosophy, Locke. Locke's views are very different from those of Hobbes, apart from the fact that Locke uses the State of Nature concept developed by Hobbes. For Locke, the State of Nature is a state of complete and perfect liberty to live the best life possible, while being free from interference from others: "We must consider what state all men are naturally in, and that is a state of perfect freedom to order their actions and dispose of their possessions and persons as they think fit, within the bounds of the law of nature, without asking leave or depending upon the will of any other man." (Locke 5) In this state of equality no person has any power over anyone else; everyone is equal. "The state of nature has a law of nature to govern it, which obliges every one; and reason, which is that law, teaches all mankind who will but consult it that, being all equal and independent, no one ought to harm another in his life, health, liberty, or possessions" (Locke 5). The state of nature is not, however, a state of license: individuals do not have the ability to do whatever they want. Although this society is pre-government, morals guide the laws of nature. God gives us the natural laws and commands that we do not harm others, since we are all equal in the eyes of God. For Locke, the State of Nature is more like a state of liberty that allows people to pursue their own interests free from interference, and it is considered a peaceful state because of the natural laws and their restrictions on the people. Hobbes saw the State of Nature as a state of constant war, a drastic contrast to the views presented by Locke.

Although Locke's state is predominantly peaceful, there is potential for a State of War. According to Locke, the State of War starts between two or more people when one person declares war on another, usually by stealing something from him or trying to make another man a slave. Since there is no central power to mediate the dispute and the laws of nature allow for self-defense, people are allowed to kill anyone who brings force against them. Without a force to mediate, wars are much longer and more brutal. Political societies form when men come together in the State of Nature and agree to give up their power to punish those who disobey the laws of nature, handing that power to a central authority. It is through this that the people consent to the will of the majority. Through leaving the state of nature and forming a society, the people create "one body politic under one government" and are thus subjected to the will of that particular "body" (Locke 55).
The only way for one to become part of society is through one's own individual consent, meaning we cannot be forced to join society. By joining a society, people gain a few things that they lacked in the State of Nature: laws, a judge to settle disputes regarding the laws, and most importantly an executive power to enforce the law. The executive power is created for the protection of the people, their property and their general well-being. When this protection is non-existent, or the king becomes a tyrant by acting against the interest of the people, the contract can be thrown away, and the process of establishing a social contract, and a new authority, can begin once again.

Both Hobbes and Locke agree that people living in a state of nature will come together to form a contract amongst themselves, which ultimately leads to the establishment of society. Both also agreed that people living in a state of nature would live in constant fear of one another before society is established. Hobbes has a much darker view of human nature, seeing people as inherently evil, while Locke viewed man as being guided by "rational self-interest" with the ability to self-govern without the Leviathan watching over him. These two figures have helped shape our modern systems of government, among many other things.

Tuesday, July 30, 2019

An Analysis of 13 Days

Professor O'Neill, Atlantic Worlds II, April 16th, 2010

Characterizing the First World War as an epidemic of miscalculation, President John F. Kennedy pondered, "they somehow seemed to tumble into war ... through stupidity, individual idiosyncrasies, misunderstandings, and personal complexes of inferiority and grandeur" (49). Reflecting upon these miscalculations, Robert F. Kennedy's Thirteen Days documents the Cuban Missile Crisis and catalogues the President's contemplative action amidst potential disaster. Considering the misjudgment that drove conflict in the early twentieth century, and the socio-technological paradigm shift of war, President Kennedy found remedy in the maintenance of open channels of external communication, while regarding the international domino effect of each action and exhibiting constant skepticism in pursuit of a peaceful resolution.

German sociologist Max Weber wrote of the Great War, "this war, with all its ghastliness, is nevertheless grand and wonderful. It is worth experiencing" (EP 768). Embellishing the heroism of warfare, Weber reflects a common acceptance of war in the early twentieth century as one of sport and necessity. However, with the development of nuclear arms came a paradigm shift concerning war and its role among international powers. Acknowledging the destructive potential of nuclear warfare, Kennedy adamantly stated, "We were not going to misjudge or challenge the other side needlessly, or precipitously push our adversaries into a course of action that was not intended" (75). Using historical precedent as his guide, President Kennedy acts upon the belief that war is rarely intentional, while also recognizing the evolving dynamic of war as one of an arms struggle. The application of this lesson lies in Kennedy's resolution to utilize quarantine as opposed to armed conflict at the Soviet Union's initial threat. Foreign ships given orders to retreat would be afforded such an opportunity; any vessel refusing to stop would have its rudders disabled to avoid loss of life; and ships not belonging to the Soviet Union were the first and only ones to be boarded, so as not to incite a military response. Executing such action demonstrates the President's clear understanding of past misjudgment, and of the paradigm shift that now characterized war as something not of sport but of mass destruction. Robert Kennedy reaffirms this in declaring, "If we erred, we erred not only for ourselves and our country, but for the lives of those who had never been given an opportunity to play a role" (81). This statement epitomizes the overwhelming burden of nuclear war, and the cognizance necessary to avoid it.

Vital to the avoidance of miscalculation and the development of a mutual understanding were open channels of communication during the Cuban Crisis. President Kennedy recognized the importance of consistent communication to evade impulsive action and promote logically sound decision-making. Such an example exists in Robert Kennedy's Thirteen Days, in which Soviet Chairman Khrushchev and President Kennedy exchange messages outlining the guidelines towards peaceful resolution: "We must not succumb to petty passions, or to transient things, but should realize that if indeed war should break out, then it would not be in our power to stop it, for such is the logic of war" (66). Stated by Khrushchev in pursuit of mutual amity, such communication demonstrates the importance of clarity and transparency under desperate circumstances.
This quotation further exhibits recognition of the warped nature of warfare, and acknowledges history's wrongdoings that provoked destruction. President Kennedy concluded deliberations in stating, "the effect of such a settlement on easing world tensions would enable us to work towards a more general arrangement ... the United States is very much interested in reducing tensions and halting the arms race" (79). The clear and concise nature of this exchange lends praise to the diplomatic nature of Kennedy's tactics, providing both the United States and the Soviet Union with the opportunity to ultimately avoid nuclear holocaust.

The snowball effect exhibited through the First World War demonstrates the danger of tumbling into conflict through allied obligation and diplomatic stupidity. President Kennedy's ability to tactfully neutralize the Cuban Crisis demonstrates an awareness of that danger, and an appreciation for the international domino effect that warfare would generate between nations. As Robert Kennedy strongly stated, "we had to be aware of this responsibility at all times, aware that we were deciding for the United States, the Soviet Union, NATO, and for all of mankind" (75). Such concern for the global repercussions of warfare can be observed in President Kennedy's constant scrutiny of military recommendations and their effect upon the entire Western Hemisphere. Seeking alternative solutions to war as well as the approval of global powers, Robert Kennedy further states, "we were able to establish a firm legal foundation for our action under the OAS charter, and our position around the world was ... unanimously supported for a quarantine" (40). This diplomatic strategy, founded upon the support of strong European and American allies, aided the United States in considering the implications of all possible courses of action so as to ensure a promise of peace for itself and the global community.

A final strategy, central to the diplomatic triumph of the Cuban Missile Crisis, refers to the establishment and success of Kennedy's Executive Committee of the National Security Council. While each proposed solution held inherent weaknesses, this committee allowed for constant deliberation, argument, and debate. The ability to scrutinize each proposal reinforced a reasonable decision-making process, thereby diminishing the risk of the impulsive miscalculation or misjudgment that had prompted war only decades earlier. Embodying the significance of the Executive Committee, Robert Kennedy declares, "everyone had an equal opportunity to express himself and to be heard directly. It was a tremendously advantageous procedure that does not frequently occur within the executive branch" (36). Furthermore, President Kennedy is reported to have gone to "considerable lengths to ensure that he was not insulated from individuals or points of view because of rank or position" (89). While such an arrangement seems idealistic, President Kennedy's recognition of all available viewpoints provided an extremely broad base of knowledge upon which to draw conclusions. It was this open-minded and reasonable approach that was heavily lacking prior to the great wars of the early twentieth century, wars that might have been averted under more logical circumstances.
The measures taken by President Kennedy, as presented through Robert Kennedy's Thirteen Days, lend overwhelming praise to his diplomatic triumph during the Cuban Missile Crisis. His success is reported as being founded upon the miscalculations of history, and a correction of those past errors in pursuit of peaceful relations. However, the idealistic manner in which the President's actions are portrayed reeks of both brotherly admiration and posthumous praise. Such a utopian presentation only serves to diminish President Kennedy's heroic role amidst the crisis, and leads the reader to question how pivotal his leadership actually was. A personal memoir of Robert F. Kennedy, Thirteen Days must be read with a grain of salt to properly assess its validity as a historical record.

While the President certainly acknowledged the socio-technological paradigm shift of modern warfare in addition to the stupidity from which the First World War emerged, Thirteen Days most obviously dismisses crucial events preceding the Cuban Missile Crisis. The Bay of Pigs Invasion, an unsuccessful attempt by American-trained Cuban refugees to overthrow the government of Fidel Castro, completely contradicts President Kennedy's supposed cognizance of the dangers of nuclear war and impulsive military action. The failed invasion, initiated only three months after President Kennedy's inauguration, humiliated the Administration and made communist nations distrustful of the United States. In addition, John F. Kennedy is consistently praised throughout his brother's memoir for welcoming the viewpoints of not just government administrators, but regular people. For example, "he wanted the advice of his Cabinet officers, but he also ... wished to hear from Tommy Thompson" (89). However, not once throughout the memoir does Kennedy mention speaking to field soldiers or give their names. The generic label of Tommy Thompson reduces the reader's faith in such sources, and President Kennedy is even shown to mock military figures, stating that they "lacked the ability to look beyond the limited military field" (90). Such evidence cannot be overlooked in determining the validity of President Kennedy's success, and it reduces the objectivity of this historical source.

However, the ultimate success of President Kennedy's historical reflections and peace-seeking measures cannot be denied. While Thirteen Days nearly emits audible applause for his actions, it accurately reports the measures taken to subdue the Cuban Missile Crisis, the effectiveness of quarantine, and the importance of bargaining and communication. These actions, prompted through the establishment of the Executive Committee, resulted in the removal of nuclear arms from Cuba and the reestablishment of the global status quo. Having watched the nation rescued from the brink of nuclear war, the reader has no choice but to close Thirteen Days with a deeper admiration for the courage and wisdom of President Kennedy.

Monday, July 29, 2019

The Brothers Karamazov by Dostoevsky

Fyodor Mikhailovich Dostoevsky, born in 1821, was a great Russian prose writer. He was born in Moscow and studied at the St Petersburg Engineering Academy. His first published work was a translation of Balzac's Eugenie Grandet, which appeared in 1844. Two years later his first original works, the short novels Poor Folk and The Double, were published, later followed by other short prose pieces. (Leatherbarrow, 47-48) In April 1849 Dostoevsky was arrested for suspected revolutionary activity and condemned to death; he was taken to the scaffold, to the last moments before execution, before the true sentence of four years in prison and four years as a private in the Siberian army was read out. He was released from the army in 1858. The result of his imprisonment was a change in his personal convictions: he rejected the socialism and progressive ideas of his early years, and instead adhered to the principles of the Russian Orthodox Church and belief in the Russian people. Another immediate fruit of his imprisonment experience was his remarkable House of the Dead, which appeared in 1861. Other novels followed which display a profound understanding of the depths of the human soul. Notes from the Underground of 1864 sets rational egoism, which proffers reasons for treating others as instruments, against irrational selfishness, which treats others as enemies. Crime and Punishment of 1866, The Idiot of 1868, and The Devils (also translated as The Possessed, written in 1871) led up to his great achievement, The Brothers Karamazov, completed in 1880.

With the Slavophils, Dostoevsky venerated the Orthodox Church, and he was deeply impressed by Staretz Amvrosy, whom he visited at Optina. (Leatherbarrow, 169) But his sense of goodness was neither facile nor naive. He saw human freedom as something so awesome that most people are ready to relinquish it. This is epitomized in the Legend of the Grand Inquisitor. In his speech accepting the Nobel Prize for Literature, Solzhenitsyn quoted Dostoevsky: 'Beauty will save the world.'

The Brothers Karamazov is Dostoevsky's final novel, completed only two months before his death. It was intended as Dostoevsky's apocalypse. Its genre might best be called Scripture, rather than novel or tragedy. (Bloom, 5) This novel is the synthesis of Dostoevsky's religious and philosophic search. The scene of the novel is laid in a sleepy province, in the family of the noble Karamazovs. A sleepy province had always been for Russian writers the source of characters of integrity, pure passion and spiritual relations among people. However, Dostoevsky presents life in such a province in a different light. Spiritual decay had penetrated into the patriarchal up-country.

From the very early stages of the novel's writing Dostoevsky underwent several influences. The first was the profound impact the Russian philosopher and thinker Nikolai Fyodorov had on Dostoevsky at this time of his life. According to Fyodorov's doctrine, Christianity is a system in which "man's redemption and resurrection could be realized on earth through sons redeeming the sins of their fathers to create human unity through a universal family." (Sandoz, 221) The tragedy of patricide in The Brothers Karamazov acquires more poignant coloring as Dostoevsky applies a complete inversion of this Christian system. Thus the sons in the novel do not attain resurrection for their father.
Quite to the contrary, they are complicit in his murder, and such a turn of events is for Dostoevsky a metaphor for complete human disunity, a breakage of the spiritual relations among people. As already noted, religion and philosophy played a vital role in Dostoevsky's life and in his novel in particular. Nevertheless, a much more personal tragedy changed the direction the novel later took. In 1878 Dostoevsky stopped writing the novel because of the death of his son Alyosha, who was only three years old. This tragedy was even more difficult for the writer to endure as Alyosha's death was caused by epilepsy, a disease he inherited from his father. Dostoevsky's desolation could not escape being reflected in the novel; one of its characters is named Alyosha. The writer endowed this character with the features he himself aspired to and would like to follow.

Though very personal experience had a profound influence on Dostoevsky's choice of theme and the actions that dominate the exterior of the novel, the key problem treated by this work is human disunity, the breakage of spiritual relations among people. In comparison to the previous novels, the social split is deepening and becoming more distinct; the relations between people are becoming more fragile in The Brothers Karamazov. "For everyone nowadays strives to dissociate himself as much as possible from others, everyone wants to savour the fullness of life for himself, but all his best efforts lead not to fullness of life but to total self-destruction, and instead of ending with a comprehensive evaluation of his being, he rushes headlong into complete isolation. For everyone has dissociated himself from everyone else in our age, everyone has disappeared into his own burrow, distanced himself from the next man, hidden himself and his possessions, the result being that he has abandoned people and has, in his turn, been abandoned." (Dostoevsky, 380) This is how the situation of Russian society in the 1870s is defined by the novel's character Starets Zosima, who is especially close to the writer.

The Karamazov family in Dostoevsky's novel is Russia in miniature: it is absolutely deprived of the warmth of family ties. Unvoiced hostility binds the father of the family, Fyodor Pavlovich Karamazov, and his sons: the eldest, Dmitry, a man of spoiled nature; Ivan, the captive of loose manners; Pavel Fyodorovich Smerdyakov, a child of shame, a lackey by his position and in his soul; and the novice Alyosha, who does his best to reconcile the hostile clashes that finally result in the dreadful crime of patricide. Dostoevsky shows that all participants in this drama share responsibility for the tragedy that happened, and first of all the father himself, who is, for the author, the symbol of the decay and degeneration of the human person.

The contemporary society was thus infected with a serious spiritual disease: "karamazovshchina". The essence of "karamazovshchina" lies in a denial of all sacred things and notions that sometimes ranges up to frenzy. "I hate the whole of Russia, Marya Kondratyevna," confesses Smerdyakov. "In 1812 Russia was invaded by Emperor Napoleon I [...] and it would have been an excellent thing if we'd have been conquered by the French; [...] Everything would have been different.
" (Dostoevsky, 281-282) The same Smerdyakov "As a child [...] had loved to string up cats and then bury them with full ceremony. He would dress up in a sheet, to represent a chasuble, and chant while swinging some imagined censer over the dead cat." (Dostoevsky, 156) "Smerdyakovshchina" is the lackey variant of "karamazovshchina", and it demonstrably uncovers the essence of this disease: a perverted passion for the humiliation and desecration of the most sacred values of life. As it is said in the novel, "'people do love the downfall of a righteous man and his degradation'". (Dostoevsky, 415)

The main bearer of "karamazovshchina" is Fyodor Pavlovich, who enjoys the constant humiliation of truth, beauty and goodness. His carnal relation with the foolish Lizaveta Smerdyashchaya, the result of which is the lackey Smerdyakov, is a cynical desecration of love. Fyodor Pavlovich's voluptuousness is far from being a mere animal instinct or unconscious behavior. His voluptuousness carries an idea: to wage a polemic against the good. Karamazov is quite conscious of the meanness of his intentions and deeds, and so he derives cynical satisfaction from humiliating the good. He is always longing to spit upon a sacred place. He consciously makes a row in Starets Zosima's cell and then goes with the same intention to dinner with the abbot: "He wanted to take revenge on everyone for his own tricks. [...] I can't hope to rehabilitate myself now, so I'll spit in their faces and be damned! I'll not be ashamed of myself in front of them and that's that!" (Dostoevsky, 109) A distinctive feature of "karamazovshchina" is a cynical attitude towards the nation's bread-earner, the Russian farmer: "The Russian people need thrashing" (Dostoevsky, 282). According to Karamazov's psychology, all the higher values of life have to be overridden, dragged through the mud, for the sake of frantic self-affirmation.

Father Therapon lives together with the saintly Starets Zosima in the monastery. Outwardly this man is striving for absolute "righteousness": he leads an ascetic existence and exhausts himself with fasts and prayers. But what is the source of Therapon's righteousness? What is its inducement? As it turns out, his inducement is hatred of Starets Zosima and the desire to surpass him. Katerina Ivanovna is very kind to her offender, Mitya, all because of smoldering hatred of him and a sense of wounded pride. The virtues turn into a delirious form of self-affirmation, into the magnanimity of selfishness. With the same selfishness and the same magnanimity the Grand Inquisitor "loves" humanity in the tale contrived by Ivan. In the world of the Karamazovs all relations among people are perverted; they acquire a criminal character, since everyone here is trying to turn those around him into a "marble pedestal", the pedestal for one's selfish ego.

The world of the Karamazovs is a world intersected by a chain reaction of crime. Which of the sons is the father's killer? Ivan did not kill, yet it was he who first formulated the idea of the permissibility of patricide. Dmitry didn't kill Fyodor Pavlovich either; he teetered on the brink of crime in a fit of hatred for his father. Fyodor Pavlovich was killed by Smerdyakov, but he only brought to an end Ivan's ideas and the passion that overfilled Dmitry's embittered mind.
In the world of the Karamazovs the definite moral boundaries of crime cannot be restored: everybody is, to a certain extent, guilty of the murder. Potential delinquency reigns in the atmosphere of mutual hatred and exasperation. Every person individually and all people together are guilty, or as Starets Zosima says, "As to every man being guilty for everyone and everything, quite apart from his own sins." (Dostoevsky, 379) "Remember especially that you may not sit in judgement over anyone. No man on this earth can sit in judgement over other men until he realizes that he too is just such a criminal as the man standing before him, and that it is precisely he, more than anyone, who is guilty of that man's crime." (Dostoevsky, 402)

"Karamazovshchina", according to Dostoevsky, is a Russian variant of the disease suffered by all European societies; this is a disease of civilization. Its causes are the loss of moral values by civilized man and the sin of "self-worshipping". The upper classes of Russian society, following the progressive classes of Western European society, worship their own ego and consequently decay. The crisis of humanism comes, which in Russian conditions acquires forms that are particularly undisguised and defiant: "If you want to know," argues Smerdyakov, "when it comes to depravity there's nothing to choose between them and us. They're all blackguards, but there they walk about in patent leather boots while our scoundrels go around like stinking beggars and don't see anything wrong in it" (Dostoevsky, 282). By Ivan Karamazov's formula: "for if there is no God, how can there be any crime?" (Dostoevsky, 395). The sources of Western European and Russian bourgeois decay were considered by Dostoevsky to lie not in the economic development of society but rather in the crisis of modern humanity, caused by the "strenuously self-conscious" individual. (Lambasa et al., 118)

Thus it can be concluded that the Karamazovs' decay, according to Dostoevsky, is the direct implication of the isolation and solitude of modern civilized man; it is the consequence of people's loss of the feeling of a great universal relation to the secular and divine world, a relation superior to the animal needs of human earthly nature. Repudiation of the higher spiritual values may bring a man to indifference, loneliness, and hatred of life. This is the path taken by Ivan and the Grand Inquisitor in the novel.

Works Consulted
Bloom, Harold. Fyodor Dostoevsky's The Brothers Karamazov. New York: Chelsea House, 1988.
Dostoevsky, Fyodor. The Karamazov Brothers. Trans. Ignat Avsey. New York: Oxford University Press, 1994.
Lambasa, Frank S., Ozolins, Valija K., and Ugrinsky, Alexej. Dostoevski and the Human Condition after a Century. New York: Greenwood Press, 1986.
Leatherbarrow, W. J. The Cambridge Companion to Dostoevskii. Cambridge, England: Cambridge University Press, 2002.
Sandoz, Ellis. Political Apocalypse: A Study of Dostoevsky's Grand Inquisitor. Wilmington, DE: ISI Books, 2000.

Love Module Two Essay Example | Topics and Well Written Essays - 750 words

In order to solve the main problem (financial constraints), the company weighs the following options: joint venturing, borrowing, and cutting internal costs. The best solution of the three alternatives is to create a network with the clinics that were operating nearby, because having multiple clinics reduces the risk factors associated with financial constraints.

The last case pertains to the disability of the former CEO of Autumn Park. Mildred was complaining that the company was discriminating against her because of her disability, and the current CEO, Douglas, had to find a way out in order to get rid of her. First, Douglas gave her a copy of her care schedule so that she could check, within 30 days, the way she was allocated time for her care. Secondly, he visited the CCRC in order to seek advice, and lastly, the company had a dialogue with the Ombudsman in order to look for a solution (Thomson and Robert 1987). The solutions I would have proposed to the Chief Executive Officer of the company are: to seek court advice, to seek help from Mildred's family members, and, as a last resort, to remove her by force. These alternatives would allow the company to get rid of Mildred, who had turned out to be stubborn with everyone in the company. The solutions would provide a permanent solution to the

Sunday, July 28, 2019

Harvard Business Review Article Essay Example | Topics and Well Written Essays - 2500 words

The following seven key lessons emphasize the significant aspects of effective strategic leadership as learnt and comprehended by a student.

Key Lesson 1: Maintaining Effective Communication

Communication of the organization's purpose is one of the most vital aspects to be considered. A corporate organization's purpose of being is what it stands for, and it gives an overview of the main aims the organization wants to achieve. So, it is essential for effective strategic leadership of a corporate organization to state clearly its mission and vision. This can be explained lucidly by the mission statement of Google, which is to organize all the information in the world in a systematic manner so that it can be accessed and utilized universally. Moreover, the brand name of a company should be a direct reflection of its core values, so that customers are aware of the services being provided and attracted to them at the same time. Richard Reed, co-founder of Innocent, did an exceptional job at coming up with the brand name. Innocent, one of the most recognized and acclaimed brands in the UK, produces healthy fruit juices and food items only. The company's name goes hand in hand with the word natural, which depicts the very nature of Innocent products.

Key Lesson 2: Sustaining Competitive Market Advantage

Another important lesson is to sustain competitive advantage in the market over the passage of time. Skimming, price wars, predatory pricing and similar marketing techniques could be utilized by a company to sustain the advantage or to drive its competitors out of the market completely. If this upper-hand advantage is not achieved and sustained, a company could face serious consequences, which reflect adversely on its strategic leadership policies. The downfall of PepsiCo is one example of this scenario, as Indra Nooyi, the current chairman and CEO of PepsiCo, is facing a loss of market share and decreasing stock values due to failure at sustaining competitive advantage over the rival beverage company Coca-Cola. In an announcement made in February 2012, it was revealed that due to its financial concerns, PepsiCo is cutting 8,700 employees. The company is starting new lines of diet soda and sugar-free products to increase customer interest, but this strategy has not been very successful due to the launch of similar product lines by rival companies. PepsiCo is trying to devise new product lines in order to regain competitive market advantage so that it can be successful in the market again. It is also feared that Indra Nooyi will be removed from her current position as CEO if the financial position of the company does not stabilize.

Key Lesson 3: Effective Change Management

Faslane Naval Base was run entirely by the Ministry of Defence of the UK and the Royal Navy up until 2002. Afterwards, a contract was signed with Babcock Marine to reduce cost and improve operational efficiency. The staff had to work under a different managing authority with new aims and futuristic goals. John Howie, director of Babcock Marine, set forth the emphasis on delivering services to the Navy, eradicating the previous goal of focusing on building infrastructure. Effective strategic leadership could never be achieved without efficient and timely change management in the organization.
Faslane is an organization where the only assets are the people, so to implement change, the shackles which bound the workers had to be eliminated. Starting with low-level managerial changes, the company moved towards gaining efficiency by re-engineering fundamental processes to

Saturday, July 27, 2019

New Keynesian Model Essay Example | Topics and Well Written Essays - 1000 words

A major advantage of the NKPC compared with the traditional Phillips curve is said to be that the latter is a reduced-form relationship, whereas the NKPC model has a clear structural interpretation, so that it can be useful for interpreting the impact of structural changes on inflation (Gali and Gertler 1999).

The key New Keynesian models of incomplete nominal adjustment

Dynamic stochastic general equilibrium (DSGE) is a New Keynesian economic model whose foundation is hinged on microeconomic elements. The key purpose of the DSGE model is to integrate monetary policies and theories with the real business cycles impacting economies. The model acknowledges and specifies the preferences of economic agents, such as individuals and firms, who wish to maximize utility and profits respectively. The DSGE model depends on the current choices of economic agents to predict future economic outcomes. It also allows stochastic disruption of the technology of production, and applies the competition principle to compute equilibrium prices and quantities as a function of preferences, tastes, technology and random shocks (Geweke 2009).

There are many assumptions made in the DSGE model. The first assumption is that the model relies on complete markets. Complete markets allow monopolistically competitive economic agents (firms) to set prices in response to market conditions; the set prices cannot be adjusted instantly without incurring some additional costs. The second assumption is that prices and wages are sticky: economic processes are influenced by various factors that delay price and wage adjustments, making it difficult to attain full equilibrium. Such factors include the failure of firms to reduce prices even if marginal cost decreases, in order to increase their level of profits; if demand falls, firms are likely to hold prices constant and reduce production rather than reduce the prices of goods or services. Thirdly, the model assumes that economic agents are rational, meaning that they choose consumption paths that maximize utility and production paths that maximize profits. Fourthly, resources are fully utilized in each period, so there are no resources spilling over to the next budget period. Fifth, input decisions are determined by people who decide how much time they work, the quantity of goods and services they consume, and the amount of income they save and invest, in line with the costs associated with those decisions. Sixth, the economy is closed, indicating that there are no international goods or services flowing into or out of the economy. Seventh, money markets do not exist in the economy. Finally, the eighth assumption of the DSGE model is that people know the policies that affect them in advance; for example, people know the exact tax policy that affects them in the coming year. These are policies that are likely to be sustained, though they are likely to experience stochastic disturbances. The model takes into consideration random shocks such as technological change, fluctuations in the price of oil, and errors in macroeconomic models. Though the model is considered superior, it has been criticized as not having been useful in analyzing the financial crisis of 2007-2010. It is also considered too stylized
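For reference, the baseline NKPC the essay refers to is usually written as follows. This is the standard textbook statement, with notation supplied here rather than taken from the essay itself:

\pi_t = \beta \, E_t[\pi_{t+1}] + \kappa \, mc_t

where \pi_t is inflation in period t, E_t[\pi_{t+1}] is expected inflation one period ahead, \beta is the representative household's discount factor, mc_t is real marginal cost (often proxied by the output gap), and \kappa is a slope coefficient. The "structural interpretation" mentioned above comes from the fact that \kappa is built from deeper parameters: in the Calvo pricing version, \kappa = (1-\theta)(1-\beta\theta)/\theta, where \theta is the probability that a firm cannot reset its price in a given period, so a change in the degree of price stickiness changes the curve's slope in a predictable way.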

Friday, July 26, 2019

Hamlet (A Critical Analysis) Essay Example | Topics and Well Written Essays - 1000 words

Hamlet never intended to kill Claudius before his father's ghost appeared and urged him to do so. In fact, it was his internal battle (introspection) that kept him from deciding what he wanted to do. There are different theories about Hamlet's delay in taking revenge. It is believed that he was afraid of becoming what he was accusing Claudius of being: a murderer, because murdering Claudius would not have made him any better than Claudius. In the story of the five brothers he experienced the darker side of revenge. He wanted to be quick in taking action but was cautious enough to identify the emotion and illogical thought attached to it (Westlund 244-256). The story of five sons of a murdered man is narrated within Hamlet. All of the sons have their own interpretation of revenge and their own way of taking it. Among them, Hamlet's understanding and way of taking revenge is portrayed as the most balanced: he neither acts promptly nor stays completely inactive, but rather waits for the right time to act (Rasmussen 463).

Hamlet is criticized for delaying revenge due to his procrastinating nature, or due to his belief that one should not murder a disarmed man, as Hamlet rationalizes his decision in his speech. Another reason for the delay can be the fear of destroying a man (Claudius) who was experiencing a spiritual awakening. This may be true because, according to McCullen, the idea of revenge in the Elizabethan era required spiritual and physical destruction for absolute revenge (24-25). Thus, Hamlet gave up the idea of killing Claudius while he was praying. Hamlet wants to wait for the right time. This does not reflect his being inactive or a procrastinator because, strangely, he was active enough to murder Polonius and the two spies, Rosencrantz and Guildenstern. Although Ophelia and Gertrude are two of the most crucial characters, Shakespeare keeps readers unaware of their feelings and inner

Thursday, July 25, 2019

International Perspectives In Organisations Literature review - 1

The next sections of the discussion will emphasize the constraints of different global cultures in the theoretical approach to management. The study will further illustrate cross-cultural differences and their impact on management and leadership. The last section of the critical analysis will elaborate the effect of business downsizing on organisational behaviour and sustainability.

The argument of this article is based on the responsibilities of international corporations and organisations towards the global population. In this argument Arnold (2012) has elaborated and defended views that support and promote the role of transnational corporations and business entities as agents of justice for the base of the economic pyramid. This argument explains the role of corporations in the reduction of global poverty and inequality. The author also discusses two separate perspectives on normative legitimacy that support the role of corporations and other business entities as agents promoting global justice (Arnold 2012). The first perspective focuses on the normative legitimacy of the different international institutions which are responsible for governing various international trade regulations and business activities. Though this domain of normative legitimacy has attracted considerable attention from political and legal scholars, the author of the article has specifically focused on the second perspective, which evaluates the legitimacy of corporations with regard to their activities within global societies. This argument details the importance of the ethical and moral legitimacy of corporations with regard to their global activities (Buchanan and Keohane, 2006). According to the view of Palazzo and Scherer (2006), the moral and ethical legitimacy of organisations does not depend on legal and political norms but is influenced by the deliberative communication process. During the explanation of the

Wednesday, July 24, 2019

Response Question Essay Example | Topics and Well Written Essays - 1250 words

a writing style which means that Sima Qian wrote as a group. Sima Qian observed the whole picture of the events in China, and wrote from the standpoint of the group's view. On the other hand, Herodotus wrote mostly from his own viewpoint, adding details which he thought were necessary for the readers. What did the group think of what happened? What did the group think was important to write down? Why did they think it was important? These are questions that we should answer.

The most amazing thing about these authors is that they never once met each other in person. Yet, for people who never met each other, never enjoyed each other's company and never had a cup of coffee or tea over the dinner table together, their recollections and styles are remarkably similar. The similarities become evident to the readers through the different accounts that the writers give, and the brilliant ways in which they speak. Herodotus' brilliant account of politics was fascinating to me, as was Qian's ability to go off on random mythical journeys. Many would call these tendencies separate, but they are also identical in that both writers could settle into one stretch of writing style and stay there. Regardless of the similarities that the two writers share, their writing styles are vastly different. However, the fact remains that Sima Qian and Herodotus are great writers, and perhaps some of the best writers in history. Their writing styles shall be forever analyzed, and young students and emerging writers will always strive for their great style.

A major difference between the two authors is the manner in which each writer presents his personal interpretation. As mentioned above, Herodotus tends to go off on political tangents, while Sima Qian tends to keep political thoughts to a barely noticeable minimum. However, Qian has no problem exploring the mystical world of China, something Herodotus leaves virtually untouched. Qian wrote from what is known as the group standpoint. He wanted everyone to receive his entire message, to see his whole picture, so to speak. This is a contrast, to be sure, from Herodotus, who wrote from his own standpoint and more about what he thought, and wasn't as worried about making sure that the entire group got the picture. However, this does not mean that readers of the text will not find Herodotus' work compelling.

The first thing I noticed about both authors, as I was reading their work, was how stunningly easy the work was to read. Many history books are dry and full of only factual writing, but this time the writing from the history book seemed to come to life. Although I liked Herodotus' approach slightly better because I am more of a political person than a mystical person, Qian's work was equally engaging. I did think, however, that Herodotus' work was more intellectually stimulating, perhaps because it had a more historical and fact-based approach. Many books, especially books about this period in history, are quite boring, so I was glad to see that this one was very interesting. I found Herodotus' work to be more enlightening than Qian's, perhaps because I was able to relate to the style of writing that he used. Sometimes it was hard to know who actually wrote what document, as the book did not always specifically mention it, thus making an absolute and fair analysis and comparison challenging, to say the least.
I personally could not have done all the research and

Tuesday, July 23, 2019

DuPont Essay Example | Topics and Well Written Essays - 1250 words

DuPont was not able to adapt right away, but because of the need to take steps to cope with the competition, DuPont introduced the new Stainmaster carpets, and the campaign became one of the most popular commercials on television. For most of its history, the residential segment had been the most laid-back segment of the carpet market. Styles tended to be simple, colors passive, and features uniform across all competitors in the industry. Technically, industry players maintained that differences did indeed exist, but in the words of one industry analyst, "The differences were there in style and fiber quality, but the housewife out shopping for carpet didn't really know or care: she only liked what she could see and feel." As such, DuPont wondered if the styles and designs so popular in the commercial segment could be transferred to the residential segment. Was the average household willing to make carpeting more than just a backdrop for other furnishings? If the program were to be successful, it would mean several things: further differentiation from other nylon. The company's decision to adopt a new technology was a good move, but it also became a decision problem. The company has come up with questions that can be answered by a good marketing strategy. Although the advertisement on television has been successful, the consumer usually doesn't care about style and fiber quality; instead they like what they can see and feel.

Problem Definition

Defining the problem is the single most important step in the market research process. A clear statement of the problem is the key to good research. A firm may spend hundreds or thousands of dollars doing market research, but if it has not correctly identified the problem, those dollars are wasted. In our case the problem is whether the new Stainmaster carpet will be as successful as the original carpet. But even if this is clear, you still need to know what exactly you need to know to make the new approach a success, and what specific information related to the product is difficult to find out. Problems that may be encountered are: it is unknown what potential markets there are, what customer groups are interested in your products, and who the competitors are. After formulating your problem, you need to formulate your research questions: what questions need to be answered, and which possible sub-questions do you have? DuPont wondered if the styles and designs so popular in the commercial segment could be transferred to the residential segment. Was the average household willing to make carpeting more than just a backdrop for other furnishings? With the problem or opportunity defined, the next step is to set objectives for your market research operations. Research objectives, related to and determined by the problem formulation, are set so that when achieved they provide the necessary information to solve the problem.

Research, Design and Methodology

Design is the activity involved in the development of an artifact or part of an artifact, from idea to manufacturing hand-off. Essentially, design is about fulfilling human needs, and particularly through the link of innovation it can make a real difference to quality of life. At DuPont, they welcomed the new technology by introducing their new product line, the Stainmaster carpet. It was successful in the television ad but was not really able to penetrate the household. The company should be able to tell

Analyze and Evaluate the Federal Legislative Process Essay

We will understand this process even better by looking at the stages the Family Smoking Prevention and Tobacco Control Act passed through until it was enforced as law. We will also get to know the content of the bill and its importance.

The Family Smoking Prevention and Tobacco Control Act, Pub.L. 111-31, H.R. 1256

This is one of the major federal statutes enacted during President Obama's time. It came into effect on June 22, 2009 (Encyclopedia). The Act gives the Food and Drug Administration the power to regulate the tobacco industry. A signature element of the law imposes new warnings and labels on tobacco packaging and advertisements, with the goal of discouraging minors and young adults from smoking. The Act also bans flavored cigarettes, places limits on the advertising of tobacco products to minors, and requires tobacco companies to seek FDA approval for new tobacco products.

Legislative Process

Bill Introduction and First Reading

According to (Freeman), bill introduction and first reading is the initial stage in the legislative process. The Family Smoking Prevention and Tobacco Control bill was introduced in response to a Supreme Court decision which had held that the Clinton administration's FDA had gone beyond its congressionally delegated authority, thus giving the FDA the authority the Court had determined it lacked. The bill was passed by a vote of 298 to 112.

Second Reading and Referral of the Bill to a Committee

On May 20, 2009 the Senate Committee on Health, Education, Labor and Pensions was assigned the bill.

Committee Stage of the Bill

The committee reviewed the text of the bill and, there being no amendments, passed it to the next stage.

Report Stage

The bill was further studied during the report stage by the members of the House, both those in the committee and those who were not, and they passed the bill to the next stage, there being no amendments.

Third Reading and Adoption of the Bill

The members of the House came together to decide whether the bill should be adopted. They debated the final form of the bill and its provisions. The bill provided for: i. the creation of a tobacco center within the FDA with authority to regulate the content, marketing and sale of tobacco products; ii. a requirement of FDA approval for the use of expressions that indicate the product poses a reduced health risk; iii. limitation of advertisements that could attract young smokers; iv. new rules to prevent sales except through direct, face-to-face exchanges between a retailer and a consumer; v. a ban on flavoring that applies to any product meeting the definition of a cigarette according to the Federal Cigarette Labeling and Advertising Act. This includes any tobacco that comes

Monday, July 22, 2019

A contrast between opposing values in Hard Times Essay

The first incident involving the circus and circus people that I would like to discuss, and that clearly demonstrates the contrast between opposing values, is on page 34 onwards. Mr Gradgrind, the absolute pinnacle of fact in the book, goes to visit the circus people to tell them that the fanciful Sissy Jupe can no longer attend the school. I have chosen this incident because it involves more of the circus characters than at almost any other time, and because the description of the circus shows just how far it is from the world and values of fact. The circus is the best symbol for representing the alternative to all that is fact in the book; the circus is seen as a world of mystery and wonder, almost of magic, an idea that completely goes against the idea of facts. Gradgrind and Bounderby go to see Sissy's father, only to find out he has abandoned his daughter; it is then that Mr Gradgrind decides on the possibility of taking Sissy into his own home and educating her in the ways of fact from there. Mr Bounderby and Mr Gradgrind confer during this time, exchanging opinions based upon the facts and laws they have always followed. Gradgrind, softer at heart but still the fact machine at this point, wants to take Sissy home, but Bounderby can be heard saying "No. I say no. I advise you not. I say by no means." He says this because to take someone else's child on as your own and teach her the ways of fact, when she has been living the life of fancy for many years, seems absurd to Bounderby. However, at the same time that Gradgrind is having this debate with Bounderby, "the various members of Sleary's company gradually gathered together from the upper regions". The circus people are described in this chapter as having a "remarkable gentleness and childishness about these people, a special inaptitude for any kind of sharp practice, and an untiring readiness to help and pity one another, deserving often as much respect, and always as much generous construction, as the every-day virtues of any class of people in the world". Unlike the likes of Bounderby and Gradgrind, who cannot be described as emotional or passionate or anything of the sort, just plain hard facts, Sleary in this chapter is the real philosopher of the ideas of fancy; he even says it: "I lay down the philothophy of the thubject when I thay to you, Thquire, make the betht of uth: not the wurtht!" This chapter clearly shows the contrast between opposing views and values in Hard Times: the circus shows a whole new world and represents a whole new set of values, and the ideas of fancy are represented in the themes and scenes with the circus. The thing with the circus is that it has an almost dreamlike status: things happen there that cannot happen anywhere else, and it appears to be almost an illusion. For example, "The father of one of the families was in the habit of balancing the father of another of the families on the top of a great pole." These are things that you would only expect to see in dreams, and so they are fanciful, a complete contrast to the ideas of fact displayed throughout the rest of Hard Times.
A good example of how far opposed to the ideas of fact the circus is takes place on pages 12 and 13, when Mr Gradgrind, the keeper of facts and bringer of knowledge in Thomas and Louisa Gradgrind's lives, catches them sitting and watching the circus people. He takes the view that the circus is bad news, as it opposes everything he stands for: "Now to think of these vagabonds attracting the young rabble from a model school." He sees the idea of the circus as so fanciful and alien that he feels watching a circus act would debase himself or a well-educated child. It even says "his own mathematical Thomas abasing himself on the ground to catch but a hoof of the graceful equestrian Tyrolean flower act!" This sentence shows fully what the opposing values in Hard Times are: it is obvious from the statement that anything mathematical, or just plain practical, is in direct opposition to the fanciful nature of flower shows and the like. Thomas, when caught, does not even protest, but knows that to obey his father's principles he must "[give] himself up to be taken home like a machine". That is clearly the way of fact, to be machine-like, and that is why the circus is such a good opposition to, and symbol of, everything fact is not. Gradgrind condemns circus-like ideals when he says "In the name of wonder, idleness and folly!"; apparently, to dream or to be imaginative is lazy in Gradgrind's book. This is why the factual way upon which Gradgrind has based his life is so offended by the ideas of fancy: he does not like the thought that fancy might be not laziness but another way to work hard in life. Gradgrind is so full of the idea that facts are right that he even believes that people with all these fanciful thoughts at their disposal could make the wrong decision, when surely it is not a case of right and wrong, just of opposing views, and they certainly do oppose each other. Gradgrind does say, though, "Thomas, though I have the facts before me, I find it difficult to believe that you, with your education and resources, should have brought your sister to a scene like this." This makes it seem as though education is supposed to kill the imagination, which clearly conflicts with the views of the circus, which holds that you should work hard and perform in life, but never let the dreams die.

Sunday, July 21, 2019

Data Pre-processing Tool

Chapter 2

Real-life data rarely comply with the requirements of the various data mining tools. They are usually inconsistent and noisy, and may contain redundant attributes, unsuitable formats, and so on. Hence data have to be prepared carefully before the data mining actually starts. It is a well-known fact that the success of a data mining algorithm is very much dependent on the quality of data processing. Data processing is one of the most important tasks in data mining. In this context it is natural that data pre-processing is a complicated task involving large data sets. Sometimes data pre-processing takes more than 50% of the total time spent in solving the data mining problem. It is crucial for data miners to choose an efficient data pre-processing technique for a specific data set, which can not only save processing time but also retain the quality of the data for the data mining process.

A data pre-processing tool should help miners with many data mining activities. For example, data may be provided in different formats as discussed in the previous chapter (flat files, database files, etc). Data files may also have different formats of values, calculation of derived attributes, data filters, joined data sets, and so on. The data mining process generally starts with an understanding of the data; in this stage pre-processing tools may help with data exploration and data discovery tasks. Data processing includes a lot of tedious work, and data pre-processing generally consists of:

- Data Cleaning
- Data Integration
- Data Transformation
- Data Reduction

In this chapter we will study all these data pre-processing activities.

2.1 Data Understanding

In the data understanding phase the first task is to collect initial data and then proceed with activities that let us become familiar with the data, discover data quality problems, gain first insight into the data, or identify interesting subsets to form hypotheses about hidden information. [Figure: the data understanding phase according to the CRISP model.]

2.1.1 Collect Initial Data

The initial collection of data includes loading the data if that is required for data understanding. For instance, if a specific tool is applied for data understanding, it makes great sense to load your data into this tool. This possibly leads to initial data preparation steps. However, if data is obtained from multiple data sources, integration is an additional issue.

2.1.2 Describe Data

Here the gross or surface properties of the gathered data are examined.

2.1.3 Explore Data

This task is required to handle the data mining questions, which may be addressed using querying, visualization and reporting. These include:

- Distribution of key attributes, for instance the goal attribute of a prediction task
- Relations between pairs or small numbers of attributes
- Results of simple aggregations
- Properties of important sub-populations
- Simple statistical analyses

2.1.4 Verify Data Quality

In this step the quality of the data is examined (a minimal exploration sketch follows below). It answers questions such as: Is the data complete (does it cover all the cases required)? Is it accurate, or does it contain errors, and if there are errors how common are they? Are there missing values in the data? If so, how are they represented, where do they occur, and how common are they?
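As a concrete illustration of the exploration and quality-verification tasks described above, here is a minimal sketch in Python, assuming a pandas DataFrame loaded from a hypothetical file customers.csv (the file name and its columns are placeholders, not part of the original text):

```python
import pandas as pd

df = pd.read_csv("customers.csv")  # hypothetical input file

# Describe data: gross/surface properties
print(df.shape)       # number of tuples and attributes
print(df.dtypes)      # attribute types
print(df.describe())  # simple statistical analyses of numeric attributes

# Verify data quality: completeness, missing values, duplicates
print(df.isna().sum())        # missing values per attribute
print(df.duplicated().sum())  # exact duplicate records
```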
2.2 Data Preprocessing

The data preprocessing phase focuses on the pre-processing steps that produce the data to be mined. Data preparation or preprocessing is one of the most important steps in data mining. Industrial practice indicates that once data is well prepared, the mined results are much more accurate; this step is therefore critical for the success of a data mining method. Among others, data preparation mainly involves data cleaning, data integration, data transformation, and reduction.

2.2.1 Data Cleaning

Data cleaning is also known as data cleansing or scrubbing. It deals with detecting and removing inconsistencies and errors from data in order to obtain better quality data. When a single data source such as a flat file or database is used, data quality problems arise due to misspellings during data entry, missing information, or other invalid data. When the data is taken from the integration of multiple data sources, such as data warehouses, federated database systems or global web-based information systems, the requirement for data cleaning increases significantly. This is because the multiple sources may contain redundant data in different formats. Consolidation of the different data formats and elimination of redundant information become necessary in order to provide access to accurate and consistent data.

Good quality data requires passing a set of quality criteria (a sketch computing two of them is given below). Those criteria include:

- Accuracy: an aggregated value over the criteria of integrity, consistency and density.
- Integrity: an aggregated value over the criteria of completeness and validity.
- Completeness: achieved by correcting data containing anomalies.
- Validity: approximated by the amount of data satisfying integrity constraints.
- Consistency: concerns contradictions and syntactical anomalies in data.
- Uniformity: directly related to irregularities in data.
- Density: the quotient of the number of missing values and the number of total values that ought to be known.
- Uniqueness: related to the number of duplicates present in the data.

2.2.1.1 Terms Related to Data Cleaning

- Data cleaning: the process of detecting, diagnosing, and editing damaged data.
- Data editing: changing the value of data which are incorrect.
- Data flow: the passing of recorded information through succeeding information carriers.
- Inliers: data values falling inside the projected range.
- Outliers: data values falling outside the projected range.
- Robust estimation: evaluation of statistical parameters using methods that are less sensitive to the effect of outliers than conventional methods.

2.2.1.2 Definition: Data Cleaning

Data cleaning is a process used to identify imprecise, incomplete, or irrational data and then to improve the quality through correction of detected errors and omissions. This process may include:

- format checks
- completeness checks
- reasonableness checks
- limit checks
- review of the data to identify outliers or other errors
- assessment of data by subject area experts (e.g. taxonomic specialists)

By this process suspected records are flagged, documented and subsequently checked, and finally these suspected records can be corrected. Sometimes validation checks also involve checking for compliance against applicable standards, rules, and conventions. The general framework for data cleaning is:

1. Define and determine error types;
2. Search and identify error instances;
3. Correct the errors;
4. Document error instances and error types; and
5. Modify data entry procedures to reduce future errors.
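The following is a minimal sketch of how two of the quality criteria above (density and uniqueness) might be computed for a pandas DataFrame; the function name and the toy data are illustrative assumptions, not part of the original text:

```python
import pandas as pd

def quality_report(df: pd.DataFrame) -> pd.Series:
    total = df.size  # number of values that ought to be known
    missing = int(df.isna().sum().sum())
    return pd.Series({
        # density: quotient of missing values and total values, as defined above
        "density": missing / total,
        # uniqueness: number of duplicate records present in the data
        "duplicates": int(df.duplicated().sum()),
    })

# Toy example: two missing values, one duplicated record
df = pd.DataFrame({"name": ["Ann", "Bo", "Bo"],
                   "age": [23, 31, 31],
                   "city": ["Oslo", None, None]})
print(quality_report(df))  # density = 2/9, duplicates = 1
```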
The data cleaning process is referred to by a number of different terms; which one is used is a matter of preference. These terms include: error checking, error detection, data validation, data cleaning, data cleansing, data scrubbing and error correction. We use "data cleaning" to encompass three sub-processes, viz.:

1. data checking and error detection;
2. data validation; and
3. error correction.

A fourth sub-process, improvement of the error prevention processes, could perhaps be added.

2.2.1.3 Problems with Data

Here we note some key problems with data.

Missing data: this problem occurs for two main reasons:
- data are absent from a source where they are expected to be present;
- data are present but not available in an appropriate form.
Detecting missing data is usually straightforward and simple.

Erroneous data: this problem occurs when a wrong value is recorded for a real-world value. Detection of erroneous data can be quite difficult (for instance, the incorrect spelling of a name).

Duplicated data: this problem occurs for two reasons:
- repeated entry of the same real-world entity with somewhat different values;
- sometimes a real-world entity may have different identifications.
Repeated records are common and frequently easy to detect. Different identifications of the same real-world entity, however, can be a very hard problem to identify and solve.

Heterogeneities: when data from different sources are brought together in one analysis, heterogeneity may occur. Heterogeneity could be:
- structural heterogeneity, which arises when the data structures reflect different business usage;
- semantic heterogeneity, which arises when the meaning of data is different in each system that is being combined.
Heterogeneities are usually very difficult to resolve because they usually involve a lot of contextual data that is not well defined as metadata.

Information dependencies in the relationships between the different sets of attributes are commonly present. Wrong cleaning mechanisms can further damage the information in the data. Various analysis tools handle these problems in different ways. Commercial offerings are available that assist the cleaning process, but these are often problem specific. Uncertainty in information systems is a well-recognized hard problem. [Figure: a very simple example of missing and erroneous data.]

Extensive support for data cleaning must be provided by data warehouses. Data warehouses have a high probability of "dirty data" since they load and continuously refresh huge amounts of data from a variety of sources. Since these data warehouses are used for strategic decision making, the correctness of their data is important to avoid wrong decisions. [Figure: the ETL (Extraction, Transformation, and Loading) process for building a data warehouse.] Data transformations are related to schema or data translation and integration, and to filtering and aggregating the data to be stored in the data warehouse. All data cleaning is classically performed in a separate data staging area prior to loading the transformed data into the warehouse. A large number of tools of varying functionality are available to support these tasks, but often a significant portion of the cleaning and transformation work has to be done manually or by low-level programs that are difficult to write and maintain.
A data cleaning method should ensure the following:

- It should identify and eliminate all major errors and inconsistencies, both in individual data sources and when integrating multiple sources.
- It should be supported by tools that bound manual examination and programming effort, and it should be extensible so that it can cover additional sources.
- It should be performed in association with schema-related data transformations based on metadata.
- Data cleaning mapping functions should be specified in a declarative way and be reusable for other data sources.

2.2.1.4 Data Cleaning: Phases

1. Analysis: To identify errors and inconsistencies in the database, a detailed analysis is needed, involving both manual inspection and automated analysis programs. This reveals where (most of) the problems are present.

2. Defining Transformation and Mapping Rules: After discovering the problems, this phase is concerned with defining the manner in which we are going to automate the solutions to clean the data. The analysis phase yields various problems that translate into a list of activities, for example: remove all entries for J. Smith because they are duplicates of John Smith; find entries with 'bule' in the colour field and change these to 'blue'; find all records where the phone number field does not match the pattern (NNNNN NNNNNN); and so on. Further steps for cleaning the data are then applied. (A minimal sketch of such rules in code is given at the end of this subsection.)

3. Verification: In this phase we check and assess the transformation plans made in phase 2. Without this step, we may end up making the data dirtier rather than cleaner. Since data transformation is the main step that actually changes the data itself, we need to be sure that the applied transformations will do so correctly; therefore the transformation plans must be tested and examined very carefully. Example: suppose we have a very thick C++ book where it says "strict" in all the places where it should say "struct".

4. Transformation: Once it is certain that cleaning will be done correctly, apply the transformations verified in the last step. For large databases, this task is supported by a variety of tools.

Backflow of Cleaned Data: In data mining the main objective is to convert and move clean data into the target system. This creates a requirement to purify legacy data. Cleansing can be a complicated process depending on the technique chosen and has to be designed carefully to achieve the objective of removing dirty data. Some methods to accomplish the task of data cleansing of a legacy system include:

- automated data cleansing
- manual data cleansing
- a combined cleansing process
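Here is a minimal sketch of phase-2 style transformation rules drawn from the examples above, assuming a pandas DataFrame with hypothetical colour and phone columns; it is an illustration under those assumptions, not the chapter's own implementation:

```python
import pandas as pd

def apply_cleaning_rules(df: pd.DataFrame) -> pd.DataFrame:
    df = df.copy()
    # Rule: fix the misspelling 'bule' in the colour field
    df["colour"] = df["colour"].replace("bule", "blue")
    # Rule: flag phone numbers that do not match the (NNNNN NNNNNN) pattern
    df["phone_ok"] = df["phone"].astype(str).str.fullmatch(r"\d{5} \d{6}")
    # Rule: remove exact duplicate entries (fuzzy duplicates such as
    # 'J. Smith' vs 'John Smith' would need approximate matching)
    return df.drop_duplicates()

df = pd.DataFrame({"colour": ["bule", "blue"],
                   "phone": ["12345 678901", "1234"]})
print(apply_cleaning_rules(df))
```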
2.2.1.5 Missing Values

Data cleaning addresses a variety of data quality problems, including noise and outliers, inconsistent data, duplicate data, and missing values. Missing values are one important problem to be addressed. The missing value problem occurs because many tuples may have no recorded value for several attributes. For example, consider a customer sales database consisting of a whole bunch of records (say around 100,000) where some of the records have certain fields missing; say, customer income in the sales data may be missing. The goal here is to find a way to predict what the missing data values should be (so that these can be filled in) based on the existing data.

Missing data may be due to the following reasons:

- equipment malfunction
- data inconsistent with other recorded data and thus deleted
- data not entered due to misunderstanding
- certain data not considered important at the time of entry
- history or changes of the data not registered

How to Handle Missing Values?

Dealing with missing values is a regular question that has to do with the actual meaning of the data. There are various methods for handling missing entries (see the imputation sketch below):

1. Ignore the data row. One solution is to just ignore the entire data row. This is generally done when the class label is missing (assuming the data mining goal is classification), or when many attributes are missing from the row (not just one). If the percentage of such rows is high, however, we will definitely get poor performance.

2. Use a global constant to fill in for missing values. We can fill in a global constant such as "unknown", "N/A" or minus infinity. This is done because at times it just doesn't make sense to try to predict the missing value. For example, if in a customer sales database the office address is missing for some customers, filling it in doesn't make much sense. This method is simple but not foolproof.

3. Use the attribute mean. For example, if the average income of a family is X, you can use that value to replace missing income values in the customer sales database.

4. Use the attribute mean for all samples belonging to the same class. Say you have a car pricing database that, among other things, classifies cars into "luxury" and "low budget", and you are dealing with missing values in the cost field. Replacing the missing cost of a luxury car with the average cost of all luxury cars is probably more accurate than the value you would get if you factored in the low budget cars.

5. Use a data mining algorithm to predict the value. The value can be determined using regression, inference-based tools using the Bayesian formalism, decision trees, clustering algorithms, etc.
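A minimal sketch of methods 2 to 4 above using pandas; the toy car-pricing data mirrors the luxury/low-budget example and is purely illustrative:

```python
import pandas as pd

df = pd.DataFrame({
    "segment": ["luxury", "luxury", "budget", "budget", "budget"],
    "cost":    [90000, None, 12000, None, 14000],
})

# Method 2: fill with a global constant (here a sentinel value)
const_filled = df["cost"].fillna(-1)

# Method 3: fill with the overall attribute mean
mean_filled = df["cost"].fillna(df["cost"].mean())

# Method 4: fill with the mean of samples belonging to the same class
class_mean_filled = df.groupby("segment")["cost"].transform(
    lambda s: s.fillna(s.mean())
)
print(class_mean_filled.tolist())  # [90000.0, 90000.0, 12000.0, 13000.0, 14000.0]
```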
2.2.1.6 Noisy Data

Noise can be defined as a random error or variance in a measured variable. Due to this randomness it is very difficult to follow a strategy for noise removal from the data. Real-world data is not always faultless; it can suffer from corruption which may impact the interpretations of the data, the models created from the data, and the decisions made based on the data. Incorrect attribute values could be present because of the following reasons:

- faulty data collection instruments
- data entry problems
- duplicate records
- incomplete data
- inconsistent data
- incorrect processing
- data transmission problems
- technology limitations
- inconsistency in naming conventions
- outliers

How to Handle Noisy Data?

The methods for removing noise from data are as follows:

1. Binning: this approach first sorts the data and partitions it into (equal-frequency) bins; one can then smooth by bin means, by bin medians, by bin boundaries, etc.
2. Regression: in this method smoothing is done by fitting the data to regression functions.
3. Clustering: clustering detects and removes outliers from the data.
4. Combined computer and human inspection: in this approach the computer detects suspicious values which are then checked by human experts (e.g., this approach deals with possible outliers).

These methods are explained in detail as follows (a binning sketch in code is given at the end of this subsection).

Binning: a data preparation activity that converts continuous data to discrete data by replacing a value from a continuous range with a bin identifier, where each bin represents a range of values. For instance, age can be changed to bins such as "20 or under", "21-40", "41-65" and "over 65". Binning methods smooth a sorted data set by consulting the values around each value; this is therefore called local smoothing.

Binning methods:

- Equal-width (distance) partitioning: divides the range into N intervals of equal size (a uniform grid). If A and B are the lowest and highest values of the attribute, the width of the intervals will be W = (B - A) / N. This is the most straightforward method, but outliers may dominate the presentation, and skewed data is not handled well.
- Equal-depth (frequency) partitioning: divides the range (the values of a given attribute) into N intervals, each containing approximately the same number of samples (elements). It gives good data scaling, but managing categorical attributes can be tricky.
- Smoothing by bin means: each bin value is replaced by the mean of the values in the bin.
- Smoothing by bin medians: each bin value is replaced by the median of the values in the bin.
- Smoothing by bin boundaries: each bin value is replaced by the closest boundary value.

Example. Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34.

Partition into equal-frequency (equi-depth) bins:
- Bin 1: 4, 8, 9, 15
- Bin 2: 21, 21, 24, 25
- Bin 3: 26, 28, 29, 34

Smoothing by bin means, rounded to the nearest dollar (for example, the mean of 4, 8, 9, 15 is 9):
- Bin 1: 9, 9, 9, 9
- Bin 2: 23, 23, 23, 23
- Bin 3: 29, 29, 29, 29

Smoothing by bin boundaries:
- Bin 1: 4, 4, 4, 15
- Bin 2: 21, 21, 25, 25
- Bin 3: 26, 26, 26, 34

Regression: regression is a data mining technique used to fit an equation to a dataset. The simplest form of regression is linear regression, which uses the formula of a straight line (y = b + wx) and determines the suitable values for b and w to predict the value of y based upon a given value of x. More sophisticated techniques, such as multiple regression, permit the use of more than one input variable and allow for the fitting of more complex models, such as a quadratic equation. Regression is described further in a subsequent chapter in the discussion of prediction.

Clustering: clustering is a method of grouping data into different groups, so that the data in each group share similar trends and patterns. Clustering constitutes a major class of data mining algorithms. These algorithms automatically partition the data space into a set of regions or clusters. The goal of the process is to find all sets of similar examples in the data, in some optimal fashion. [Figure: three clusters; values that fall outside the clusters are outliers.]

Combined computer and human inspection: these methods find the suspicious values using computer programs, which are then verified by human experts. By this process all outliers are checked.
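A minimal sketch of equal-width and equal-depth binning with smoothing by bin means, applied to the price data from the example above; pd.cut and pd.qcut are one reasonable way to realize the partitioning, which the chapter itself does not prescribe:

```python
import pandas as pd

prices = pd.Series([4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34])

# Equal-width partitioning: 3 intervals of width W = (34 - 4) / 3 = 10
equal_width = pd.cut(prices, bins=3)

# Equal-depth (frequency) partitioning: 3 bins of ~4 samples each
equal_depth = pd.qcut(prices, q=3)

# Smoothing by bin means: replace each value by the mean of its bin
smoothed = prices.groupby(equal_depth).transform("mean")
print(smoothed.tolist())
# [9.0, 9.0, 9.0, 9.0, 22.75, 22.75, 22.75, 22.75, 29.25, 29.25, 29.25, 29.25]
```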
2.2.1.7 Data Cleaning as a Process

Data cleaning is the process of detecting, diagnosing, and editing data. It is a three-stage method involving a repeated cycle of screening, diagnosing, and editing of suspected data abnormalities. Many data errors are detected incidentally during study activities; however, it is more efficient to discover inconsistencies by actively searching for them in a planned manner. It is not always immediately clear whether a data point is erroneous; many times it requires careful examination. Likewise, missing values require additional checks. Therefore, predefined rules for dealing with errors and with true missing and extreme values are part of good practice. One can monitor for suspect features in survey questionnaires, databases, or analysis data. In small studies, with the examiner intimately involved at all stages, there may be little or no difference between a database and an analysis dataset.

During as well as after treatment, the diagnostic and treatment phases of cleaning need insight into the sources and types of errors at all stages of the study. The data flow concept is therefore crucial in this respect. After measurement, the research data go through repeated steps of being entered into information carriers, extracted and transferred to other carriers, edited, selected, transformed, summarized, and presented. It is essential to understand that errors can occur at any stage of the data flow, including during data cleaning itself. Most of these problems are due to human error.

Inaccuracy of a single data point and measurement may be tolerable and attributable to the inherent technological error of the measurement device. Therefore the process of data cleaning must focus on those errors that are beyond small technical variations and that form a major shift within or beyond the population distribution. In turn, it must be based on an understanding of technical errors and of the expected ranges of normal values. Some errors are worthy of higher priority, but which ones are most significant is highly study-specific. For instance, in most medical epidemiological studies, errors that need to be cleaned at all costs include missing gender, gender misspecification, birth date or examination date errors, duplication or merging of records, and biologically impossible results. Another example: in nutrition studies, date errors lead to age errors, which in turn lead to errors in weight-for-age scoring and, further, to misclassification of subjects as under- or overweight. Errors of sex and date are particularly important because they contaminate derived variables. Prioritization is essential if the study is under time pressure or if resources for data cleaning are limited.

2.2.2 Data Integration

This is the process of taking data from one or more sources and mapping it, field by field, onto a new data structure. The idea is to combine data from multiple sources into a coherent form. Various data mining projects require data from multiple sources because:

- data may be distributed over different databases or data warehouses (for example, an epidemiological study that needs information about hospital admissions and car accidents);
- sometimes data may be required from different geographic distributions, or there may be a need for historical data (e.g. integrating historical data into a new data warehouse);
- there may be a need to enhance the data with additional (external) data, to improve data mining precision.

2.2.2.1 Data Integration Issues

There are a number of issues in data integration. Consider two database tables, Table 1 and Table 2. In integrating these two tables, a variety of issues are involved (illustrated in the sketch below), such as:

1. The same attribute may have different names (for example, Name and Given Name are the same attribute with different names).
2. An attribute may be derived from another (for example, the attribute Age is derived from the attribute DOB).
3. Attributes might be redundant (for example, the attribute PID is redundant).
4. Values in attributes might be different (for example, for PID 4791 the values in the second and third fields differ between the two tables).
5. There may be duplicate records under different keys (the same record may be replicated with different key values).
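A minimal sketch of two of these issues and their resolution in pandas; the table contents (names, ages) are invented placeholders built around the PID 4791 mentioned above:

```python
import pandas as pd

t1 = pd.DataFrame({"PID": [4791, 4802],
                   "Name": ["A. Roy", "B. Lee"],
                   "Age": [34, 39]})
t2 = pd.DataFrame({"PID": [4791, 4802],
                   "Given Name": ["A. Roy", "B. Lee"],
                   "Age": [35, 39]})

# Issue 1: map different names for the same attribute onto one schema
t2 = t2.rename(columns={"Given Name": "Name"})

# Entity matching on the shared key; redundant copies appear side by side
merged = t1.merge(t2, on=["PID", "Name"], how="outer", suffixes=("_t1", "_t2"))

# Issue 4: detect value conflicts for the same entity (here PID 4791)
conflicts = merged[merged["Age_t1"] != merged["Age_t2"]]
print(conflicts)
```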
Therefore schema integration and object matching can be tricky. The question here is how equivalent entities from different sources are matched; this problem is known as the entity identification problem. Conflicts have to be detected and resolved. Integration becomes easier if unique entity keys are available in all the data sets (or tables) to be linked. Metadata can help in schema integration (examples of metadata for each attribute include its name, meaning, data type, and the range of values permitted for the attribute).

2.2.2.2 Redundancy

Redundancy is another important issue in data integration. Two given attributes (such as DOB and Age in the tables above) may be redundant if one is derived from the other attribute or set of attributes. Inconsistencies in attribute or dimension naming can also lead to redundancies in the given data sets.

Handling Redundant Data

We can handle data redundancy problems in the following ways:

- Use correlation analysis (a sketch is given below).
- Consider different codings and representations (e.g. metric vs. imperial measures).
- Careful (manual) integration of the data can reduce or prevent redundancies (and inconsistencies).
- De-duplication (also called internal data linkage): if no unique entity keys are available, analyze the values in attributes to find duplicates.
- Process redundant and inconsistent data (easy if the values are the same): delete one of the values, average the values (only for numerical attributes), or take the majority value (if there are more than two duplicates and some values are the same).

Correlation analysis (also called Pearson's product-moment coefficient) is explained in detail here. Some redundancies can be detected by using correlation analysis: given two attributes, such analysis can measure how strongly one attribute implies the other. For numerical attributes we can compute the correlation coefficient of two attributes A and B to evaluate the correlation between them. This is given by

r_{A,B} = \frac{\sum(AB) - n\bar{A}\bar{B}}{n\sigma_A\sigma_B}

where:

- n is the number of tuples,
- \bar{A} and \bar{B} are the respective means of A and B,
- \sigma_A and \sigma_B are the respective standard deviations of A and B, and
- \sum(AB) is the sum of the AB cross-products.

a. If r_{A,B} is greater than zero, then A and B are positively correlated, meaning that the values of A increase as the values of B increase; the coefficient always lies between -1 and +1, and a high value may indicate that one of the attributes is redundant.
b. If r_{A,B} is equal to zero, then A and B are independent of each other and there is no correlation between them.
c. If r_{A,B} is less than zero, then A and B are negatively correlated, where the value of one attribute increases as the value of the other decreases. This means that each attribute discourages the other.

It is important to note that correlation does not imply causality. That is, if A and B are correlated, this does not essentially mean that A causes B or that B causes A. For example, in analyzing a demographic database, we may find that attributes representing the number of accidents and the number of car thefts in a region are correlated. This does not mean that one is related to the other as cause and effect; both may be related to a third attribute, namely population.
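A minimal numpy sketch of the correlation check above, using an invented Age attribute and a copy derived from DOB; the manual formula mirrors the definition of r_{A,B} (np.std defaults to the population form, matching the n in the denominator):

```python
import numpy as np

age = np.array([23.0, 35.0, 41.0, 52.0, 60.0, 28.0])
years_since_dob = age.copy()  # derived from DOB, hence redundant

n = len(age)
# r_{A,B} = (sum(AB) - n * mean(A) * mean(B)) / (n * std(A) * std(B))
r_manual = ((age * years_since_dob).sum()
            - n * age.mean() * years_since_dob.mean()) / (
    n * age.std() * years_since_dob.std())
r_numpy = np.corrcoef(age, years_since_dob)[0, 1]
print(r_manual, r_numpy)  # both 1.0: perfectly correlated, one attribute is redundant
```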
For discrete data, a correlation relation between two attributes can be discovered by a χ² (chi-square) test. Let A have c distinct values a1, a2, ..., ac and let B have r distinct values b1, b2, ..., br. The data tuples described by A and B can be shown as a contingency table, with the c values of A making up the columns and the r values of B making up the rows. Each (Ai, Bj) cell in the table holds the observed frequency of that joint event. The χ² value is computed as

\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{i,j} - E_{i,j})^2}{E_{i,j}}

where O_{i,j} is the observed frequency (i.e. the actual count) of the joint event (Ai, Bj), and E_{i,j} is the expected frequency, which can be computed as

E_{i,j} = \frac{\left(\sum_{k=1}^{c} O_{i,k}\right)\left(\sum_{k=1}^{r} O_{k,j}\right)}{N}

where N is the number of data tuples and the two sums are the row and column totals for the cell's row i and column j. The larger the χ² value, the more likely it is that the variables are related. The cells that contribute the most to the χ² value are those whose actual count is very different from the expected count.

Chi-Square Calculation: An Example

Suppose a group of 1,500 people were surveyed. The gender of each person was noted, and each person was polled on their preferred type of reading material, fiction or non-fiction. The observed frequency of each possible joint event is summarized in the following table (the numbers in parentheses are the expected frequencies). Calculate χ².

                Male        Female       Sum (row)
Fiction         250 (90)    200 (360)    450
Non-fiction     50 (210)    1000 (840)   1050
Sum (col.)      300         1200         1500

The expected frequencies are computed from the marginals: E11 = count(male) x count(fiction) / N = 300 x 450 / 1500 = 90, and so on. Then

\chi^2 = \frac{(250-90)^2}{90} + \frac{(50-210)^2}{210} + \frac{(200-360)^2}{360} + \frac{(1000-840)^2}{840} = 284.44 + 121.90 + 71.11 + 30.48 = 507.93

For this table the degrees of freedom are (2-1)(2-1) = 1, as the table is 2x2. For 1 degree of freedom, the χ² value needed to reject the hypothesis at the 0.001 significance level is 10.828 (taken from the table of upper percentage points of the χ² distribution, typically available in any statistics textbook). Since the computed value of 507.93 is well above this, we can reject the hypothesis that gender and preferred reading are independent and conclude that the two attributes are strongly correlated for the given group.

Duplication must also be detected at the tuple level. The use of denormalized tables is another source of redundancy. Redundancies may further lead to data inconsistencies (due to updating some occurrences but not others).

2.2.2.3 Detection and Resolution of Data Value Conflicts

Another significant issue in data integration is the discovery and resolution of data value conflicts. For the same entity, attribute values from different sources may differ; for example, weight can be stored in metric units in one source and British imperial units in another. For instance, for a hotel cha
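Returning to the chi-square example above, here is a minimal sketch that reproduces the worked numbers; the use of scipy is an assumption, since the chapter computes the statistic by hand:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed counts from the 2x2 contingency table in the example
observed = np.array([[250, 200],    # fiction:     male, female
                     [50, 1000]])   # non-fiction: male, female

# correction=False disables the Yates continuity correction so that the
# result matches the hand calculation
chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(round(chi2, 2))  # 507.93
print(dof)             # 1
print(expected)        # [[ 90. 360.] [210. 840.]]
```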
In this stage pre-processing tools may help with data exploration and data discovery tasks. Data processing includes lots of tedious works, Data pre-processing generally consists of Data Cleaning Data Integration Data Transformation And Data Reduction. In this chapter we will study all these data pre-processing activities. 2.1 Data Understanding In Data understanding phase the first task is to collect initial data and then proceed with activities in order to get well known with data, to discover data quality problems, to discover first insight into the data or to identify interesting subset to form hypothesis for hidden information. The data understanding phase according to CRISP model can be shown in following . 2.1.1 Collect Initial Data The initial collection of data includes loading of data if required for data understanding. For instance, if specific tool is applied for data understanding, it makes great sense to load your data into this tool. This attempt possibly leads to initial data preparation steps. However if data is obtained from multiple data sources then integration is an additional issue. 2.1.2 Describe data Here the gross or surface properties of the gathered data are examined. 2.1.3 Explore data This task is required to handle the data mining questions, which may be addressed using querying, visualization and reporting. These include: Sharing of key attributes, for instance the goal attribute of a prediction task Relations between pairs or small numbers of attributes Results of simple aggregations Properties of important sub-populations Simple statistical analyses. 2.1.4 Verify data quality In this step quality of data is examined. It answers questions such as: Is the data complete (does it cover all the cases required)? Is it accurate or does it contains errors and if there are errors how common are they? Are there missing values in the data? If so how are they represented, where do they occur and how common are they? 2.2 Data Preprocessing Data preprocessing phase focus on the pre-processing steps that produce the data to be mined. Data preparation or preprocessing is one most important step in data mining. Industrial practice indicates that one data is well prepared; the mined results are much more accurate. This means this step is also a very critical fro success of data mining method. Among others, data preparation mainly involves data cleaning, data integration, data transformation, and reduction. 2.2.1 Data Cleaning Data cleaning is also known as data cleansing or scrubbing. It deals with detecting and removing inconsistencies and errors from data in order to get better quality data. While using a single data source such as flat files or databases data quality problems arises due to misspellings while data entry, missing information or other invalid data. While the data is taken from the integration of multiple data sources such as data warehouses, federated database systems or global web-based information systems, the requirement for data cleaning increases significantly. This is because the multiple sources may contain redundant data in different formats. Consolidation of different data formats abs elimination of redundant information becomes necessary in order to provide access to accurate and consistent data. Good quality data requires passing a set of quality criteria. Those criteria include: Accuracy: Accuracy is an aggregated value over the criteria of integrity, consistency and density. 
Integrity: Integrity is an aggregated value over the criteria of completeness and validity. Completeness: completeness is achieved by correcting data containing anomalies. Validity: Validity is approximated by the amount of data satisfying integrity constraints. Consistency: consistency concerns contradictions and syntactical anomalies in data. Uniformity: it is directly related to irregularities in data. Density: The density is the quotient of missing values in the data and the number of total values ought to be known. Uniqueness: uniqueness is related to the number of duplicates present in the data. 2.2.1.1 Terms Related to Data Cleaning Data cleaning: data cleaning is the process of detecting, diagnosing, and editing damaged data. Data editing: data editing means changing the value of data which are incorrect. Data flow: data flow is defined as passing of recorded information through succeeding information carriers. Inliers: Inliers are data values falling inside the projected range. Outlier: outliers are data value falling outside the projected range. Robust estimation: evaluation of statistical parameters, using methods that are less responsive to the effect of outliers than more conventional methods are called robust method. 2.2.1.2 Definition: Data Cleaning Data cleaning is a process used to identify imprecise, incomplete, or irrational data and then improving the quality through correction of detected errors and omissions. This process may include format checks Completeness checks Reasonableness checks Limit checks Review of the data to identify outliers or other errors Assessment of data by subject area experts (e.g. taxonomic specialists). By this process suspected records are flagged, documented and checked subsequently. And finally these suspected records can be corrected. Sometimes validation checks also involve checking for compliance against applicable standards, rules, and conventions. The general framework for data cleaning given as: Define and determine error types; Search and identify error instances; Correct the errors; Document error instances and error types; and Modify data entry procedures to reduce future errors. Data cleaning process is referred by different people by a number of terms. It is a matter of preference what one uses. These terms include: Error Checking, Error Detection, Data Validation, Data Cleaning, Data Cleansing, Data Scrubbing and Error Correction. We use Data Cleaning to encompass three sub-processes, viz. Data checking and error detection; Data validation; and Error correction. A fourth improvement of the error prevention processes could perhaps be added. 2.2.1.3 Problems with Data Here we just note some key problems with data Missing data : This problem occur because of two main reasons Data are absent in source where it is expected to be present. Some times data is present are not available in appropriately form Detecting missing data is usually straightforward and simpler. Erroneous data: This problem occurs when a wrong value is recorded for a real world value. Detection of erroneous data can be quite difficult. (For instance the incorrect spelling of a name) Duplicated data : This problem occur because of two reasons Repeated entry of same real world entity with some different values Some times a real world entity may have different identifications. Repeat records are regular and frequently easy to detect. The different identification of the same real world entities can be a very hard problem to identify and solve. 
Heterogeneities: When data from different sources are brought together in one analysis problem heterogeneity may occur. Heterogeneity could be Structural heterogeneity arises when the data structures reflect different business usage Semantic heterogeneity arises when the meaning of data is different n each system that is being combined Heterogeneities are usually very difficult to resolve since because they usually involve a lot of contextual data that is not well defined as metadata. Information dependencies in the relationship between the different sets of attribute are commonly present. Wrong cleaning mechanisms can further damage the information in the data. Various analysis tools handle these problems in different ways. Commercial offerings are available that assist the cleaning process, but these are often problem specific. Uncertainty in information systems is a well-recognized hard problem. In following a very simple examples of missing and erroneous data is shown Extensive support for data cleaning must be provided by data warehouses. Data warehouses have high probability of â€Å"dirty data† since they load and continuously refresh huge amounts of data from a variety of sources. Since these data warehouses are used for strategic decision making therefore the correctness of their data is important to avoid wrong decisions. The ETL (Extraction, Transformation, and Loading) process for building a data warehouse is illustrated in following . Data transformations are related with schema or data translation and integration, and with filtering and aggregating data to be stored in the data warehouse. All data cleaning is classically performed in a separate data performance area prior to loading the transformed data into the warehouse. A large number of tools of varying functionality are available to support these tasks, but often a significant portion of the cleaning and transformation work has to be done manually or by low-level programs that are difficult to write and maintain. A data cleaning method should assure following: It should identify and eliminate all major errors and inconsistencies in an individual data sources and also when integrating multiple sources. Data cleaning should be supported by tools to bound manual examination and programming effort and it should be extensible so that can cover additional sources. It should be performed in association with schema related data transformations based on metadata. Data cleaning mapping functions should be specified in a declarative way and be reusable for other data sources. 2.2.1.4 Data Cleaning: Phases 1. Analysis: To identify errors and inconsistencies in the database there is a need of detailed analysis, which involves both manual inspection and automated analysis programs. This reveals where (most of) the problems are present. 2. Defining Transformation and Mapping Rules: After discovering the problems, this phase are related with defining the manner by which we are going to automate the solutions to clean the data. We will find various problems that translate to a list of activities as a result of analysis phase. Example: Remove all entries for J. Smith because they are duplicates of John Smith Find entries with `bule in colour field and change these to `blue. Find all records where the Phone number field does not match the pattern (NNNNN NNNNNN). Further steps for cleaning this data are then applied. Etc †¦ 3. Verification: In this phase we check and assess the transformation plans made in phase- 2. 
Without this step, we may end up making the data dirtier rather than cleaner. Since data transformation is the main step that actually changes the data itself so there is a need to be sure that the applied transformations will do it correctly. Therefore test and examine the transformation plans very carefully. Example: Let we have a very thick C++ book where it says strict in all the places where it should say struct 4. Transformation: Now if it is sure that cleaning will be done correctly, then apply the transformation verified in last step. For large database, this task is supported by a variety of tools Backflow of Cleaned Data: In a data mining the main objective is to convert and move clean data into target system. This asks for a requirement to purify legacy data. Cleansing can be a complicated process depending on the technique chosen and has to be designed carefully to achieve the objective of removal of dirty data. Some methods to accomplish the task of data cleansing of legacy system include: n Automated data cleansing n Manual data cleansing n The combined cleansing process 2.2.1.5 Missing Values Data cleaning addresses a variety of data quality problems, including noise and outliers, inconsistent data, duplicate data, and missing values. Missing values is one important problem to be addressed. Missing value problem occurs because many tuples may have no record for several attributes. For Example there is a customer sales database consisting of a whole bunch of records (lets say around 100,000) where some of the records have certain fields missing. Lets say customer income in sales data may be missing. Goal here is to find a way to predict what the missing data values should be (so that these can be filled) based on the existing data. Missing data may be due to following reasons Equipment malfunction Inconsistent with other recorded data and thus deleted Data not entered due to misunderstanding Certain data may not be considered important at the time of entry Not register history or changes of the data How to Handle Missing Values? Dealing with missing values is a regular question that has to do with the actual meaning of the data. There are various methods for handling missing entries 1. Ignore the data row. One solution of missing values is to just ignore the entire data row. This is generally done when the class label is not there (here we are assuming that the data mining goal is classification), or many attributes are missing from the row (not just one). But if the percentage of such rows is high we will definitely get a poor performance. 2. Use a global constant to fill in for missing values. We can fill in a global constant for missing values such as unknown, N/A or minus infinity. This is done because at times is just doesnt make sense to try and predict the missing value. For example if in customer sales database if, say, office address is missing for some, filling it in doesnt make much sense. This method is simple but is not full proof. 3. Use attribute mean. Let say if the average income of a a family is X you can use that value to replace missing income values in the customer sales database. 4. Use attribute mean for all samples belonging to the same class. Lets say you have a cars pricing DB that, among other things, classifies cars to Luxury and Low budget and youre dealing with missing values in the cost field. 
Replacing missing cost of a luxury car with the average cost of all luxury cars is probably more accurate then the value youd get if you factor in the low budget 5. Use data mining algorithm to predict the value. The value can be determined using regression, inference based tools using Bayesian formalism, decision trees, clustering algorithms etc. 2.2.1.6 Noisy Data Noise can be defined as a random error or variance in a measured variable. Due to randomness it is very difficult to follow a strategy for noise removal from the data. Real world data is not always faultless. It can suffer from corruption which may impact the interpretations of the data, models created from the data, and decisions made based on the data. Incorrect attribute values could be present because of following reasons Faulty data collection instruments Data entry problems Duplicate records Incomplete data: Inconsistent data Incorrect processing Data transmission problems Technology limitation. Inconsistency in naming convention Outliers How to handle Noisy Data? The methods for removing noise from data are as follows. 1. Binning: this approach first sort data and partition it into (equal-frequency) bins then one can smooth it using- Bin means, smooth using bin median, smooth using bin boundaries, etc. 2. Regression: in this method smoothing is done by fitting the data into regression functions. 3. Clustering: clustering detect and remove outliers from the data. 4. Combined computer and human inspection: in this approach computer detects suspicious values which are then checked by human experts (e.g., this approach deal with possible outliers).. Following methods are explained in detail as follows: Binning: Data preparation activity that converts continuous data to discrete data by replacing a value from a continuous range with a bin identifier, where each bin represents a range of values. For instance, age can be changed to bins such as 20 or under, 21-40, 41-65 and over 65. Binning methods smooth a sorted data set by consulting values around it. This is therefore called local smoothing. Let consider a binning example Binning Methods n Equal-width (distance) partitioning Divides the range into N intervals of equal size: uniform grid if A and B are the lowest and highest values of the attribute, the width of intervals will be: W = (B-A)/N. The most straightforward, but outliers may dominate presentation Skewed data is not handled well n Equal-depth (frequency) partitioning 1. It divides the range (values of a given attribute) into N intervals, each containing approximately same number of samples (elements) 2. Good data scaling 3. Managing categorical attributes can be tricky. n Smooth by bin means- Each bin value is replaced by the mean of values n Smooth by bin medians- Each bin value is replaced by the median of values n Smooth by bin boundaries Each bin value is replaced by the closest boundary value Example Let Sorted data for price (in dollars): 4, 8, 9, 15, 21, 21, 24, 25, 26, 28, 29, 34 n Partition into equal-frequency (equi-depth) bins: o Bin 1: 4, 8, 9, 15 o Bin 2: 21, 21, 24, 25 o Bin 3: 26, 28, 29, 34 n Smoothing by bin means: o Bin 1: 9, 9, 9, 9 ( for example mean of 4, 8, 9, 15 is 9) o Bin 2: 23, 23, 23, 23 o Bin 3: 29, 29, 29, 29 n Smoothing by bin boundaries: o Bin 1: 4, 4, 4, 15 o Bin 2: 21, 21, 25, 25 o Bin 3: 26, 26, 26, 34 Regression: Regression is a DM technique used to fit an equation to a dataset. 
The simplest form of regression is linear regression which uses the formula of a straight line (y = b+ wx) and determines the suitable values for b and w to predict the value of y based upon a given value of x. Sophisticated techniques, such as multiple regression, permit the use of more than one input variable and allow for the fitting of more complex models, such as a quadratic equation. Regression is further described in subsequent chapter while discussing predictions. Clustering: clustering is a method of grouping data into different groups , so that data in each group share similar trends and patterns. Clustering constitute a major class of data mining algorithms. These algorithms automatically partitions the data space into set of regions or cluster. The goal of the process is to find all set of similar examples in data, in some optimal fashion. Following shows three clusters. Values that fall outsid e the cluster are outliers. 4. Combined computer and human inspection: These methods find the suspicious values using the computer programs and then they are verified by human experts. By this process all outliers are checked. 2.2.1.7 Data cleaning as a process Data cleaning is the process of Detecting, Diagnosing, and Editing Data. Data cleaning is a three stage method involving repeated cycle of screening, diagnosing, and editing of suspected data abnormalities. Many data errors are detected by the way during study activities. However, it is more efficient to discover inconsistencies by actively searching for them in a planned manner. It is not always right away clear whether a data point is erroneous. Many times it requires careful examination. Likewise, missing values require additional check. Therefore, predefined rules for dealing with errors and true missing and extreme values are part of good practice. One can monitor for suspect features in survey questionnaires, databases, or analysis data. In small studies, with the examiner intimately involved at all stages, there may be small or no difference between a database and an analysis dataset. During as well as after treatment, the diagnostic and treatment phases of cleaning need insight into the sources and types of errors at all stages of the study. Data flow concept is therefore crucial in this respect. After measurement the research data go through repeated steps of- entering into information carriers, extracted, and transferred to other carriers, edited, selected, transformed, summarized, and presented. It is essential to understand that errors can occur at any stage of the data flow, including during data cleaning itself. Most of these problems are due to human error. Inaccuracy of a single data point and measurement may be tolerable, and associated to the inherent technological error of the measurement device. Therefore the process of data clenaning mus focus on those errors that are beyond small technical variations and that form a major shift within or beyond the population distribution. In turn, it must be based on understanding of technical errors and expected ranges of normal values. Some errors are worthy of higher priority, but which ones are most significant is highly study-specific. For instance in most medical epidemiological studies, errors that need to be cleaned, at all costs, include missing gender, gender misspecification, birth date or examination date errors, duplications or merging of records, and biologically impossible results. 
2.2.1.7 Data cleaning as a process

Data cleaning is the process of detecting, diagnosing, and editing faulty data. It is a three-stage method involving repeated cycles of screening, diagnosing, and editing of suspected data abnormalities. Many data errors are detected incidentally during study activities; however, it is more efficient to discover inconsistencies by actively searching for them in a planned manner. It is not always immediately clear whether a data point is erroneous; many times this requires careful examination. Likewise, missing values require additional checks. Predefined rules for dealing with errors, true missing values, and extreme values are therefore part of good practice. One can monitor for suspect features in survey questionnaires, databases, or analysis datasets. In small studies, with the examiner intimately involved at all stages, there may be little or no difference between a database and an analysis dataset.

The diagnostic and treatment phases of cleaning require insight into the sources and types of errors at all stages of the study, during as well as after measurement. The concept of data flow is therefore crucial: after measurement, the research data go through repeated steps of being entered into information carriers, extracted, transferred to other carriers, edited, selected, transformed, summarized, and presented. It is essential to understand that errors can occur at any stage of this data flow, including during data cleaning itself, and most of these problems are due to human error.

Inaccuracy of a single data point or measurement may be tolerable and attributable to the inherent technical error of the measurement device. The process of data cleaning must therefore focus on those errors that are beyond small technical variations and that form a major shift within, or beyond, the population distribution. In turn, it must be based on an understanding of technical errors and of the expected ranges of normal values. Some errors deserve higher priority, but which ones are most significant is highly study-specific. For instance, in most medical epidemiological studies, errors that need to be cleaned at all costs include missing gender, gender misspecification, birth date or examination date errors, duplication or merging of records, and biologically impossible results. Another example: in nutrition studies, date errors lead to age errors, which in turn lead to errors in weight-for-age scoring and, further, to misclassification of subjects as under- or overweight. Errors of sex and date are particularly important because they contaminate derived variables. Prioritization is essential if the study is under time pressure or if resources for data cleaning are limited.

2.2.2 Data Integration

Data integration is the process of taking data from one or more sources and mapping it, field by field, onto a new data structure. The idea is to combine data from multiple sources into a coherent form. Many data mining projects require data from multiple sources because:
- Data may be distributed over different databases or data warehouses (for example, an epidemiological study that needs information about both hospital admissions and car accidents).
- Data may be required from different geographic locations, or historical data may be needed (e.g., integrating historical data into a new data warehouse).
- The data may need to be enhanced with additional (external) data, to improve data mining precision.

2.2.2.1 Data Integration Issues

There are a number of issues in data integration. Imagine two database tables, Database Table-1 and Database Table-2, describing the same people with attributes such as PID, Name (or Given Name), DOB, and Age. (The example tables themselves are not reproduced here.) In integrating these two tables, a variety of issues arise:
1. The same attribute may have different names (in the example tables, Name and Given Name are the same attribute under different names).
2. An attribute may be derived from another (the attribute Age is derived from the attribute DOB).
3. Attributes may be redundant (for example, the attribute PID is redundant).
4. Values of the same attribute may differ (for PID 4791, the values in the second and third fields differ between the two tables).
5. The same record may be duplicated under different keys (there is a possibility of the same record being replicated with different key values).

Schema integration and object matching can therefore be tricky. The question of how equivalent entities from different sources are matched is known as the entity identification problem. Conflicts have to be detected and resolved; integration becomes easier if unique entity keys are available in all the data sets (or tables) to be linked. Metadata can help in schema integration (metadata for an attribute typically includes its name, meaning, data type, and the range of values permitted for the attribute). A short sketch of these integration steps follows below.
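The following sketch uses the pandas library to illustrate the steps just listed. The table contents, key values, and column names are hypothetical stand-ins for Database Table-1 and Table-2, echoing the issues above; this is a minimal sketch, not a prescribed procedure:

```python
import pandas as pd

# Two hypothetical source tables describing the same people.
t1 = pd.DataFrame({"PID": [1234, 4791], "Name": ["Alice", "Bob"],
                   "DOB": ["1990-05-01", "1985-11-23"]})
t2 = pd.DataFrame({"PID": [1234, 4791], "Given Name": ["Alice", "Robert"],
                   "Age": [34, 39]})

# Issue 1: the same attribute under different names -> rename before merging.
t2 = t2.rename(columns={"Given Name": "Name"})

# Issue 2: a derived attribute -> recompute Age from DOB (approximate years).
t1["Age"] = ((pd.Timestamp("2025-01-01") - pd.to_datetime(t1["DOB"]))
             .dt.days // 365)

# Entity identification: match records on the shared key PID.
merged = t1.merge(t2, on="PID", suffixes=("_src1", "_src2"))

# Issue 4: detect conflicting values for the same entity.
conflicts = merged[merged["Name_src1"] != merged["Name_src2"]]
print(conflicts)   # PID 4791: "Bob" vs "Robert" needs manual resolution
```

With a unique key such as PID available in both tables, matching is a simple join; without one, de-duplication has to fall back on comparing attribute values, as discussed next.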
2.2.2.2 Redundancy

Redundancy is another important issue in data integration. Two given attributes (such as DOB and Age in the example tables) may be redundant if one can be derived from the other attribute or from a set of attributes. Inconsistencies in attribute or dimension naming can also lead to redundancies in the given data sets.

Handling redundant data. Data redundancy problems can be handled in the following ways:
- Use correlation analysis (explained in detail below).
- Consider different codings/representations (e.g., metric versus imperial measures).
- Careful (manual) integration of the data can reduce or prevent redundancies (and inconsistencies).
- De-duplication (also called internal data linkage): if no unique entity keys are available, analyze the values in the attributes to find duplicates.
- Process redundant and inconsistent data (easy if the values are the same): delete one of the values, average the values (only for numerical attributes), or take the majority value (if there are more than two duplicates and some of the values agree).

Correlation analysis: some redundancies can be detected using correlation analysis. Given two attributes, such analysis measures how strongly one attribute implies the other. For numerical attributes we can compute the correlation coefficient (also called Pearson's product moment coefficient) of two attributes A and B to evaluate the correlation between them:

r_{A,B} = (Σ(AB) − n·Ā·B̄) / (n·σ_A·σ_B)

where
- n is the number of tuples,
- Ā and B̄ are the respective mean values of A and B,
- σ_A and σ_B are the respective standard deviations of A and B, and
- Σ(AB) is the sum of the AB cross-product, i.e. the sum of a_i·b_i over all tuples.

a. The value of r_{A,B} lies between −1 and +1. If r_{A,B} is greater than zero, then A and B are positively correlated, meaning that the values of A increase as the values of B increase; the higher the value, the stronger the correlation, and a high value may indicate that one of the two attributes is redundant.
b. If r_{A,B} is equal to zero, it indicates that A and B are independent of each other and there is no correlation between them.
c. If r_{A,B} is less than zero, then A and B are negatively correlated: when the value of one attribute increases, the value of the other decreases. This means that each attribute discourages the other.

It is important to note that correlation does not imply causality. That is, if A and B are correlated, this does not necessarily mean that A causes B or that B causes A. For example, in analyzing a demographic database, we may find that attributes representing the number of accidents and the number of car thefts in a region are correlated. This does not mean that one causes the other; both may be related to a third attribute, namely population.
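A minimal pure-Python sketch of the correlation computation just described; the attribute values are invented for illustration:

```python
# Pearson correlation: r_AB = (sum(a*b) - n*mean_A*mean_B) / (n*sigma_A*sigma_B)
import math

def correlation(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    # Population standard deviations, matching the formula above.
    sd_a = math.sqrt(sum((x - mean_a) ** 2 for x in a) / n)
    sd_b = math.sqrt(sum((x - mean_b) ** 2 for x in b) / n)
    cross = sum(x * y for x, y in zip(a, b))  # sum of the AB cross-product
    return (cross - n * mean_a * mean_b) / (n * sd_a * sd_b)

# Illustrative: an attribute and a value derived from it by adding 1.
ages    = [23, 35, 41, 52, 60]
derived = [24, 36, 42, 53, 61]      # perfectly correlated with ages
print(correlation(ages, derived))   # ~1.0 -> one attribute is redundant
```

A result close to +1 or −1 flags a candidate redundancy; remember, though, that a high correlation alone does not tell us which attribute to drop, nor that one causes the other.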
For discrete (categorical) data, a correlation relationship between two attributes can be discovered by a χ² (chi-square) test. Let A have c distinct values a1, a2, ..., ac and let B have r distinct values b1, b2, ..., br. The data tuples described by A and B can be shown as a contingency table, with the c values of A making up the columns and the r values of B making up the rows. Then

χ² = Σ_{i=1}^{r} Σ_{j=1}^{c} (O_{i,j} − E_{i,j})² / E_{i,j}

where O_{i,j} is the observed frequency (i.e., the actual count) of the joint event (a_j, b_i), and E_{i,j} is the expected frequency, computed from the row and column totals as

E_{i,j} = (Σ_{k=1}^{c} O_{i,k} × Σ_{k=1}^{r} O_{k,j}) / N

where N is the total number of data tuples, Σ_k O_{i,k} is the number of tuples having value b_i for B (the row total), and Σ_k O_{k,j} is the number of tuples having value a_j for A (the column total). The larger the χ² value, the more likely it is that the variables are related. The cells that contribute the most to the χ² value are those whose actual count is very different from the expected count.

Chi-Square Calculation: An Example

Suppose a group of 1,500 people was surveyed. The gender of each person was noted, and each person was polled as to whether their preferred type of reading material was fiction or non-fiction. The observed frequency of each possible joint event is summarized in the following contingency table (the numbers in parentheses are the expected frequencies):

              male        female       Sum (row)
Fiction       250 (90)    200 (360)    450
Non-fiction   50 (210)    1000 (840)   1050
Sum (col.)    300         1200         1500

The expected frequencies are computed from the totals; for example, E11 = count(male) × count(fiction) / N = 300 × 450 / 1500 = 90, and so on. Since the table is 2×2, the degrees of freedom are (2−1)(2−1) = 1. For 1 degree of freedom, the χ² value needed to reject the hypothesis of independence at the 0.001 significance level is 10.828 (taken from the table of upper percentage points of the χ² distribution, available in any statistics textbook). Here the computed value is

χ² = (250−90)²/90 + (50−210)²/210 + (200−360)²/360 + (1000−840)²/840 ≈ 507.93,

which is far above the threshold, so we can reject the hypothesis that gender and preferred reading are independent and conclude that the two attributes are strongly correlated for the given group.

Duplication must also be detected at the tuple level. The use of denormalized tables is another source of redundancies, and redundancies may in turn lead to data inconsistencies (when some copies of a value are updated but not others).

2.2.2.3 Detection and resolution of data value conflicts

Another significant issue in data integration is the detection and resolution of data value conflicts: for the same real-world entity, attribute values from different sources may differ. For example, weight may be stored in metric units in one source and in British imperial units in another. Similarly, for a hotel chain, the price of rooms in different cities may involve not only different currencies but also different services and taxes.
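As a final illustration, here is a small sketch of resolving the kind of unit conflict just described. The conversion factor for pounds to kilograms is standard; the record layout and values are hypothetical:

```python
# Resolving a data value conflict: the same attribute stored in different
# units by two sources. All record contents are illustrative.
LB_PER_KG = 2.20462  # pounds per kilogram

source_metric   = {"PID": 1234, "weight_kg": 70.0}    # metric source
source_imperial = {"PID": 1234, "weight_lb": 154.3}   # imperial source

def to_kg(record):
    """Normalize a record's weight to kilograms before integration."""
    if "weight_kg" in record:
        return record["weight_kg"]
    return record["weight_lb"] / LB_PER_KG

w1, w2 = to_kg(source_metric), to_kg(source_imperial)
# After conversion the two values can be compared on a common scale;
# a small tolerance absorbs rounding introduced by the conversion.
if abs(w1 - w2) < 0.1:
    resolved = (w1 + w2) / 2   # consistent duplicates: average the values
else:
    resolved = None            # genuine conflict: flag for manual review
print(resolved)                # ~70.0 kg
```

Normalizing every source to a common representation before merging, as sketched here, prevents such conflicts from silently contaminating the integrated data set.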