Friday, May 28, 2010

DATA WARE HOUSING


DATA WARE HOUSING
A data warehouse is a store of information organized in a unified data model.
A data warehouse is a relational database that is designed for query and analysis rather than for transaction processing. It usually contains historical data derived from transaction data, but it can include data from other sources. It separates analysis workload from transaction workload and enables an organization to consolidate data from several sources.
In this paper we tried to explain the characteristics, architectures and processes involved in data warehousing.
Characteristics of a data warehouse are
• Subject oriented
• Integrated
• Non volatile
• Time variant
Common architectures involved in data warehousing are of three types.Data warehouses and their architectures vary depending upon the specifics of an organization's situation. Three common architectures are:
• Data warehouse Architecture (Basic)
• Data warehouse Architecture (With a staging area)
• Data warehouse Architecture (With a staging area and Data Marts)
Data warehousing involves data pre-processing. All those methods are explained in this paper. Thus this paper includes gives the overview of data warehousing concepts.
Introduction
A data warehouse is a collection of data gathered and organized so that it can easily by analyzed, extracted, synthesized, and otherwise be used for the purposes of further understanding the data. It may be contrasted with data that is gathered to meet immediate business objectives such as order and payment transactions, although this data would also usually become part of a data warehouse.
A data warehouse is, primarily, a record of an enterprise's past transactional and operational information, stored in a database designed to favour efficient data analysis and reporting (especially OLAP). Data warehousing is not meant for current, "live" data.
Characteristics of a data warehouse:
• Subject oriented
• Integrated
• Non volatile
• Time variant
Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more about your company's sales data, you can build a warehouse that concentrates on sales. Using this warehouse, you can answer questions like "Who was our best customer for this item last year?"
This ability to define a data warehouse by subject matter, sales in this case, makes the data warehouse subject oriented.
Integrated
Integration is closely related to subject orientation. Data warehouses must put data from disparate sources into a consistent format. They must resolve such problems as naming conflicts and inconsistencies among units of measure. When they achieve this, they are said to be integrated.
Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change. This is logical because the purpose of a warehouse is to enable you to analyze what has occurred.
Time Variant
In order to discover trends in business, analysts need large amounts of data. This is very much in contrast to online transaction processing (OLTP) systems, where performance requirements demand that historical data be moved to an archive. A data warehouse's focus on change over time is what is meant by the term time variant
Data warehouse architecture:
Data warehouse architecture is a description of the elements and services of the warehouse, with details showing how the components will fit together and how the system will grow over time. Data warehouse Architecture (Basic)
• Data warehouse Architecture (With a staging area)
• Data warehouse Architecture (With a staging area and Data Marts)

Data from the operational systems are
• Extracted
• Cleansed
• Transformed
• Loaded & Propagated
Extraction
Data Warehouse Manager supports standard SQL-style inner, outer and exception file joins, as well as joins based on derived fields, one-to-many relationships, file members, unique key fields, and file unions. You may extract data from both local and distributed databases when OS/400 Distributed Data Base Management (DDM) is in use.
Record selection and new field calculation criteria may be defined in the extraction object or left for ad hoc entry at run-time. CL programs may also be used to pass ad hoc criteria into the extraction object. NGS provides sample CL source programs to illustrate this capability. Data Warehouse Manager also supports full Boolean selection logic and a timesaving
Transformation
The data transformation capabilities of Data Warehouse Manager convert numeric codes for inventory items, employees, departments and other business terms into meaningful, descriptive values. You can also convert dates into consistent formats and generate summary fields such as gross-to-net, gross margin and more. Other transformation functions allow you to create fields representing items such as integer keys, elapsed days and time, absolute values, remainders, etc
Also convert dates into consistent formats and generate summary fields such as gross-to-net, gross margin and more. Other transformation functions allow you to create fields representing items such as integer keys, elapsed days and time, absolute values, remainders, etc.
Cleansing
Data Warehouse Manager can handle many data cleansing requirements and provide Application Program Interfaces (API’s) for custom cleansing programs or third-party tools copy the entry of complex selection criteria.
Loading and Propagation
Data Warehouse Manager builds DB2 UDB for I-Series 400 database files, which may be accessed by numerous IBM and third-party software tools.
These files may be generated in summary or detail as required. Key fields may be assigned to the designated support star schema warehouse architectures. Depending on the object’s attributes, the output may create, replace or append DB2 UDB file
Data pre processing.
Data processing is necessary because
Real world data is dirty
Data is in consistent.
Data preprocessing tasks:
• Data cleaning
• Data integration and transformation
• Data reduction
• Discretization and concept hierarchy generation
• Concept hierarchy generation for numeric data
Data cleaning:
Data cleaning, also called data cleansing or scrubbing deals with detecting and removing errors and inconsistencies from data in order to improve the quality of data
Data Quality Problems:
Misspellings during data entry, missing information or other invalid data
Present in single collections such as files and databases
Problems increase when multiple sources need to be integrated
Data cleaning is important because,
• Poor data quality can result in loss of money
$2 billion of U.S. federal loan money has been lost due to poor data quality at a single agency.
• Data warehouses must provide high-level quality of data and service as decision support information systems.
• Probability of dirty data high.
Significant portion of cleaning and transformation work done manually or by low level programs that are difficult to maintain.
Data Cleaning Approach:
• Detect and remove all major errors and inconsistencies in individual and multiple data sources
• Inspection Supported by tools to limit manual and programming effort
• Should not be performed in isolation but with schema-related data transformations based on comprehensive metadata
• Mapping functions for data cleaning and other data transformations should be reusable for other data sources
Tools:
ETL (Extraction, Transformation, Loading) Tool:
• Typically support the ETL process for data warehouses in a comprehensive way
• Typically have little built-in data cleaning capabilities but allow the user to specify cleaning functionality via a proprietary API
• There is usually no data analysis support to automatically detect data errors and inconsistencies.
• Provide transformation libraries that cover many data transformation and cleaning needs
• Such as data type conversions (e.g., date reformatting)
• String functions (e.g., split, merge, replace, sub-string search)
• Arithmetic, scientific and statistical functions
• Typically covers if-then and case constructs that help handling exceptions in data values, such as misspellings, abbreviations, missing or cryptic values, and values outside of range
• These problems are also addressed by using a table lookup construct and join functionality
Examples of these tools include:
• Copy manager (Information Builders)
• Data stage (Informix/Ardent)
• Extract (ETI), Power mart (Informatics)
• Decision base (CA/Platinum)
• Data transformation service (Microsoft)
• Metasuite 11 (Minerva/Carleton)
• Sagent solution platform (Sagent)
• Warehouse administrator (SAS).
Data integration and transformation:
This includes:
Schema integration
Redundancy
Schema integration:
Multiple data sources may provide data on the same entity types.
For example, Meta data from two music
Servers.
Approach to data integration:
• Use mapping rules to handle structural differences
• Ambient contains a mapping rule
Manager capable of:
– Finding appropriate sets of mapping rules
– Rewriting queries based on a set of mapping
– Handling different versions of mapping rules Rules
Redundancy:
Redundancy, in general terms, refers to the quality or state of being redundant, that is: exceeding what is necessary or normal; or duplication. This can have a negative connotation, especially in
rhetoric: superfluous or repetitive; or a positive implication, especially in engineering: serving as a duplicate for preventing failure of an entire system.
Data reduction:
• Data warehouse may store terabytes of data.
Complex data analysis/mining may take a very long time to run on the complete data set
Obtain a reduced representation of the data set that is much smaller in volume but yet produce the same (or almost the same) analytical results.
Data cube aggregation:
• The lowest level of a data cube
o The aggregated data for an individual entity of interest
o e.g., a customer in a phone calling data warehouse
• Multiple levels of aggregation in data space
o Further reduce the size of data
• Reference appropriate levels
Use the smallest representation which is enough to solve the task
• Queries regarding aggregated information should be answered using data when possible
Purpose:
The main purpose of a data warehouse is to support decision-making.
• Data is collected from a number of different sources.
• It is made is to perform advanced analysis
The main goal of data warehousing is to generate front-end analytics that will support business executives and operational managers.
References:
• The Data Warehouse Toolkit by Ralph Kimball
• Building the Data Warehouse by William Inmon

WIMAX

WIMAX


IEEE 802.16 is working group number 16 of IEEE 802, specializing in point-to-multipoint broadband wireless access. It also is known as WiMAX, an acronym that stands for Worldwide Interoperability for Microwave Access.
WiMax can be used for wireless networking like the popular Wi-Fi. WiMax, a third generation protocol, allows higher data rates over longer distances, efficient use of bandwidth, and avoids interference almost to a minimum.
There are at least four 802.16 standards: 802.16, 802.16a, 802.16-2004 (802.16), and 802.16e.
Based on services provided by WiMAX there are three types usage Models. They are Fixed WiMAX, portable WiMAX, and Mobile WiMAX.Services Provided by WiMAX are Line-Of-Sight (LOS), Nonline-Of-Sight (NLOS).
. An important aspect of the IEEE 802.16 is that it defines a MAC layer that supports multiple physical layer (PHY) specifications. It will not entirely replace incumbent broadband technologies; it will serve as a viable complement to them.


Introduction
WiMax (Worldwide Interoperability for Microwave Access) is a wireless broadband technology, which supports point to multi-point (PMP) broadband wireless access.
WiMAX is an Institute of Electrical and Electronics Engineers standard designated 802.16-2004 (fixed wireless applications) and 802.16e-2005 (mobile wire-less).
WiMAX has the potential to replace a number of existing telecommunications infrastructures.
This extension provides for non-line of sight access in low frequency bands like 2 - 11 GHz. These bands are sometimes unlicensed. This also boosts the maximum distance from 31 to 50 miles and supports PMP (point to multipoint) and mesh technologies.
WiMax can be used for wireless networking like the popular Wi-Fi. WiMax, a third generation protocol, allows higher data rates over longer distances, efficient use of bandwidth, and avoids interference almost to a minimum
Usage Models of WiMAX
Based on the two services from a base station to a subscriber station, also known as customer premise equipment (CPE) Provided by WiMAX.
• Fixed WIMAX
• Portable WiMAX
• Mobile WiMAX
Fixed WiMAX:
Some goals for WiMAX include a radius of service coverage of 6 miles from a WiMAX base station for point-to-multipoint, non-line-of-sight service. This service should deliver approximately 40 megabits per second (Mbps) for fixed and portable access applications.
Mobile WiMAX:
Mobile WiMAX or portable WiMAX takes the fixed wireless application a step further and enables cell phone-like applications on a much larger scale. For example,
mobile WiMAX enables streaming video to be broadcast from a speeding police or other emergency vehicle at over 70 MPH.
Services provided by WiMAX
• Line-of-sight service
o Line-of-sight between transmitter & receiver
o 11 GHz to 66 GHz frequency range
o At Higher frequencies, there is less interference and lots more bandwidth
• Non-line-of-sight
o Line-of-sight is not required in between a small antenna on CPE and receiver.
o 2 GHz to 11 GHz frequency range.
o Longer-wavelength transmissions are not as easily disrupted by physical obstructions – they are better able to diffract, or bend, around obstacles.
WiMAX Standards
There are Four 802.16 standards:
• 802.16
• 802.16a/REVd/2004
• 802.16e

WiMAX Forum
The WiMAX Forum is a non-proft trade organization formed to promote the IEEE 802.16a wireless MAN standard, and to certify 802.16a equipment as interoperable. Like "Wi-Fi", the WiMAX Forum seeks tomake "WiMAX" a popular brand name for guaranteed interoperable wireless networking.But, while Wi-Fi focuses on wireless LAN, WiMAX will focus on using the 802.16a standard to provide wireless broadband access to businesses and consumers


The Problems WiMAX Solves
• Cellular Backhaul Problem : transport voice and data traffic from outlying cell sites to the operator's core network
• How to keep wireless notebooks and other mobile devices connected between 802.11 hotspots.
• Hotspot Backhaul Problem: Linking Wi-Fi hot spots to the Internet.
• Last Mile Problem: DSL can only reach about 18,000 feet (3 miles) from the central office switch—many urban, suburban and rural locations may not be served.
Advantages
• A single WiMAX main station can serve hundreds of users.
• Endpoints install within days instead of the weeks required for wired connections.
• Data rates as high as 280Mbps and distances of 30 miles are possible.
• Users can operate mobile within 3-5 miles of a base station at data rates up to 75Mbps.
• No FCC radio licensing is required.
Disadvantages
• Line-of-sight (LOS) is required for long distance (5-30 mile) connections
• Heavy rains can disrupt the service.
• Other wireless electronics in the vicinity can interfere with the WiMAX connection and
cause a reduction in data throughput or even a total disconnect.

MOBILE COMPUTING

MOBILE COMPUTING 

Distinction between "wireless" and "mobile."
Wireless refers to the method of transferring information between computing devices, such as a personal data assistant (PDA), and a data source, such as an agency database server, without a physical connection. Not all wireless communications technologies are mobile. For example, lasers are used in wireless data transfer between buildings, but cannot be used in mobile communications at this time.
Mobile simply describes a computing device that is not restricted to a desktop. A mobile device may be a PDA, a "smart" cell phone or Web phone, a laptop computer, or any one of numerous other devices that allow the user to complete computing tasks without being tethered, or connected, to a network. Mobile computing does not necessarily require wireless communication. In fact, it may not require communication between devices at all.
.. Mobile devices
Here we have seven different types of mobile devices:
Laptop computers
PDA’s and handheld PCs
Pagers
Smart phones and cellular phones
Task devices, such as bar code scanners
Blue tooth
Bridge
.. Challenges in mobile computing
Wireless and mobile environments bring different challenges to users and service providers when compared to fixed, wired networks. Physical constraints become much more important, such as device weight, battery power, screen size, portability, quality of radio transmission, error rates. The major challenges in mobile computing are described including: low bandwidth, high error rate, power restrictions, security, limited capabilities, disconnection and problems due to client mobility.
.. Low Bandwidth
Wireless networks deliver lower bandwidth than wired networks. As a result, mobile applications have to be carefully designed to control the bandwidth consumption. Software techniques required to improve effective bandwidth usage include data compression logging requests to combine multiple short ones, lazy write back, difference-based updates, caching, prefetching, usage of proxy, priority scheduling, etc.
.. High Error Rate
The network quality varies as the mobile computer moves across the heterogeneous network connections. The wireless environment exhibits higher error rates, which results in retransmission and affects the Quality of Service. By minimizing the usage of wireless transmission, the data is less exposed to transmission errors. In addition, error correction schemes can be employed to improve performance. However, these schemes also add to the communication overhead and reduce the usable bandwidth.
.. Power Limitations
Mobile computers are concerned with the limited power supply, an issue that does not appear in distributed wired environment. Hardware improvements on batteries can help to lengthen the life of a charge and reduce battery weight. In addition, efficient software operations can help to lower the power consumption. Examples include: shifting the processing to a fixed host, aggressively caching and prefetching data to reduce disk traffic, and transmitting less data while receiving more
.. Security
Security and privacy are of specific concerns in wireless communication because of the ease of connecting to the wireless link anonymously. Common problems are impersonation, denial of service and tapping. The main technique used is encryption. In personal profiles of users are used to restrict access to the mobile units.
.. Merits