The Google File System’s conscious design tradeoffs

Google File System Architecture

This is my first post on the Google File System, where I will very briefly touch on a specific feature set driven by the conscious design tradeoffs that have made GFS and its derived systems so successful.

  1.  Highly Redundant Data vs. Highly Available Hardware: When working with petabytes of data, hardware failure is the norm rather than the exception; expensive, highly redundant hardware is replaced with commodity components that allow the file system to store multiple copies of data across storage nodes and switches at a reasonable cost.
  2.  Store a small number of large files vs. millions of small individual documents: With the need to store hundreds of terabytes composed of billions of small objects (e.g., e-mail messages, web pages), GFS simplifies file system design by serializing these small individual objects and grouping them together into larger files. Having a small number of large files allows GFS to keep all file and namespace metadata in memory on the GFS master, which in turn lets the master use this global visibility to make smarter load balancing and redundancy decisions.
  3.  Generally immutable data: Once a serialized object or file record is written to disk, it is never updated again; as Google states in their research paper, random writes are practically non-existent. This is driven by application requirements: data is generally written once and then consumed by applications over time without alteration. Google describes application data as mutating either by inserting new records or by appending to the last “chunk” (block) of a file, and applications are encouraged to constrain their update strategies to these two operations.

In my next series of posts I will analyze other architecture and performance characteristics that make the Google File System brilliantly innovative. Stay tuned!

 

Reference:

“The Google File System”; Ghemawat, Gobioff, Leung; Google Research

Where to download older versions of Java?

I have found myself asking where I can download old versions of Java several times lately. They are generally found on Oracle’s website on the version archive page. To help with direct access, here is a list with a few versions:

 

Version 64-bit JDK 64-bit JRE 32-bit JDK 32-bit JRE
8u25 (1.8) JDK JRE JDK JRE
7u72 (1.7) JDK JRE JDK JRE
6u45 (1.6) JDK JRE JDK JRE
5.0u22 (1.5) JDK JRE JDK JRE

Issue/Error with ODI Studio right click

As I work with the Oracle Business Intelligence Applications (OBIA) repository in ODI Studio, I have recently noticed that I am no longer able to right-click on objects. I have found two solutions; the first one is a workaround:

 

Work Around:

Let’s assume you want to right-click on a particular folder or scenario and notice that, as you do so, the context menu does not come up. Go ahead and do the following:

  1. Select the object with a left click
  2. Move your mouse pointer just outside the object’s boundary; I prefer a little bit to the right
  3. Right-click; the context menu should come up now

This workaround is useful if you are restricted from changing your installation’s settings or are using a hosted platform such as Citrix.

 

Solution:

In cases where you have access to install software on your system, you should look into the compatibility matrix for ODI Studio and the version of Java you are working with. In my case I noticed the hosting provider for my environment had set up a 64-bit JDK 1.7; since some versions of ODI require JDK 1.6, I downloaded both the 32-bit and 64-bit versions and pointed my odi.conf file to the new JDK. The 64-bit version did solve my issue, which is great since I can allocate more memory to the client at that bit width.
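
For reference, here is a minimal sketch of the odi.conf change, assuming your release honors the SetJavaHome directive in that file (the JDK path is hypothetical; point it at wherever you unpacked the JDK):

    # odi.conf -- point ODI Studio at a specific JDK (example path only)
    SetJavaHome C:\Java\jdk1.6.0_45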

 

Related:

ODI Tip: How to make sure a “Select distinct” is issued and an ODI interface returns a unique dataset with no duplicates

PROBLEM

 

As a developer I often need to make sure that the subset of columns I am mapping from source to target in my ODI interface is unique; in other words, I want ODI to include a DISTINCT clause in the SELECT statement that will be issued against the source database.

 

SOLUTION

  • Open your interface in the ODI interface designer
  • Click on the Flow tab at the bottom
  • Click on the Target object
  • In the Property Inspector, check the “Distinct Rows” checkbox (the effect on the generated SQL is sketched below)

    image
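
To illustrate the effect, here is a hedged sketch of the extract query before and after checking the option; the table and column names are made up for the example:

    -- Without "Distinct Rows": duplicate source rows flow through to the target
    SELECT SRC.CUSTOMER_ID, SRC.COUNTRY_CODE
    FROM   SALES_ORDERS SRC;

    -- With "Distinct Rows" checked, the generated extract adds DISTINCT,
    -- so the interface returns a unique dataset
    SELECT DISTINCT SRC.CUSTOMER_ID, SRC.COUNTRY_CODE
    FROM   SALES_ORDERS SRC;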

ETL Tuning in ODI / BI Apps – The #ETL_ANALYZE_WORK_TABLE parameter

One of the first things I do when I run into performance issues with ETL loads is to look at the source and target table statistics. Have they been collected before the current select / insert statement was issued?

It turns out that in Oracle BI Apps the #ETL_ANALYZE_WORK_TABLE parameter is turned off by default when a load plan is generated. This can make a high-level review of your load plan execution tricky, since there will be steps that seem to be gathering statistics when, in reality, the ODI code generator just emits a placeholder instead of the statistics-gathering code. An example of this is shown below:

 

image

 

image

 

SOLUTION:

Once I realized that statistics were not being gathered for my work tables, I was able to zoom in on the ETL_ANALYZE_WORK_TABLE variable in my generated load plan, as depicted below, and change its default value to Y. The variable is defined globally, so once you change the definition the new default value will apply to any newly generated load plans.

 

image  

 

image
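
With the variable set to Y, the work-table steps gather statistics instead of skipping the call. As a hedged sketch, the generated code amounts to something like the following; the schema and table names are examples, and the exact call BI Apps emits may differ by release:

    -- Gather statistics on a work table before the next step reads it
    BEGIN
      DBMS_STATS.GATHER_TABLE_STATS(
        ownname          => 'DW_SCHEMA',
        tabname          => 'W_ORA_GL_BALANCE_F_TMP',
        estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
        cascade          => TRUE);
    END;
    /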

ODI: Purging OLD Sessions

One common administrative task I find myself performing when I realize my ODI logs have grown fairly large is purging old sessions from the log. The steps are fairly straightforward:

 

  1. Log in to your ODI Studio client
  2. Go to the Operator view
  3. In the top right corner of your navigation pane, expand the menu and select Purge Log…

    image

  4. On the Purge Log screen you can select which old sessions to remove by date, agent, context, status, user and session name

    image

  5. Once you have set the parameters as desired, click OK and the ODI session logs will be purged accordingly
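
If you would rather run this on a schedule than by hand, ODI also provides an OdiPurgeLog tool that can be called from a package step so the purge runs with your regular maintenance jobs. A hedged sketch of a call that purges everything that ended before a given date is below; I am quoting the parameter names from memory, so verify them (and the date format) against the ODI Tools Reference for your release:

    OdiPurgeLog "-TODATE=2014/01/01 00:00:00" "-PURGE_REPORTS=1"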

 

Related:

How To: Manage your Oracle patch deployment life cycle using Oracle Support Patch Plans

Introduction

 

As part of my writing I often try to document and share best practices I develop in my day-to-day work; this one relates to formalizing the patch deployment process for your Oracle environments. The approach is intended for organizations that have formal release cycles and established procedures to take patches through a test life cycle that, at a minimum, begins in a development environment, is followed by integration testing in QA, and culminates when patches are promoted to production.

I will try to keep this post brief. At a high level, I have found that the best way to manage patches is to use the Oracle Support portal’s Patches & Updates functionality to create a patch plan for each environment in the life cycle, either for each major release or at least for each quarter. The process is always initiated by the need to apply a patch, so when no patches are necessary during a release or quarter, no patch plans are created.

The two main benefits of this approach are that (1) it brings transparency into which patches have been approved for each environment, and (2) it is a straightforward process that does not carry a lot of overhead. Patches make it onto a patch plan when a project manager requests that a patch be applied or promoted to an environment in your life cycle; this in turn is monitored using standard project management mechanisms such as issue, task, and test management.

 

Implementation

Creating your first patch plan is very simple, just take your first requested patch through the process outlined below.

 

  1. Login to http://support.oracle.com
  2. Click on the Patches & Updates tab
  3. Locate the appropriate version of your patch by specifying a patch number and operating system on the patch search interface


  4. Locate your patch on the search results screen and click on Add to Plan > Add to new …


  5. Locate a valid target application server or host name using the search box
  6. Provide a patch plan name using your company’s naming standard and click Create Plan

    Here is an example naming convention I have used in the past; this particular one allows system administrators to sort by date and to manage patch plans by product:

    – – – approved patches


  7. To add any additional requested patches to your plan, go back to Patches & Updates, select your plan from the Plans list, and click on the Add Patch… button.

Having this patch plan makes it easy to manage patch deployment through your environments. As for the actual deployment of each patch, I am a command-line geek and like to make sure that each individual patch deployment works correctly by running OPatch for each individual patch.
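
For completeness, here is a hedged sketch of what one of those manual OPatch runs looks like; the Oracle home path and patch number are placeholders, not values from a real plan:

    # Set up the environment for the Oracle home being patched (paths are examples)
    export ORACLE_HOME=/u01/app/oracle/product/fmw/Oracle_BI1
    export PATH=$ORACLE_HOME/OPatch:$PATH

    # Review what is installed, apply the patch from its unzipped directory,
    # then confirm it shows up in the inventory
    cd /stage/patches/12345678
    opatch lsinventory
    opatch apply
    opatch lsinventory | grep 12345678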

If you find this post useful, please share our site!


Did you run out of time in Oracle Business Intelligence Applications (OBIA)?

There’s nothing worse than this, right? I’m with you, man (or sister)!

All right, in all seriousness, this can be an awkward situation: you come in Monday morning and your business stakeholders look deeply angry because none of their reports look right and they need to finish the close of the month/year/quarter.

Anyway, in both OBIA 7.9.x and 11g this is most likely because the variables that control the generation of the calendar tables are set to an end date that is now in the past.
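
A quick way to confirm the diagnosis is to check how far your day dimension actually extends. A minimal sketch, assuming the standard W_DAY_D warehouse table and its CALENDAR_DATE column (adjust the names if your calendar tables differ):

    -- If the latest loaded day is in the past, the calendar has simply run out of rows
    SELECT MAX(CALENDAR_DATE) AS LAST_LOADED_DAY
    FROM   W_DAY_D;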

 

OBIA 11.1.1.7 Solution

  1. Log in to OBIA Configuration Manager (BIACM)
  2. Go to Manage Data Load Parameters
  3. Look for Configure Time Dimension > Gregorian Date Range End
  4. Change the END_DATE parameter to a date far far in the future
  5. Put on a contrite face and let your users know this will be fixed next time your load runs

OBIA 7.9.x Solution

The same general steps need to be applied in your DAC client to fix this issue in versions of OBIA that use Informatica PowerCenter:

  1. Open your DAC console client
  2. Navigate to Design > Tasks
  3. Look up the SIL_DayDimension task
  4. Look up the Parameters tab on the bottom panel
  5. Change the $$END_DATE parameter to a future date
  6. Same deal: contrite face, deep breath, and break the news that this won’t be fixed until tomorrow morning

 

Related:

Hadoop Ecosystem: SQOOP – The Data Mover


Sqoop Logo

Sqoop is an open source project hosted by the Apache Software Foundation whose objective is to provide a tool that allows users to move large volumes of data in bulk from structured data sources into the Hadoop Distributed File System (HDFS). The project graduated from the Apache Incubator in March 2012 and is now a top-level Apache project.

The best way to look at Sqoop is as a collection of related tools, where each of these sub-modules serves a specific use case such as importing into Hive or leveraging parallelism when reading from a MySQL database. You specify the tool you are invoking when you run Sqoop. In terms of syntax, each of these tools has its own set of arguments while also supporting a set of global arguments.

 

Below is a list of the most frequently used Sqoop tools as of version 1.4.5 with a brief description of their purpose:

 

  • sqoop import: Helps users import a single table into Hadoop
  • sqoop import-all-tables: Imports all tables in a database schema into Hadoop
  • sqoop export: Allows users to export a set of files from HDFS back into a relational database
  • sqoop create-hive-table: Creates a Hive table definition based on a table in the source database (pair it with sqoop import --hive-import to also move the data)
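
As a concrete example, here is a hedged sketch of a single-table import; the connection string, credentials, table, and target directory are made up for illustration:

    # Import one table from MySQL into HDFS using 4 parallel map tasks
    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username etl_user -P \
      --table ORDERS \
      --target-dir /user/etl/orders \
      --num-mappers 4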

Error when importing work repository in ODI Studio (java.lang.OutOfMemoryError: Java heap space)

INTRO

 

I had an issue with the work repository in one of my environments this week, to the point where I had to rebuild it. After dropping and recreating the schema, I ran into a Java heap space error while importing the work repository content.

clip_image002

 

SOLUTION

 

In my case the issue went away with the following steps:

  1. Unpack the repository content ZIP file I was importing into an uncompressed folder
  2. Increase the MaxPermSize parameter in my ODI\client\odi\bin\odi.conf file from 512M to 1024M (see the sketch below)
    image
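
For reference, a minimal sketch of the odi.conf change; the exact default value may vary by release, so look for the existing MaxPermSize entry in your file:

    # ODI\client\odi\bin\odi.conf -- raise the PermGen ceiling for ODI Studio
    AddVMOption -XX:MaxPermSize=1024M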

 

FULL ERROR MESSAGE

 

clip_image002

java.lang.OutOfMemoryError: Java heap space
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2427)
    at java.lang.Class.getDeclaredMethod(Class.java:1935)
    at com.sunopsis.tools.core.SnpsTools.getMethodFromHierarchy(SnpsTools.java:370)
    at com.sunopsis.tools.core.SnpsTools.getMethodFromHierarchy(SnpsTools.java:392)
    at com.sunopsis.tools.xml.SnpsXmlObjectParser.processValue(SnpsXmlObjectParser.java:611)
    at com.sunopsis.tools.xml.SnpsXmlObjectParser.endElement(SnpsXmlObjectParser.java:270)
    at oracle.xml.parser.v2.NonValidatingParser.parseElement(NonValidatingParser.java:1588)
    at oracle.xml.parser.v2.NonValidatingParser.parseRootElement(NonValidatingParser.java:442)
    at oracle.xml.parser.v2.NonValidatingParser.parseDocument(NonValidatingParser.java:388)
    at oracle.xml.parser.v2.XMLParser.parse(XMLParser.java:232)
    at com.sunopsis.tools.xml.SnpsXmlObjectParser.parseXmlFile(SnpsXmlObjectParser.java:390)
    at com.sunopsis.tools.xml.SnpsXmlObjectParser.parseXmlFile(SnpsXmlObjectParser.java:337)
    at com.sunopsis.tools.xml.SnpsXmlObjectParser.parseXmlFile(SnpsXmlObjectParser.java:347)
    at com.sunopsis.dwg.DwgObject.doImport(DwgObject.java:6747)
    at com.sunopsis.dwg.DwgObject.doImport(DwgObject.java:6620)
    at com.sunopsis.dwg.DwgObject.doImport(DwgObject.java:6578)
    at com.sunopsis.repository.manager.RepositoryManager.importObjectsUsingDoImport(RepositoryManager.java:5918)
    at com.sunopsis.repository.manager.RepositoryManager.treatObjectListGeneral(RepositoryManager.java:3985)
    at com.sunopsis.repository.manager.RepositoryManager.workRepositoryImport(RepositoryManager.java:4506)
    at com.sunopsis.repository.manager.RepositoryManager.access$7(RepositoryManager.java:4395)
    at com.sunopsis.repository.manager.RepositoryManager$2.doAction(RepositoryManager.java:4369)
    at oracle.odi.core.persistence.dwgobject.DwgObjectTemplate.execute(DwgObjectTemplate.java:216)
    at oracle.odi.core.persistence.dwgobject.TransactionalDwgObjectTemplate.execute(TransactionalDwgObjectTemplate.java:64)
    at com.sunopsis.repository.manager.RepositoryManager.internalWorkRepositoryImportWithCommit(RepositoryManager.java:4357)
    at com.sunopsis.repository.manager.RepositoryManager.workRepositoryImport(RepositoryManager.java:4661)
    at com.sunopsis.repository.manager.RepositoryManager.workRepositoryImportFromZipFile(RepositoryManager.java:4814)
    at com.sunopsis.repository.manager.RepositoryManager.workRepositoryImportFromZipFileWithCommit(RepositoryManager.java:4884)
    at com.sunopsis.repository.manager.RepositoryManager.workRepositoryImportFromZipFileWithCommit(RepositoryManager.java:4939)
    at com.sunopsis.graphical.dialog.SnpsDialogImportWork$1.run(SnpsDialogImportWork.java:155)
    at oracle.ide.dialogs.ProgressBar.run(ProgressBar.java:655)
    at java.lang.Thread.run(Thread.java:662)

 

RELATED

 

Other related issues I found when researching the solution are: