Note that this solution would also work outside of Python. For instance, you could use ModelBuilder to construct the same logical model.
I suspect, though, that if he were to convert the shapefile to a file geodatabase, the results would be different. When your data gets 'big', shapefiles tend not to work well.
I've just received a great explanation of a possible cause, along with a solution that could shed some light on this. It's from Charles Convis of ESRI. I've highlighted some statements that I think are valuable. Thank you, Charles.
Hi, I'm working with datasets of many millions of features, with lots of vector processes including dissolves. A possible source of your problem is "godzilla polygons", i.e. single polygons with a very large number of vertices. I would suspect this is very likely with the Norwegian coastline. Godzillas will often hang and crash without informative errors. Godzillas are also common when working with data from different scales, and with data that was originally hand-digitized by someone who didn't know the difference between streaming and point modes. In other words, they are more common than you might think.
Here is a systematic way to deal with them:
1. Add a VERTEXCOUNT field to your attribute table and calculate it to !shape.pointcount!, as in:

arcpy.AddField_management(gpoly, "VERTEXCOUNT", "LONG")
arcpy.CalculateField_management(gpoly, "VERTEXCOUNT", "!shape.pointcount!", "PYTHON", "")
2. Open up your attribute table and sort descending on VERTEXCOUNT to get a quick summary look at your possible godzilla polygons. Depending upon your hardware, anything over 10,000 vertices can cause problems. Geodatabases on a higher-end machine can handle 50,000 for most processes.
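
If you prefer to pull the worst offenders out in code rather than eyeballing the sorted table, a quick cursor pass works too. This is just a minimal sketch, assuming the VERTEXCOUNT field from step 1 and a hypothetical gpoly path; the 10,000 cutoff is only an illustrative starting value:

import arcpy

gpoly = r"C:\data\work.gdb\coastline"  # hypothetical feature class path
threshold = 10000  # tune this per step 2 for your hardware

# List features over the threshold, biggest first (ORDER BY in
# sql_clause needs a geodatabase source, not a shapefile).
fields = ["OID@", "VERTEXCOUNT"]
where = "VERTEXCOUNT > {0}".format(threshold)
with arcpy.da.SearchCursor(gpoly, fields, where_clause=where,
                           sql_clause=(None, "ORDER BY VERTEXCOUNT DESC")) as rows:
    for oid, count in rows:
        print("OID {0}: {1} vertices".format(oid, count))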
3. You get rid of vertices with the Dice command, using the limit you determine from the exercise above and some old-fashioned trial and error on your machine, as in:

arcpy.Dice_management(gpoly, gpolydice, 50000)
Dice is analogous to the script you wrote, but rather than lowering feature counts by splitting files, it cuts large polygons up so they'll behave. (If your script had split up your files along arbitrary boundaries, you would have achieved the same effect of cutting up large polygons at the same time as you were lowering your feature counts in each file.)
4. Now your polygons should be much more amenable to all of the rest of your processes. You are also more likely to be able to successfully run any of the other, more standard polygon simplify commands that thin or generalize your linework so as to have fewer vertices.
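
For reference, one of those standard commands is the Simplify Polygon cartography tool. A minimal sketch, where the output path and the 100-meter tolerance are assumptions you would tune to your data:

import arcpy

# POINT_REMOVE drops vertices within the tolerance; the path and
# tolerance here are illustrative assumptions, not recommendations.
arcpy.SimplifyPolygon_cartography(gpolydice, r"C:\data\work.gdb\coast_simplified",
                                  "POINT_REMOVE", "100 Meters")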
5. In the end, a simple dissolve will get rid of your dice lines, but it's worth re-calculating your vertex count just to make sure you didn't inadvertently create godzillas with your dissolve operations. Godzillas are a common side effect of dissolves.
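
A minimal sketch of that final cleanup, assuming a hypothetical REGION attribute to dissolve on and an assumed output path:

import arcpy

# Dissolve away the dice lines (REGION is a hypothetical dissolve field).
gpolyout = r"C:\data\work.gdb\coast_final"
arcpy.Dissolve_management(gpolydice, gpolyout, "REGION")

# Re-check vertex counts, since dissolves can create new godzillas.
arcpy.AddField_management(gpolyout, "VERTEXCOUNT", "LONG")
arcpy.CalculateField_management(gpolyout, "VERTEXCOUNT", "!shape.pointcount!", "PYTHON", "")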
General tips for handling problems and crashes:
6. If possible, move to a file geodatabase; stability and capability are orders of magnitude greater than shapefiles. 7,000 polygons may stress a shapefile, but it won't make a geodatabase even break a sweat. I often run geodatabases with 5 million features on an average PC.
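
The conversion itself is a one-time step. A minimal sketch with assumed paths, creating a file geodatabase and loading a shapefile into it:

import arcpy

# Paths are assumptions for illustration.
arcpy.CreateFileGDB_management(r"C:\data", "work.gdb")
arcpy.FeatureClassToGeodatabase_conversion(r"C:\data\coastline.shp", r"C:\data\work.gdb")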
7. If possible, fire up Task Manager and watch your processes while they are running. CPU use is less informative than physical memory usage. A normal process will run along at, say, 50% RAM usage with plenty of fluctuations up and down, sometimes strong fluctuations. That's normal. The behavior of a runaway process is often to ramp up linearly and steadily with no fluctuations. If it hits 100% and stays there, you likely have a crash. Try watching it sometime when it's running a job you are having problems with, and you may find other early warning signs.
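
If you'd rather log this than sit watching Task Manager, a small polling loop can record the pattern over time. A minimal sketch assuming the third-party psutil package is installed; the 10-second interval is arbitrary:

import time
import psutil

# Poll physical memory use; a steady, fluctuation-free climb toward
# 100% is the runaway pattern described above.
while True:
    print("RAM in use: {0:.1f}%".format(psutil.virtual_memory().percent))
    time.sleep(10)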
8. As I've said several times before, problems with ArcGIS processing can more often be traced to these kinds of data issues than to faults in the software. Sure, there are bugs, but in my experience problems in the datasets themselves are a lot more common. Also, as a general observation, software issues seem to me to manifest as soon as I enter the command; data issues I uncover tend to show up later on during processing.
regards,
Charles Convis
Esri Conservation Program
Thanks for the article; everyone doing geoprocessing should keep this information in mind.