in debug mode, got Assertion `cudaGetLastError() == cudaSuccess' failed

Environment: Ubuntu 14.04, CUDA 8, CuDNN 5.1, Tensorflow r0.12
Build command
`bazel build -c opt --config cuda -c dbg --strip=never  //tensorflow/tools/pip_package:build_pip_package`

previously, following the installation of tensorflow (not in debug mode), my code works well. But, after I rebuild with above command, it shows this error in the middle of running the session.

> I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcupti.so.8.0 locally
> python: external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h:262: static void Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, Vectorizable>::run(const Expression&, const Eigen::GpuDevice&) [with Expression = const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<const float, const float>, const Eigen::TensorMap<Eigen::Tensor<const float, 1, 1, long int>, 16, Eigen::MakePointer>, const Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<const float>, const Eigen::TensorMap<Eigen::Tensor<const float, 1, 1, long int>, 16, Eigen::MakePointer> > > >; bool Vectorizable = true]: Assertion `cudaGetLastError() == cudaSuccess' failed.

the backtrace result from gdb when the error comes (seems from ReLU operator):
> (gdb) bt
> #0  0x00007ff450df6c37 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
> #1  0x00007ff450dfa028 in __GI_abort () at abort.c:89
> #2  0x00007ff450defbf6 in __assert_fail_base (
>     fmt=0x7ff450f403b8 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", 
>     assertion=assertion@entry=0x7ff42dc57260 "cudaGetLastError() == cudaSuccess", 
>     file=file@entry=0x7ff42dc57210 "external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h", line=line@entry=262, 
>     function=function@entry=0x7ff42dc58f80 <Eigen::internal::TensorExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice, true>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const&, Eigen::GpuDevice const&)::__PRETTY_FUNCTION__> "static void Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, Vectorizable>::run(const Expression&, const Eigen::GpuDevice&) [with Expression = const Eigen::TensorAssignOp<Eigen::TensorMap"...) at assert.c:92
> #3  0x00007ff450defca2 in __GI___assert_fail (
>     assertion=0x7ff42dc57260 "cudaGetLastError() == cudaSuccess", 
>     file=0x7ff42dc57210 "external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h", line=262, 
>     function=0x7ff42dc58f80 <Eigen::internal::TensorExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice, true>::run(Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const&, Eigen::GpuDevice const&)::__PRETTY_FUNCTION__> "static void Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, Vectorizable>::run(const Expression&, const Eigen::GpuDevice&) [with Expression = const Eigen::TensorAssignOp<Eigen::TensorMap"...) at assert.c:101
> #4  0x00007ff42ad39672 in Eigen::internal::TensorExecutor<Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> const> const, Eigen::GpuDevice, true>::run (expr=..., device=...)
>     at external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h:260
> #5  0x00007ff42ad37a6b in Eigen::TensorDevice<Eigen::TensorMap<Eigen::Tensor<float, 1, 1, long>, 16, Eigen::MakePointer>, Eigen::GpuDevice>::operator=<Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_max_op<float const, float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const, Eigen::TensorCwiseNullaryOp<Eigen::internal::scalar_constant_op<float const>, Eigen::TensorMap<Eigen::Tensor<float const, 1, 1, long>, 16, Eigen::MakePointer> const> const> > (
>     this=0x7ff35b7fcfe0, other=...)
>     at external/eigen_archive/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h:35
> #6  0x00007ff42ad36cb0 in tensorflow::functor::Relu<Eigen::GpuDevice, float>::operator() (
>     this=0x7ff35b7fd04f, d=..., features=..., activations=...)
>     at ./tensorflow/core/kernels/relu_op_functor.h:35
> #7  0x00007ff42acf48c1 in tensorflow::ReluOp<Eigen::GpuDevice, float>::Operate (this=0x59fde00, 
>     context=0x7ff35b7fd8f0, input=..., output=0x7ff23ce68390)
>     at ./tensorflow/core/kernels/relu_op.h:40
> #8  0x00007ff42acee479 in tensorflow::UnaryElementWiseOp<float, tensorflow::ReluOp<Eigen::GpuDevice, float> >::Compute (this=0x59fde00, context=0x7ff35b7fd8f0)
>     at ./tensorflow/core/framework/numeric_op.h:62
> #9  0x00007ff42c18523c in tensorflow::BaseGPUDevice::Compute (this=0x5964560, op_kernel=0x59fde00, 
>     context=0x7ff35b7fd8f0) at tensorflow/core/common_runtime/gpu/gpu_device.cc:385
> #10 0x00007ff42c402c39 in tensorflow::(anonymous namespace)::ExecutorState::Process (
>     this=0x3d553c80, tagged_node=..., scheduled_usec=1481634999542636)
>     at tensorflow/core/common_runtime/executor.cc:1364
> #11 0x00007ff42c404881 in tensorflow::(anonymous namespace)::ExecutorState::__lambda4::operator() (
>     __closure=0x3d557a70) at tensorflow/core/common_runtime/executor.cc:1746
> #12 0x00007ff42c40ada4 in std::_Function_handler<void(), tensorflow::(anonymous namespace)::ExecutorState::ScheduleReady(const TaggedNodeSeq&, tensorflow::(anonymous namespace)::ExecutorState::TaggedNodeReadyQueue*)::__lambda4>::_M_invoke(const std::_Any_data &) (__functor=...)
>     at /usr/include/c++/4.8/functional:2071
> #13 0x00007ff428bcc7a8 in std::function<void ()>::operator()() const (this=0x3d557ab0)
>     at /usr/include/c++/4.8/functional:2471
> #14 0x00007ff42c6f0b80 in tensorflow::thread::EigenEnvironment::ExecuteTask (this=0x59921a8, t=...)
>     at tensorflow/core/lib/core/threadpool.cc:82
> #15 0x00007ff42c6f29fb in Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::WorkerLoop (this=0x59921a0, thread_id=7)
>     at external/eigen_archive/unsupported/Eigen/CXX11/src/ThreadPool/NonBlockingThreadPool.h:167
> #16 0x00007ff42c6f1508 in Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::NonBlockingThreadPoolTempl(int, tensorflow::thread::EigenEnvironment)::{lambda()#1}::operator()() const
>     () at external/eigen_archive/unsupported/Eigen/CXX11/src/ThreadPool/NonBlockingThreadPool.h:58
> #17 0x00007ff42c6f3927 in std::_Function_handler<void (), Eigen::NonBlockingThreadPoolTempl<tensorflow::thread::EigenEnvironment>::NonBlockingThreadPoolTempl(int, tensorflow::thread::EigenEnvironment)::{lambda()#1}>::_M_invoke(std::_Any_data const&) (__functor=...) at /usr/include/c++/4.8/functional:2071
> #18 0x00007ff428bcc7a8 in std::function<void ()>::operator()() const (this=0x59c3bf0)

I haven't tried anything as I don't have any idea about this error. I tried with other architecture, it was working. The difference between this and other architecture is this architecture (the one with error) has depthwise layer and the other (the one without error) doesn't have depthwise layer. Seems not connected with the error from relu layer, so I don't have any idea. Both architectures were working in the non-debugging mode

Sunbelt Computer Software

PL/B Language Development and Support

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

in debug mode, got Assertion `cudaGetLastError() == cudaSuccess' failed #6285

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Sunbelt Computer Software

PL/B Language Development and Support

Uh oh!

in debug mode, got Assertion `cudaGetLastError() == cudaSuccess' failed #6285

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions